LSM2232 Genes, Genomes & Biomedical Implications
Lecture 1/2/3 (Low BC Part 1)
Humans have 23 pairs of chromosomes; the autosomes are numbered by size,
from the largest to the smallest. The sex-determining pair is the 23rd: XX in
females and XY in males.
Chromosome Painting is a term used to describe the direct visualisation using in situ
hybridisation of specific chromosomes in metaphase spreads and in interphase nuclei.
Chromosome painting, coupled with fluorescence in situ hybridisation (FISH) is now
used routinely to enhance the identification of chromosomal rearrangements, the
assignment of breakpoints and the determination of the origin of extra chromosomal
material.
In humans, the sequences of our roughly 25,000 genes are about 99.9% identical between
individuals. About 50% of the genome consists of repetitive sequences. The human genome
has about 3.2 × 10^9 nucleotide pairs.
Molecular genetics has progressed significantly since the 1950s – from cells & central
dogma to cytogenetics, genome landscapes, gene regulation and genome editing today.
Gene silencing is an example of gene regulation where RNAi/siRNA is used to cause cleavage
of targeted mRNA molecules, which suppresses gene expression.
Alternatively, morpholino oligos can be used; they bind to complementary
sequences of RNA/ssDNA via base pairing. Morpholinos act by “steric blocking”:
they bind to a target sequence within an RNA and prevent other molecules from
interacting with it.
Genome editing can be done via the CRISPR-Cas9 system.
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) are bacterial
loci containing short direct repeats of 24-48bp. They are part of an acquired prokaryotic
immune system which confers resistance to exogenous sequences such as plasmids
and phages.
Cas9 enzyme acts as a pair of molecular scissors that can cut the two strands of DNA
at a specific location in the genome so that bits of DNA can then be added or
removed.
A guide RNA (gRNA) consists of a small pre-designed RNA sequence, about 20 bases
long, located within a longer RNA scaffold. The scaffold part binds to Cas9 and
the pre-designed sequence ‘guides’ Cas9 to the right part of the genome. The gRNA
has bases that are complementary to those of the target DNA sequence in the genome.
Cas9 follows the guide RNA to that location and makes a cut. When the cell
detects the damaged DNA, its repair machinery can be used to introduce mutations.
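The base-pairing rule behind guide targeting can be sketched in a few lines of Python. This is only an illustration of complementarity, not a guide-design tool: the function names and both sequences are made up, and real Cas9 targeting has further requirements that are not modelled here.

```python
# Hypothetical helper names and sequences, for illustration only.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def find_guide_sites(genome, guide_rna):
    """List (start, strand) for every site whose DNA can pair with the guide.

    The guide pairs with the strand complementary to its own sequence, so the
    site reads like the guide itself (T in place of U) on the opposite strand.
    Start positions are 0-based on the strand that was passed in.
    """
    site = guide_rna.replace("U", "T")
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        start = seq.find(site)
        while start != -1:
            if strand == "+":
                hits.append((start, strand))
            else:
                # Convert back to a coordinate on the input strand.
                hits.append((len(genome) - start - len(site), strand))
            start = seq.find(site, start + 1)
    return hits


guide = "GGCAUCGAUUACGCAAGUCC"  # made-up 20-base guide sequence
genome = "TTTT" + "GGCATCGATTACGCAAGTCC" + "AAAA"  # made-up target region
print(find_guide_sites(genome, guide))  # [(4, '+')]
```

Searching both strands reflects that a target can lie on either strand of the double helix.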
Animal models such as mice or rats are used to help understand various biological
processes as they are physiologically similar to humans and share 80-85% gene
sequence identity with humans, with similar syntenic groups (groups of genes
occurring on the same chromosome).
Conserved chromosomal domains may be important for chromosomal function.
Introns are nucleotide sequences in a gene which are noncoding and are removed by RNA
splicing during maturation of the final RNA product.
Introns are integral to gene expression regulation. Some introns themselves encode
functional RNAs through further processing after splicing to generate noncoding
RNA molecules.
Some introns play essential roles in a wide range of gene expression regulatory
functions such as non-sense mediated decay and mRNA export.
Alternative splicing of introns within a gene acts to introduce greater variability of
protein sequences translated from a single gene, allowing multiple related proteins
to be generated from a single gene and a single precursor mRNA transcript.
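As a rough sketch of how one precursor transcript can yield several mature mRNAs, the Python below enumerates the transcripts produced when selected exons are either kept or skipped. Exon skipping is only one mode of alternative splicing, and the exon names and sequences here are invented for illustration.

```python
from itertools import product


def isoforms(exons, optional):
    """Return every transcript made by keeping or skipping each optional exon.

    `exons` is an ordered list of (name, sequence) pairs; `optional` is the
    set of exon names that may be skipped. Exon order is always preserved.
    """
    choices = [(True, False) if name in optional else (True,) for name, _ in exons]
    transcripts = []
    for keep in product(*choices):
        transcripts.append("".join(seq for (_, seq), k in zip(exons, keep) if k))
    return transcripts


# Hypothetical three-exon gene in which only E2 can be skipped.
exons = [("E1", "AUGGCU"), ("E2", "GAAUUC"), ("E3", "CCGUAA")]
for rna in isoforms(exons, optional={"E2"}):
    print(rna)
```

With n skippable exons this simple model gives 2^n possible transcripts, which is why a modest number of exons can support many related proteins.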
Comparison between human and Fugu genes:
The Fugu has a compact genome with only 15% of repetitive DNA (vs. 50% in
humans) and the average intron length is 1/6 that of human.
The larger size of the human introns is due to the presence of retrotransposons
(LINEs/SINEs).
Close to 2000 proteins between human and Fugu are 70% similar, suggesting that
genes are highly conserved.
When sequences are highly conserved, it is likely that the function of the proteins is similar,
but even a small difference in sequence can still cause a difference in function.
Introns and intergenic regions can produce miRNAs that suppress gene expression.
CpG islands/CG sites are key signature motifs in DNA that indicate a higher likelihood of
gene clusters. There is more variability in GC content and CpG density in humans than in
mice.
Lecture 4 (Low BC Part 2)
A cell has two copies of each chromosome, one
coming from the mother and the other from the
father. The maternal and paternal chromosomes
in a homologous pair have the same genes at the
same loci, but possibly with different alleles.
Karyotyping (visualizing the number and
appearance) can be done using dyes
(chromosome painting).
Giemsa stain (mix of methylene blue and
eosin) binds to gene-poor A-T rich
regions after chromosome digestion with
trypsin and yields a series of lightly (GC)
and darkly (AT) stained bands.
Fluorescent dyes can be used to
simultaneously visualise all pairs of
chromosomes in different colours.
Chromosomes can undergo recombination (crossing over) and translocation during meiosis
to allow for genome diversity.
For translocation, the two chromosomes need to be co-localized (in close physical
proximity) in order for it to happen.
A disease results when the gene gains a function or loses a function.
Gene insertion, deletion and duplication are other mechanisms that may cause a
rearrangement in the chromosome.
Human chromosome 22 (one of the smallest) contains about 48 million nucleotide pairs and
makes up approximately 1.5% of the human genome.
10% of a chromosome arm contains about 40 genes while one gene would contain
about 34000 nucleotide pairs.
Genes can be located in either strand of the DNA. The top strand and bottom strand can code
for different genes, but the coding sequence of one gene will always be on one strand.
In the gene below, the top strand has fewer genes as compared to the bottom strand
and a possible reason for this is that the top strand has fewer promoter sequences
for the initiation of gene transcription.
Closely related species may have different numbers of chromosomes yet show
similar gene expression because they have a similar number of genes.
The advantage of having fewer chromosomes is that cell division is easier, as less
organization is required (microtubules).
However, the disadvantage is that a single mutation event would have a stronger
impact on the organism (many eggs in one basket scenario).
A genome can expand due to gene duplication (polyploidy in plants) and this allows the
species to adapt to harsher conditions.
For the human genome:
Largest gene = 2.4 × 10^6 base pairs
Average gene size = 27000 base pairs
Average exon size = 145 base pairs
Average cDNA length = 1000 base pairs (a typical protein is about 300+ amino acids)
Average exons per gene = 8.8
What’s in the human genome?
We are about 50% repeats while protein-coding regions are only about 1.5% of the
entire genome.
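To put these percentages into nucleotide pairs, here is a small back-of-the-envelope calculation in Python. It assumes the genome size of about 3.2 × 10^9 nucleotide pairs, the count of about 25,000 genes and the average gene size of 27,000 base pairs quoted in these notes; the variable and function names are made up, and the results are rough estimates only.

```python
GENOME_BP = 3_200_000_000  # about 3.2 x 10^9 nucleotide pairs
GENE_COUNT = 25_000        # approximate number of genes
AVERAGE_GENE_BP = 27_000   # average gene size, introns included


def share_in_bp(tenths_of_percent):
    """Convert a share given in tenths of a percent into nucleotide pairs."""
    return GENOME_BP * tenths_of_percent // 1000


coding_bp = share_in_bp(15)              # 1.5% protein-coding
repeat_bp = share_in_bp(500)             # 50% repeats
genes_bp = GENE_COUNT * AVERAGE_GENE_BP  # total span of genes with introns

print(coding_bp)                             # 48000000
print(repeat_bp)                             # 1600000000
print(round(100 * genes_bp / GENOME_BP, 1))  # 21.1
```

On these figures, genes including their introns span roughly a fifth of the genome, while the protein-coding part is only about 48 million nucleotide pairs.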
Open Reading Frame (ORF): any stretch of DNA within the genome that, read in a given
frame, could possibly encode a functional element.
Example of both strands of DNA having coding sequences:
ORF1: 5’– GGC CTT ACG TTA TTA CCC –3’
ORF1: 3’– CCG GAA TGC AAT AAT GGG –5’
ORF2: 5’– GG CCT TAC GTT ATT ACC C –3’ Stop codon encountered
ORF2: 3’– CC GGA ATG CAA TAA TGG G –5’
ORF3: 5’– G GCC TTA CGT TAT TAC CC –3’
ORF3: 3’– C CGG AAT GCA ATA ATG GG –5’
Stop codons are UAA, UAG and UGA, so the probability that a random codon is a stop is 3/64.
For viral genomes, which are compact, both strands at the same locus can code for
different proteins depending on the reading frame.
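The idea of reading frames on both strands can be checked with a short Python sketch. It takes the 18-base example sequence above and scans the three frames of each strand for stop codons. One assumption to note: each strand is read strictly 5' to 3' here, which may differ from the direction in which the lower strand is printed in the example. The function names are made up.

```python
from fractions import Fraction

STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA spelling of UAA, UAG, UGA
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def codons(dna, frame):
    """Split a strand (5' to 3') into complete codons starting at offset `frame`."""
    return [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]


def stops_by_frame(dna):
    """Map each frame offset (0, 1, 2) to the stop codons found in that frame."""
    return {frame: [c for c in codons(dna, frame) if c in STOP_CODONS]
            for frame in range(3)}


top = "GGCCTTACGTTATTACCC"        # upper strand of the example, 5' to 3'
bottom = reverse_complement(top)  # lower strand, rewritten 5' to 3'

print(stops_by_frame(top))
print(stops_by_frame(bottom))
print(Fraction(len(STOP_CODONS), 4 ** 3))  # chance a random codon is a stop: 3/64
```

Under this 5' to 3' reading, the upper strand is open in all three frames, while two of the three frames on the lower strand contain TAA stops.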
Lecture 5/6 (Low BC Part 3)
DNA organization and packing change during gene expression and DNA replication.
The DNA is most tightly packed during cell division.
Origins of replication initiate replication bubbles.
In bacteria and yeast, the origins of replication have been identified and show
sequence-specific activation.
In mammalian cells, the sequences are highly variable.
Origins of replication are clustered irregularly in groups of 20-80 called
replication units.
30-300kb intervals separate individual origins within each replication unit.
Replication units activate during S phase.
DNA-binding proteins (histones) package DNA into a compact and less fragile form called
chromatin which is DNA complexed with proteins.
Eukaryotic chromosomes are linear with several origins of replication and have 4 levels of
chromosome organization:
1. Primary structure – nucleosomal packets of 11nm (beads on a string), composed of
double stranded DNA (146bp) wrapped around an octamer of histone proteins (2 of
each H2A, H2B, H3 and H4)
Histone proteins are basic and thus positively charged. Amino acids such as
lysine and arginine can form H-bonds with the phosphates along the DNA
backbone.
All four histones share a structural motif known as the histone fold formed
from 3 alpha helices connected by two loops.
The histone folds first bind to each other to form H3-H4 and H2A-H2B dimers,
then the H3-H4 dimers combine to form a tetramer before further combining
with two H2A-H2B dimers to form an octamer.
Genes that code for these histone proteins are paralogs where there is a
conserved domain.
2. Secondary structure – organization of nucleosomes to form 30nm fiber (active
euchromatin)
A single histone H1 molecule binds to each nucleosome, contacting both the
DNA and protein. The H1 histones package the nucleosomes into even tighter
arrays by guiding DNA entry and exit from the complex and by neutralizing DNA
charge.
Histone tails are largely unstructured and are thought to be involved in the
interactions between nucleosomes that help to pack them together.
Tails on the histones can be modified (methyl, acetyl, phosphate, ubiquitin)
for specific purposes.
3. Tertiary structure – radial loop/solenoids (300nm) formed from the interaction
between 30nm fiber and nuclear matrix.
Euchromatin is transcriptionally inactive when in this form.
4. Global structure – higher order of packing in interphase chromosomes. Poorly
understood but there are rare visible examples:
Lampbrush – chromosomes in interphase
cells. Most of the genes present in the DNA
loops are being actively expressed while the
majority of the DNA remains highly
condensed on the chromosome axis and is
not expressed.
Polytene cells come from the fruit fly
Drosophila and contain many copies of each
standard chromosome. They are found in
the salivary glands of fly larvae where the
cells undergo multiple cycles of
DNA synthesis without cell division.
Multiple copies of the genes are
held side-by-side. When viewed
under a microscope, distinct
alternating dark bands and light
interbands are visible. About 95%
of the DNA is in bands and 5% is
in interbands. The chromatin in
each band appears dark because
the DNA is more condensed than
the DNA in interbands. Gene expression is likely to be more active in the
interband.
Chromatin Domains:
Heterochromatin (700nm) – highly condensed chromatin which normally does not
harbour genes and is transcriptionally inactive (~10% of DNA).
Mostly in centromeres and telomeres.
Provides protection against “parasitic” mobile elements.
Active Euchromatin (30nm) – least condensed, transcriptionally active chromatin
(~10% of DNA)
Inactive Euchromatin (300nm) – intermediate compaction form, transcriptionally
inactive.
An interphase chromosome below is shown folded into a series of looped domains each
containing about 50,000 to 200,000 or more nucleotide pairs of double-helical DNA
condensed into a chromatin fiber.
The chromatin in each individual loop is further condensed through poorly
understood folding processes that are reversed when the cell requires direct access
to the DNA packaged in the loop.
In mitotic chromosomes, the bases of the chromosomal loops are enriched both in
condensins (which bind to chromosomes and compact the radial loops) and DNA
topoisomerase II (which prevents DNA tangling), which form the axis at metaphase.
Through chromosome breakage and re-joining, a piece of chromosome that is normally
euchromatic can be translocated into the neighbourhood of heterochromatin and this can
cause silencing/inactivation of normally active genes – this is known as position effect.
Heterochromatin packaging can also retract to allow expression of newly released genes.
Position effect variegation is the variation of phenotype among genetically identical
cells and is dependent on a gene’s neighbouring heterochromatin status.
The White gene in
Drosophila controls eye pigment
production. Wild-type flies have normal
pigment production which gives them red
eyes, but if the White gene is mutated and
inactivated, the mutant has white eyes.
In flies in which a normal
White gene has been moved near a region
of heterochromatin, the eyes have both
red and white patches as the gene has
been silenced by the heterochromatin in some cells but not in others.
The centromere contains heterochromatin
consisting of short, repeated DNA sequences
known as alpha satellite DNA which are AT rich.
The repeats contain slight sequence variations
and are flanked by heterochromatin made of
non-satellite repeats.
Regular replication machinery cannot fully elongate the end of a linear chromosome and
thus telomerase is required to extend the end of a chromosome such that no crucial gene
sequences are lost during replication.
Telomerase recognizes the tip of an existing telomere DNA repeat sequence and
elongates it in the 5’ to 3’ direction, using an RNA template that is a component of
the enzyme itself to synthesize new copies of the repeat.
Telomeres end in t-loops in which the protruding end of the telomere loops back
and tucks itself into the duplex DNA of the telomere repeat sequence.
In heterochromatin, the histone tails are under-acetylated and this allows silent information
regulator (Sir) proteins to bind to these histones.
In the yeast telomere model, the telomeric proteins attract the NAD+ dependent histone
deacetylase Sir2 which silences transcription at the telomeres.
Telomeres are important for protecting chromosomal content and only germ cells
(egg/sperm) have telomerase activity. Somatic cells experience telomere shortening from
DNA replication.
Genes related by DNA sequence likely arose from gene duplications/shuffling/mutations
and they are called gene families. The release from selective pressure allowed mutations to
accumulate and alter gene product/function.
Homologous genes (homologs) are genes related to a second gene by descent from a common
ancestral DNA sequence. The term homolog may apply to both orthologs and
paralogs.
Orthologs are genes in different species that evolved from a common ancestral gene
by speciation. Normally orthologs retain the same function in the course of
evolution. The identification of orthologs is critical for reliable prediction of gene
function in newly sequenced genomes.
Paralogs are genes related by duplication within a genome. Orthologs retain the
same function in the course of evolution whereas paralogs evolve new functions
even if these are related to the original one.
E.g. the genes coding for myoglobin and haemoglobin are paralogs, but the genes coding
for haemoglobin in humans and dogs are orthologs.
Other genome elements in the human genome include:
Retroviral-like elements (retrotransposons)
E.g. promoter in long terminal repeat (LTR)
Encodes reverse transcriptase
DNA-only transposons, which encode transposase – an enzyme that binds to the end of
a transposon and catalyses the movement of the transposon to another part of the
genome by a cut-and-paste mechanism or by replicative transposition (copy and
paste).
Duplications, simple repeats and gene regulatory elements.
Non-retroviral retrotransposons:
Long Interspersed Nuclear Elements
(LINEs) such as L1 (~1000-12000bp), which encode
endonuclease and reverse transcriptase.
Transposition of the L1 element begins
when an endonuclease attached to the L1 reverse transcriptase
and the L1 RNA nicks the target DNA at the point at which the
insertion will occur. This nick releases a 3’-OH DNA end in the
target DNA which is then used as a primer for the reverse
transcription step. This generates a single-strand DNA copy of the
element that is directly linked to the target DNA. RNase H then
removes the RNA. In subsequent reactions, further
processing of the single-strand DNA copy results in the generation
of a new dsDNA copy of the L1 element that is inserted at the site
of the initial nick via DNA polymerase.
Short Interspersed Nuclear Elements
(SINEs) such as Alu (~300bp) which do not carry their own
endonuclease or reverse transcriptase gene.
The organization of the LINE/SINE elements is as follows: +1: transcription start
site; pol II/III: RNA polymerase II and III promoters; R-EN: restriction-like
endonuclease; AP-EN: apurinic/apyrimidinic endonuclease; pA: polyadenylation signal
lacking downstream efficiency element; RT: reverse transcriptase.
There are ~850,000 LINEs (21% of genome) and ~1,500,000 SINEs (13% of genome) but
most are non-functional.
Because the LINE-encoded reverse transcriptase/endonuclease can also bind normal cellular
mRNAs, LINE activity can give rise to retropseudogenes.
LINEs are strongly biased towards AT rich regions while SINEs are strongly biased
towards GC rich regions.
Mitochondria in animal cells and plastids in plant cells are organelles that contain their own
genomes.
They encode genes for their own use but also import products produced by nuclear
genes. There are differences between their genetic code and that of nuclear DNA.
Mitochondrial DNA (mtDNA) is mostly circular but some are linear. Similar to
bacterial DNA, they do not have histones. In mammals, mtDNA is about 16.5kb and is
maternally inherited.
Compared to nuclear/chloroplast/bacterial genomes, the mitochondrial genome has several
surprising features:
Dense gene packing: the mitochondrial genome seems to contain almost no
noncoding DNA: nearly every nucleotide seems to be part of a coding sequence,
either for a protein or for rRNA/tRNA.
Relaxed codon usage: only 22 tRNAs are required for mitochondrial protein
synthesis compared to 30+ in the cytosol and chloroplasts.
Variant genetic code: 4 of the 64 codons have different “meanings” from those of the
same codons in other genomes.
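The variant code can be pictured as a small table of overrides on top of the standard code. The Python sketch below uses the four reassignments of the vertebrate mitochondrial code (UGA, AUA, AGA, AGG). These specific codons are supplied from general knowledge rather than from these notes, so treat them as an assumption, and only a handful of standard codons are included to keep the example short.

```python
# Small excerpt of the standard genetic code (not the full 64-codon table).
STANDARD = {
    "AUG": "Met", "UGG": "Trp",
    "UGA": "Stop", "AUA": "Ile", "AGA": "Arg", "AGG": "Arg",
}

# Assumed reassignments in vertebrate mitochondria (four codons).
MITO_OVERRIDES = {"UGA": "Trp", "AUA": "Met", "AGA": "Stop", "AGG": "Stop"}


def decode(codon, mitochondrial=False):
    """Translate one codon with the standard or the mitochondrial meaning."""
    if mitochondrial and codon in MITO_OVERRIDES:
        return MITO_OVERRIDES[codon]
    return STANDARD[codon]


for codon in ("UGA", "AUA", "AGA", "AGG", "AUG"):
    print(codon, decode(codon), decode(codon, mitochondrial=True))
```

A practical consequence is that a mitochondrial gene cannot simply be translated with the nuclear code: UGA would be read as a premature stop instead of tryptophan.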
It is thought that eukaryotic cells
originated through a symbiotic relationship
between an archaeon and an aerobic bacterium,
where the archaeon provided the nucleus and the
bacterium served as a respiring ATP-producing
endosymbiont which eventually evolved into the
mitochondrion.
Mitochondrial DNA has a higher rate of
mutation because oxidative reactions generate
free radicals and the DNA repair system is
minimal.
Lecture 7 - Prokaryotic Transcription
Enzymes that perform transcription are called RNA polymerases. RNA polymerases
catalyse the formation of the phosphodiester bonds that link the nucleotides together to
form a linear chain.
RNA polymerase moves stepwise along the DNA, unwinding the DNA helix just
ahead of the active site for polymerization to expose a new region of the template
strand for complementary base pairing.
Thus transcription is 5’ to 3’ on a template that is 3’ to 5’.
The coding strand is the DNA strand that has the same sequence as the mRNA and is
related by the genetic code to the protein sequence that it represents.
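The relationship between coding strand, template strand and mRNA can be sketched in Python as follows. The function names and the example sequence are made up; the point is only that reading the template 3' to 5' and pairing bases gives an RNA identical to the coding strand, with U in place of T.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template base -> RNA base


def template_from_coding(coding):
    """Return the template strand, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(coding))


def transcribe(template):
    """Build the RNA 5' to 3' by reading the template from its 3' end."""
    return "".join(RNA_PAIRING[base] for base in reversed(template))


coding = "ATGGCCTTAGGC"  # hypothetical coding strand, 5' to 3'
template = template_from_coding(coding)
print(template)              # GCCTAAGGCCAT
print(transcribe(template))  # AUGGCCUUAGGC
```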
The transcription unit is the sequence between sites of initiation and termination by RNA
polymerase and it may include more than one gene. The elements are:
Promoter – region of DNA where RNA polymerase binds tightly to initiate
transcription.
Terminator – sequence of DNA that causes RNA polymerase to terminate
transcription. For most bacterial genes, a termination signal consists of a string of
A-T nucleotide pairs preceded by a twofold symmetric DNA sequence which, when
transcribed into RNA, folds into a “hairpin” structure. The formation of the hairpin
helps to disengage the RNA transcript from the active site.
Startpoint (+1) – position on DNA corresponding to the first base incorporated into
RNA.
Upon binding to the promoter, the RNA polymerase opens up the double helix to expose a
short stretch of nucleotides (~10) on each strand in a transient transcription bubble (~12
to 14bp) and uses the template strand (3’ to 5’) to synthesize a complementary sequence of
RNA running 5’ to 3’ (~8 to 9bp within bubble).
As the transcription bubble progresses, the DNA duplex reforms and displaces the RNA
as a single polynucleotide chain.
Transcription rate is about 40 to 50 nucleotides per second; DNA replication rate is
about 800 base pairs per second.
A nascent RNA is an RNA chain that is still being synthesized such that its 3’ end is
paired with DNA where the RNA polymerase is elongating.
In bacteria, all RNA molecules are synthesized by a single type of RNA polymerase and thus
this applies to the production of mRNA as well as structural and catalytic RNAs.
Steps for the transcription reaction:
1. RNA polymerase binds to the promoter on the DNA to form a closed complex.
2. RNA polymerase initiates transcription (initiation) after opening the DNA duplex to
form a transcription bubble (open complex).
3. During elongation, the transcription bubble moves along DNA and the RNA chain is
extended in the 5’ to 3’ direction.
4. Transcription stops when the polymerase encounters a terminator sequence.
5. The DNA duplex reforms and RNA polymerase dissociates, releasing the newly
synthesized RNA.
The bacterial RNA polymerase consists of the core enzyme (~400kDa), comprising five
subunits, α2ββ’ω, and a sigma (σ) factor. The association of the core enzyme and sigma
factor is referred to as the RNA polymerase holoenzyme.
The two α subunits serve as a scaffold for assembly of the enzyme and binding
to DNA, and interact with the promoter and some regulatory factors through their
C-terminal domains (CTD).
The β and β’ subunits catalyse the covalent linkages between adjacent ribonucleotides
and make up most of the enzyme mass.
The sigma (σ) factor changes the DNA-binding properties of RNA polymerase so
that its affinity for general DNA is reduced and its affinity for promoters is
increased.
The sigma factor is involved in only the initiation step.
The initiation complex contacts the DNA from the -55 to +20 regions. When
initiation succeeds, the initial RNA synthesis is relatively inefficient, as short,
unproductive transcripts are often released (abortive initiation).
However, once the nascent RNA chain reaches 8-9 bases in length,
the sigma factor is released and the RNA polymerase transitions to an elongation
ternary complex of core RNA polymerase, DNA and nascent RNA.
Upon dissociation of the sigma factor (-30 region), the core enzyme
contracts and the polymerase tightens around the DNA, shifting to
the elongation mode of RNA synthesis when the RNA chain
extends to 15-20 bases.
The sigma factor and the core enzyme recycle at different points in
transcription.
Promoter clearance time (1-2 seconds) is how long it takes the current polymerase to
leave the promoter so that another polymerase can initiate.
The sigma factor changes its structure to expose its
DNA-binding regions when it associates with the core
enzyme.
The N-terminus of sigma blocks the DNA binding
regions from binding to DNA.
The sigma factor binds to both the -35 and -10
sequences which are the interactions with the
promoter.
Consensus sequence: an idealized sequence in which each position represents the base
most often found when many actual sequences are compared.
The promoter consensus sequences consist of a purine (A/G) at the startpoint (+1),
the hexamer TATAAT centred at -10 and another hexamer TTGACA centred at -35.
This consensus sequence is derived from alignment of >300 E.coli promoter
regions.
Individual promoters usually differ from the consensus at one or more positions,
and promoters are asymmetrical.
The spacing between the two promoter elements (-35 and -10), 15 to 19 bp, is
critical for promoter function.
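To make the consensus idea concrete, the Python sketch below scores how well a stretch of DNA matches the -35 hexamer TTGACA and the -10 hexamer TATAAT, allowing a spacer of 15 to 19 bp between them. This is a naive match count on the coding strand, not how promoters are identified in practice; the function names and the test sequence are made up.

```python
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def matches(site, consensus):
    """Count the positions at which a site agrees with the consensus."""
    return sum(a == b for a, b in zip(site, consensus))


def best_promoter(seq, min_spacer=15, max_spacer=19):
    """Return (score, start of -35 hexamer, spacer length) for the best match.

    The score is the number of matching bases out of 12 (6 per hexamer).
    Returns None if the sequence is too short to hold both hexamers.
    """
    best = None
    n = len(MINUS_35)
    for i in range(len(seq) - n + 1):
        for spacer in range(min_spacer, max_spacer + 1):
            j = i + n + spacer  # start of the candidate -10 hexamer
            if j + len(MINUS_10) > len(seq):
                break
            score = matches(seq[i:i + n], MINUS_35) + matches(seq[j:j + n], MINUS_10)
            if best is None or score > best[0]:
                best = (score, i, spacer)
    return best


# Hypothetical sequence: perfect hexamers separated by a 17 bp spacer.
seq = "CC" + "TTGACA" + "GC" * 8 + "G" + "TATAAT" + "GCGCA"
print(best_promoter(seq))  # (12, 2, 17)
```

Real promoters usually score below 12 because they differ from the consensus at one or more positions.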
The promoter efficiencies can be increased or decreased by mutation:
Mutations in the -35 sequence can affect initial binding of the RNA polymerase.
Mutations in the -10 sequence usually affect the melting reaction that converts a
closed to an open complex.
Mutations in the initial transcribed region (+1 to +20) influence the rate at which
the RNA polymerase clears the promoter.
E. coli has 7 sigma factors, each of which causes RNA polymerase to initiate at a set of
promoters defined by specific -35 and -10 sequences.
Other sigma factors are activated by special conditions and they recognize
promoters with different consensus sequences.
Substitution of sigma factors may control initiation:
σ70 is used for general transcription.
A cascade of sigma factors is created when one sigma factor is required to transcribe
the gene coding for the next sigma factor.
Substitution of sigma factor causes enzyme to recognize a different set of promoters
with different consensus sequences.
The termination of transcription may require the recognition of both the terminator
sequence in DNA and the formation of a hairpin structure in the RNA product.
The terminator sequence is located before the point at which the last base is added
to the RNA.
Antitermination causes the enzyme to continue transcription past the terminator
sequence – this event is called readthrough.
Intrinsic termination: termination at certain sites in the absence
of any other factors.
Intrinsic terminators consist of a GC rich hairpin in the
RNA product followed by a U-rich region in which the
termination occurs.
They also include palindromic regions that can form
hairpins varying in length from 7 to 20 base pairs.
The following sequence are the consensus sequences for E. coli for
the coding strand:
Rho-dependent termination:
The rho (ρ) protein functions as a helicase,
binding at the rut (rho utilization) site
(upstream from the terminator) on the RNA
after the rut site is synthesized in the RNA.
At the terminator site, the DNA encodes
an RNA sequence containing several GC
base pairs that form a stem-loop
structure that binds to RNA
polymerase, which results in a
conformational change that causes RNA
polymerase to pause.
The ρ protein is now able to catch up to
the stem-loop, pass through it and break
the hydrogen bonds between the DNA
and RNA within the open complex.
Transcription and translation occur simultaneously in bacteria (coupled
transcription/translation) as ribosomes begin translating an mRNA before its synthesis has
been completed. In addition, the mRNA is also degraded simultaneously in bacteria.
The half-life of bacterial mRNA is only a few minutes, so it is unstable.
In eukaryotic cells, synthesis and maturation of mRNA occur in the nucleus. The
mRNA is then exported to the cytoplasm where it is translated by ribosomes. A
typical eukaryotic mRNA is relatively stable and can continue to be translated for
several hours.
Untranslated regions include:
5’ UTR – sequence upstream from the coding region of mRNA
3’ UTR – sequence downstream from coding region of mRNA.
Bacterial mRNA may be polycistronic (have several coding regions that represent different
cistrons; code for different proteins)
Intercistronic distance may vary from -1 to +40 bases.
Termination is prevented when antitermination proteins act on RNA polymerase to read
through a specific terminator.
The location of the antiterminator site varies – it can be in the promoter or within the
transcription unit.
The site where an antiterminator protein acts is upstream of the terminator site in
the transcription unit.
Phage lambda has two antitermination proteins, pN and pQ which act on different
transcription units.
DNase footprinting can be used to study DNA-protein interaction:
A bound DNA-binding protein blocks the phosphodiester bonds from attack by
nuclease or chemicals, thus revealing the protein’s precise recognition site as a
protected zone/footprint.
Electrophoretic Mobility Shift Assay (EMSA) is used to study
protein-DNA or protein-RNA interactions.
A mobility shift assay is an electrophoretic separation of a
protein-DNA/RNA mixture on a gel. The speed at which
different molecules move is determined by their size and
charge.
The control lane (DNA probe without protein present) will
contain a single band corresponding to unbound DNA/RNA.
The larger the bound protein, the greater the retardation of
the DNA molecule.
Example: After purification of the ENO1 promoter binding proteins, the authors carried out an
electrophoretic mobility shift assay. Based on the description in the Figure 1 legend, predict the
EMSA result on panel B.
Figure 1 legend: A) Schematic diagram showing the experimental strategy devised for the purification of ENO1 promoter binding proteins. B) Electrophoretic mobility shift assays (EMSA) showing DNA-protein complexes using the biotinylated DNA sequence corresponding to the ENO1 promoter and total nuclear extract of tachyzoites. Lane 1, unbound biotinylated probe alone. Lane 2, gel shift binding assays revealing the biotinylated DNA-protein complexes. Lane 3, specific competitor corresponding to unlabeled ENO1 promoter introduced simultaneously with labelled probe during binding assays.
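One way to reason about the three lanes is to write the EMSA logic down as a tiny Python function. This is a simplified sketch under two stated assumptions: the labelled probe is in excess over the binding protein, so some free probe always remains, and the unlabelled specific competitor is in large enough excess to take essentially all of the protein. Only labelled DNA is detected. The function name and band labels are made up.

```python
def emsa_bands(labelled_probe, protein, unlabelled_competitor=False):
    """Return the labelled bands expected in a lane, slowest-migrating first."""
    bands = []
    if not labelled_probe:
        return bands  # nothing labelled, nothing detected
    if protein and not unlabelled_competitor:
        # Protein bound to labelled probe migrates more slowly (shifted band).
        bands.append("shifted complex")
    # Unbound labelled probe runs fastest.
    bands.append("free probe")
    return bands


print(emsa_bands(True, protein=False))                             # lane 1
print(emsa_bands(True, protein=True))                              # lane 2
print(emsa_bands(True, protein=True, unlabelled_competitor=True))  # lane 3
```

In a real gel the competitor lane may show a weakened rather than absent shifted band, depending on how much competitor is added.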
Lecture 8 - Operon
A regulator gene is a gene that codes for a product that controls the expression of other genes.
An operon is a unit of bacterial gene expression and regulation which includes structural
genes and control elements in DNA recognized by regulatory gene product(s).
A trans-acting product can function on any copy of its target DNA and is likely to be a
diffusible protein while a cis-acting site affects the activity only of sequences on its own
molecule of DNA.
In negative control, a repressor protein binds to an operator to prevent a gene from being
expressed while for positive control a transcription factor (activator) is required to bind at
the promoter in order to enable RNA polymerase to initiate transcription.
In inducible regulation, the gene is regulated by the presence of an inducer (substrate)
while in repressible regulation the gene is regulated by a repressor which is usually the
product of its enzyme pathway.
E.g. the tryptophan operon consists of a single promoter and 5 genes which encode different
enzymes needed to synthesize tryptophan from simpler molecules. When tryptophan inside
a bacterium is low, RNA polymerase binds to the promoter and transcribes the 5 genes.
However, if the tryptophan concentration is high, tryptophan binds to the (allosteric)
repressor protein, which becomes active and blocks the binding of RNA polymerase to the
promoter by binding to the operator, a cis-regulatory sequence within the promoter. This
is an example of repressible regulation and negative control.
The level of response for a system in the absence of a stimulus is its basal level – basal level
of transcription of a gene is the level that occurs in the absence of any specific activation.
The derepressed state describes a gene that is turned on because a small molecule
corepressor is absent, while the super-repressed state is a mutant condition in which a
repressible operon cannot be derepressed, so it is always turned off.
Genes coding for proteins that function in the same pathway may be located adjacent to one
another (organized into operons) and controlled as a single unit that is transcribed into a
polycistronic mRNA.
The lac operon in E. coli is controlled by both the Lac repressor and the cAMP receptor
protein (CRP, also called catabolite activator protein, CAP), which is an activator (CRP has
to bind cAMP before it can bind to the promoter).
When glucose is no longer available, the intracellular cAMP concentration increases
and thus CRP gets activated, activating the lac operon and thus allowing the bacteria
to digest other sugars.
The lacI gene has its own promoter and terminator while the transcription of the lacZYA
operon is controlled by a repressor protein (lac repressor) that binds to an operator that
overlaps the promoter at the start of the cluster (PO).
The operator O occupies the first 26bp
of the transcription unit. The long lacZ
gene starts at base 39 and is followed
by the lacY and lacA genes and a
terminator.
The repressor protein which binds to
the operator is a tetramer of the
identical subunits coded by the lacI
gene.
The lac operon is negatively inducible
where β-galactoside, the substrate of
the lac operon is its inducer.
In the absence of β-galactosides,
the lac operon is expressed at a
very low (basal) level.
The addition of specific β-
galactosides induces
transcription of all three genes of
the lac operon.
As the lac mRNA is extremely unstable, induction can be rapidly reversed.
Transcription increases upon addition of inducer and thus the level of lac mRNA rises
rapidly. With the removal of inducer, the mRNA quickly degrades, but the level of
β-galactosidase remains high as proteins do not degrade as fast as mRNA.
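The different fates of lac mRNA and β-galactosidase after the inducer is removed can be illustrated with a minimal simulation. Every number here is an assumption chosen for illustration (a 2-minute mRNA half-life, a 600-minute protein half-life, synthesis rates of 1 in arbitrary units, zero basal transcription), not a measured value from the lecture.

```python
import math


def simulate_induction(t_add=10.0, t_remove=40.0, t_end=70.0, dt=0.01,
                       mrna_half_life=2.0, protein_half_life=600.0):
    """Euler integration of mRNA and protein levels (arbitrary units, minutes).

    Inducer is present from t_add until t_remove. All rates are assumed.
    """
    k_tx, k_tl = 1.0, 1.0                      # synthesis rates (assumed)
    d_m = math.log(2) / mrna_half_life         # mRNA decay constant
    d_p = math.log(2) / protein_half_life      # protein decay constant
    m = p = 0.0
    times, mrna, protein = [], [], []
    for i in range(int(round(t_end / dt)) + 1):
        t = i * dt
        times.append(t)
        mrna.append(m)
        protein.append(p)
        induced = t_add <= t < t_remove        # transcription only with inducer
        dm = (k_tx if induced else 0.0) - d_m * m
        dp = k_tl * m - d_p * p
        m += dm * dt
        p += dp * dt
    return times, mrna, protein


times, mrna, protein = simulate_induction()
for t_query in (10, 40, 60):
    i = int(round(t_query / 0.01))
    print(f"t={t_query:>2} min  mRNA={mrna[i]:.3f}  protein={protein[i]:.2f}")
```

In this sketch the mRNA level collapses within a few half-lives of inducer removal, while the protein level barely changes over the same period.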
The lac repressor protein is a tetramer of the identical subunits coded by the lacI gene. It
has two binding sites – one for operator DNA and another for inducer.
The natural inducer is 1,6-allolactose (converted from lactose) which can be
metabolised and does not persist in the
cell.
A gratuitous inducer resembles the authentic inducer of transcription but is not a
substrate for the induced enzyme and thus cannot be metabolised.
E.g. isopropyl β-D-1-thiogalactopyranoside (IPTG)
Lactose can be hydrolysed into
galactose and glucose by β-
galactosidase.
The inducer binds to the lac repressor
(allosteric) and converts it into a form
with lower operator affinity, thus allowing RNA polymerase to initiate transcription.
The repressor can be divided into an N-terminal DNA-binding domain, a hinge and a protein
core.
The DNA-binding domain contains two short α-helical regions forming a
helix-turn-helix (HTH) motif that binds the major groove of DNA.
The inducer-binding site and regions responsible for multimerization are
located in the core.
HTH DNA-binding proteins bind DNA as dimers with a separation of 3.4nm (one turn of
the double helix) between the two recognition helices.
Mutations in the operator (Oc) cause constitutive expression of all three lac structural genes
because the repressor is unable to bind to the mutant operator, thus allowing RNA
polymerase to have unrestrained access to the promoter. As the operator can only control
the lac genes adjacent to it, these mutations are cis-acting as they only affect those genes on
the contiguous stretch of DNA. Oc can be said to be cis-dominant.
Mutations that inactivate the lacI gene (codes for repressor) cause the operon to be
constitutively expressed because the mutant repressor protein cannot bind to the
operator. The lacI- mutation is recessive as the introduction of a normal lacI+ gene can
restore control even in the presence of a defective lacI- gene.
Mutations in the inducer-binding site of the repressor (lacIs, super-repressor) leave the
repressor bound to the operator even when inducer is present, so lac operon transcription
is prevented and the operon is uninducible.
Mutations in the DNA-binding site of the repressor (lacI-d, dominant negative) are
constitutive as the repressor cannot bind to the operator.
This mutant gene makes a monomer that has a damaged DNA-binding site. When it is
present in the same cell as the wild-type gene, multimeric repressors are
assembled at random from both types of subunits, so repressor function is interfered with.
Only one subunit of the multimer needs to be of the lacI-d type to block repressor
function and thus the mutation has a dominant negative behaviour.
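The strength of the dominant negative effect follows from simple probability. The sketch below assumes (this is not stated in the notes) that wild-type and mutant subunits are made in equal amounts and mix at random into tetramers, and that a single mutant subunit inactivates the whole tetramer. The function names are hypothetical.

```python
from math import comb


def functional_fraction(wild_type_fraction: float, subunits: int = 4) -> float:
    """Fraction of multimers made only of wild-type subunits."""
    # One mutant subunit is enough to block repressor function,
    # so every position must be filled by a wild-type subunit.
    return wild_type_fraction ** subunits


def mutant_subunit_distribution(wild_type_fraction: float, subunits: int = 4):
    """Binomial probability of finding k mutant subunits in one multimer."""
    q = 1.0 - wild_type_fraction
    return {k: comb(subunits, k) * q ** k * wild_type_fraction ** (subunits - k)
            for k in range(subunits + 1)}


print(functional_fraction(0.5))          # 0.0625, i.e. 1 in 16 tetramers
print(mutant_subunit_distribution(0.5))
```

Under these assumptions only 1 in 16 tetramers is fully wild type in a lacI+/lacI-d cell, which is why the mutant allele dominates.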
The lac operator consists of a palindromic sequence of 26 base pairs (a sequence
that reads the same on each strand when the strand is read in the 5’ to 3’
direction) made up of adjacent inverted repeats. Each inverted repeat of the
operator binds to the DNA-binding site of one repressor subunit.
The inducer binding causes a change in repressor
conformation that reduces its affinity for DNA and
releases it from the operator.
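The definition of a palindromic sequence (it reads the same 5’ to 3’ on both strands) is equivalent to saying the sequence equals its own reverse complement. A minimal check is sketched below; the test sequences are short illustrative examples, not the actual lac operator sequence.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq: str) -> bool:
    """True if both strands read the same in the 5' to 3' direction."""
    return seq.upper() == reverse_complement(seq)


for example in ("GAATTC", "GATC", "GATTC"):
    print(example, is_palindromic(example))
```

Natural operators are usually only approximately symmetrical, so a practical version would count mismatches between the two half sites instead of demanding an exact match.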
Two symmetrical half sites in a regulatory sequence indicate that it is bound by a dimeric
regulatory protein.
To determine the bases that contact the repressor (contact sites) and the sites of
constitutive mutations, chemical crosslinking or base-modification experiments can be
performed to see whether modification prevents binding.
Constitutive mutations occur at 8 positions in the operator between +5 and +17.
In order to examine the lac+ phenotype, the E. coli can be grown on a plate with nutrient
agar containing IPTG and X-gal while the control agar plate should only contain nutrients
and X-gal.
As β-galactosidase is produced in the lac+ phenotype, X-gal would be cleaved to
produce a blue reaction product.
A full repression of the lac operon would require the lac repressor to bind to O1 (highest
affinity) and either O2 or O3 operators.
The second layer of control lies in catabolite repression (ability of glucose to prevent the
expression of a number of genes) where cAMP and CRP bind to a target sequence at a
promoter.
The second messenger cAMP converts CRP to a form that binds the promoter and
assists RNA polymerase in initiating transcription.
When the glucose level is low, cAMP is produced which activates a dimer of CRP. The
CRP interacts with the C-terminal domain (CTD) of the α subunit of RNA polymerase
to activate it.
The lac operon is under both positive and negative control.
1. In the presence of both glucose and lactose, β-galactosidase is not needed and thus the
lac operon is off. The presence of glucose causes low levels of cAMP and thus CRP doesn’t
bind.
2. If glucose is the sole carbon source, β-galactosidase is not needed and thus the lac
operon is off. Repressors bind to the operator and CRP (CAP) fails to bind.
3. When both glucose and lactose are absent, β-galactosidase is not needed. The operon is
off as the lac repressor bound to the operator prevents CRP from turning the lac operon
on.
4. If lactose is the sole carbon source, β-galactosidase is needed. CRP binds and turns
the lac operon on, producing β-galactosidase to break down lactose into glucose and
galactose.
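The four cases can be condensed into a two-input truth table. The sketch below is a deliberately simplified on/off model (real expression is graded, with a low basal level); the function name is hypothetical.

```python
def lac_operon_on(glucose: bool, lactose: bool) -> bool:
    """Toy model of dual control: True if the lac operon is switched on."""
    # Negative control: without lactose (inducer) the repressor stays on the operator.
    repressor_bound = not lactose
    # Positive control: glucose keeps cAMP low, so CRP cannot bind the promoter.
    crp_bound = not glucose
    return crp_bound and not repressor_bound


print("glucose lactose operon")
for glucose in (True, False):
    for lactose in (True, False):
        state = "on" if lac_operon_on(glucose, lactose) else "off"
        print(f"{glucose!s:7} {lactose!s:7} {state}")
```

Only the combination of no glucose and available lactose satisfies both conditions, which matches case 4.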
In summary:
Regulatory proteins
  lacI         Gene for lac repressor (lacR) which represses the lac operon
  CAP/CRP      Gene for catabolite activator protein which cAMP binds to and activates the lac operon
Regulatory DNA sequences
  lacC         CAP/CRP binding site
  lacP         Lac promoter which is CAP/CRP dependent
  lacO         Lac operator which is lacR binding site
The genes of the lac operon
  lacZ         Gene for β-galactosidase which stains colonies blue
  lacY         Gene for lac permease which transport lactose
  lacA         Gene for lac transacetylase
Mutations in the lac operon
  Genotype     -IPTG  +IPTG       Affected function
  lacI-        +      +           lacR mutant, inactive, recessive – defective repressor
  lacI-d       +      +           lacR mutant, cannot bind DNA, dominant-negative – defective repressor
  lacIS        -      -           lacR mutant, cannot bind to inducer, super repressor, uninducible
  CAP-         -      -           CAP mutant, inactive – cAMP cannot bind
  CAPC         +      +           CAP mutant, constitutive, cAMP-independent – CAP binds
  lacC-        -      -/+ (weak)  CAP binding site mutant, cannot be bound by cAMP-CAP complex. Results in defective CAP binding site or reduced CAP binding
  lacP-        -      -           Lac promoter mutant, inactive – defective promoter
  lacOC        +      +           Lac operator mutant, cannot be bound by lacR, constitutive expression – defective operator
  lacZ-        -      -           β-galactosidase mutant, inactive
  Wild Type    -      +           None
  lacO- lacIS  +      +           Defective operator + super repressor
Apart from measuring the transcription level directly via quantitative PCR (qPCR), a
reporter assay or the activity of β-galactosidase can be used to measure the activity of the
promoter.
1. In a reporter assay, a reporter gene encoding an easy-to-measure protein such as
GFP or luciferase is placed downstream of the gene’s promoter so that the amount of
reporter protein can be easily measured.
Lecture 9 – Phage Lambda
A virus consists of a nucleic acid genome contained in a protein coat. In order to reproduce,
the virus must infect a host cell. The typical pattern of an infection is to subvert the
functions of the host cell for the purpose of producing a large number of progeny viruses.
1. The lytic cycle is the infection of a bacterium by a phage that ends in the destruction
of the bacterium with the release of progeny phage.
2. A prophage is a phage genome covalently integrated as a linear part of the bacterial
chromosome.
3. The ability of a phage to survive in a bacterium as a stable prophage component of
the bacterial genome is known as lysogeny.
Virulent phages undergo the lytic cycle only, but temperate phages can choose between a
lytic and a lysogenic pathway of development.
Induction is the process by which a prophage is freed from the restrictions of lysogeny,
resulting in the destruction of the lysogenic repressor and the excision of free
phage DNA from the bacterial chromosome.
Immunity is the ability of a prophage to prevent another phage of the same type from
infecting the same cell.
Lytic development is accomplished by a pathway in which the phage genes are expressed in
a particular order and this ensures that the right amount of each component is present at
the appropriate time. There are two parts to the cycle:
1. Early infection describes the period from entry of the DNA to the start of its
replication.
Early phase is devoted to the production of enzymes involved in the
reproduction of DNA.
2. Late infection defines the period from the start of replication to the final step of
lysing the bacterial cell to release progeny phage particles.
Protein components of the phage particle are synthesized such as the head,
tail and assembly proteins.
DNA replication reaches its maximum rate and it gets packaged into the
heads.
Lytic development is controlled by a regulatory cascade (sequence of events, each of which
is stimulated by the previous one)
Lytic cycle is under positive control so that each group of phage genes can be
expressed only when an appropriate signal is given.
The early part of the first stage of gene expression necessarily relies on the
transcription apparatus of the host cell and only a few genes are expressed at this time.
In phage lambda they are called immediate early genes.
One of these genes always codes for a gene regulator, a protein that is necessary
for transcription of the next class of genes.
The next class of genes is known as the delayed early or middle gene group. Its expression
typically starts as soon as the regulator protein coded by the early gene is available.
If control is at transcription initiation, then the two events are independent and
early genes can be switched off when middle genes are transcribed.
If control is at transcription termination, the early genes must continue to be
expressed.
Often the expression of host genes is reduced. The two sets of early genes
account for all necessary phage functions except those needed to assemble the
particle coat itself and those needed to lyse the cell.
When the replication of phage DNA begins, the late genes are expressed. This is arranged
by embedding an additional regulator gene within the previous set of genes. This regulator
may be another antitermination factor or another sigma factor.
The means used to construct each phage
cascade are different, but the results are
similar.
Two mechanisms for recognizing new phage
promoters:
1. Replace sigma factor of host enzyme
with another factor that redirects its
specificity in initiation or synthesize a
new phage RNA polymerase
The critical feature that
distinguishes the new set of
genes is their possession of
different promoters from
those originally recognized by
host RNA polymerase.
2. Antitermination provides an
alternative mechanism for phages to
control the switch from early genes to
the next stage of expression.
The same promoters continue to be recognized by RNA polymerase but the
new genes are expressed only by extending the RNA chain to form molecules
that contain the early gene sequences at the 5’ end and the new gene
sequences at the 3’ end.
From the genetic point of view, the mechanisms of new initiation and antitermination are
similar where both are positive controls in which an early gene product must be made by
the phage in order to express the next set of genes.
By employing either sigma factor or antitermination proteins with different
specifications, a cascade for gene expression can be constructed.
Genes concerned with related functions are often clustered. In phage T7, the genome
consists of three classes of genes that are expressed sequentially:
Class I: RNA polymerase + enzymes that interfere with host gene expression.
Class II: enzymes for DNA synthesis and lysozyme
Class III: Head and tail proteins.
When lambda DNA enters a host, the lytic and lysogenic pathways start off the same, as
expression of the immediate early and delayed early genes is required for both.
Lytic development follows if the late genes are expressed.
Lysogeny ensues if synthesis of a gene regulator called the lambda repressor is
established by turning on its gene, the cI gene.
Lambda has two immediate early genes, N and cro, which are transcribed by host RNA
polymerase.
The N gene codes for an antitermination factor whose action at nut (N utilization) sites
allows transcription to proceed into the delayed early genes.
The cro gene codes for a repressor that prevents expression of the cI gene (which
codes for the lambda repressor), keeping the late genes derepressed, and that also turns
off expression of the immediate early genes.
Three of the delayed early gene products are
regulators (cII, cIII and Q).
The cII-cIII pair of regulator genes is needed
to establish synthesis of lambda
repressor for lysogenic pathway.
The Q regulator gene codes for an
antitermination factor that allows host
RNA polymerase to transcribe the late
genes and is necessary for the lytic cycle.
The lytic cycle depends on antitermination by
pN which allows RNA polymerase to continue
transcription past the ends of the two immediate
early genes.
N is transcribed toward the left using PL
while cro is transcribed toward the right
using PR.
The synthesis of the N protein
(antiterminator pN) allows RNA
polymerase to pass the terminators tL1 to
the left and tR1 to the right into 7
recombination (left) and 2 replication genes
(right).
pQ is the product of a delayed early gene and is an antiterminator that allows RNA
polymerase initiating at PR’ (the late promoter) to transcribe the late genes.
Lambda DNA circularizes after infection and as a result the late genes form a single
transcription unit.
Lysogeny is maintained by the lambda repressor protein encoded by the cI gene.
The cI gene has two promoters, PRM (promoter for repressor maintenance) and PRE
(promoter for repressor establishment). Mutants in this gene cannot maintain lysogeny
and always enter the lytic cycle.
The lambda repressor acts at the OL and OR operators to
block transcription of the immediate early genes (N
and cro).
At OL the lambda repressor prevents RNA
polymerase from initiating transcription at PL.
This stops the expression of gene N which
prevents the expression of pN and thus the lytic cycle is blocked.
The lambda repressor binding at OR also stimulates transcription of cI, its own gene,
from PRM.
As long as the level of lambda repressor is adequate, there is continued expression of the cI
gene and this results in OL and OR being occupied indefinitely, so lysogeny is stable and
the lytic cascade is repressed.
Immunity in phages refers to the ability of a prophage to prevent another phage of the
same type from infecting a cell.
When a second lambda phage DNA enters a lysogenic cell, repressor protein
synthesized from the resident prophage genome will immediately bind to OL and OR
in the new genome, preventing the second phage from entering the lytic cycle
A lysogenic phage confers immunity to further infection by any other phage with the
same immunity region.
In the absence of repressor, RNA polymerase can bind to PL and PR which starts the
lytic cycle. It cannot initiate at PRM in the absence of the repressor.
Virulent mutations prevent the repressor from binding at OL and OR and thus
lysogeny is unable to be established.
Counting phages via Serial dilution method:
Starting with an unknown concentration, perform a serial dilution and spread each
concentration on a plate.
As phage grow and lyse the host, plaques are formed in the bacterial lawn. By
counting the number of plaques on the lawns, the original concentration can be
determined.
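The back-calculation from plaque count to stock concentration is a single division. The sketch below uses hypothetical numbers (42 plaques, six tenfold dilution steps, 0.1 mL plated); the function names are illustrative.

```python
def total_dilution(steps: int, fold: float = 10.0) -> float:
    """Overall dilution after a number of identical serial steps (e.g. 1e-6)."""
    return fold ** -steps


def phage_titer(plaque_count: int, dilution: float, volume_plated_ml: float) -> float:
    """Plaque-forming units (PFU) per mL of the undiluted stock."""
    # Each plaque is assumed to arise from one infectious phage particle.
    return plaque_count / (dilution * volume_plated_ml)


dilution = total_dilution(steps=6)             # six tenfold dilutions
titer = phage_titer(42, dilution, volume_plated_ml=0.1)
print(f"{titer:.2e} PFU/mL")
```

In practice only plates with a countable number of well-separated plaques are used, since crowded or nearly empty plates give unreliable estimates.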
With wild-type phages, plaques are turbid or cloudy as they contain cells that have
established lysogeny instead of being lysed.
Virulent mutants are unable to establish lysogeny and thus the plaques only contain
lysed cells, giving clear plaques.
The lambda repressor subunit is a polypeptide of 27kD with two distinct domains
connected by a connector of 40 residues:
N-terminal domain from residues 1-92 provides the operator binding site but with a
lower affinity than the intact lambda repressor
C-terminal domain from residues 132-236 is responsible for dimerization and can
form oligomers
Binding to the operator requires the dimeric form so that two DNA-binding domains can
contact the operator simultaneously.
Induction of a lysogenic prophage into the lytic cycle is caused by cleavage of
repressor subunit in the connector region which reduces the affinity for the operator.
Induction can be caused by UV irradiation which leads to degradation of repressor.
Balance between lysogeny and lytic cycle depends on concentration of repressor
where intact repressor is present in a lysogenic cell at a concentration sufficient to
ensure that operators are occupied.
In lysogeny, monomers are in equilibrium with dimers which bind to DNA.
Induction causes cleavage of monomers and disturbs the equilibrium and thus
dimers will dissociate.
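The effect of lowering the monomer pool on the amount of dimer can be worked out from the equilibrium 2M ⇌ D. The sketch below solves it exactly; the dissociation constant and concentrations are arbitrary illustrative values, not measured ones for the lambda repressor.

```python
import math


def monomer_dimer_equilibrium(total_monomer: float, kd: float):
    """Solve 2M <-> D, with Kd = [M]^2 / [D] and total = [M] + 2[D].

    Returns (free monomer, dimer) in the same units as the inputs.
    """
    # Substituting [D] = [M]^2 / Kd gives 2[M]^2/Kd + [M] - total = 0.
    free_monomer = kd * (math.sqrt(1.0 + 8.0 * total_monomer / kd) - 1.0) / 4.0
    dimer = (total_monomer - free_monomer) / 2.0
    return free_monomer, dimer


for total in (3.0, 0.3):   # before and after losing 90% of intact subunits
    m, d = monomer_dimer_equilibrium(total, kd=1.0)
    print(f"total={total}: monomer={m:.3f}, dimer={d:.3f}, "
          f"fraction in dimers={2 * d / total:.2f}")
```

Because dimer formation depends on the square of the monomer concentration, a drop in intact monomers reduces the dimer pool more than proportionally, consistent with operators being vacated on induction.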
The lambda operator is a 17bp palindromic sequence with an axis of symmetry through the
central base pair. The sequence on each side of the central base pair is a half site. Each
individual N-terminal region contacts a half site.
The amino acid sequence of the recognition helix in the
helix-turn-helix motif makes contact with particular bases in
the operator sequence that it recognises.
Contacts between helix-2 and
helix-3 are maintained by
interactions between hydrophobic
amino acids
Helix-3 of each monomer lies in
the wide groove on the same face
of the DNA and helix-2 lies across
the groove.
Each operator contains 3 repressor-binding sites and overlaps with the promoter at which
RNA polymerase binds.
Binding sites within each operator are separated by spacers of 3 to 7bp that are AT
rich.
(Figure note: the orientation of OL has been reversed from usual to facilitate comparison
with OR.)
Site 1 lies closest to the start point for transcription in the promoter and sites 2 and 3
lie farther upstream.
At each operator, site 1 has a greater affinity (tenfold) than the other sites and thus the
repressor always binds first to OL1 and OR1.
Repressor binding to one operator increases the affinity for binding a second
repressor dimer to the adjacent operator.
However, when both sites 1 and 2 are occupied, this interaction does not extend
further to site 3. In lysogeny, both sites 1 and 2 are filled but not site 3.
When two lambda repressor dimers bind cooperatively, each of the subunits of one dimer
contacts a subunit in the other dimer through the C-terminal domain, forming a tetrameric
structure.
Cooperative binding allows the repressor to bind the OL2 and OR2 sites at lower
concentrations and this is important in a system in which release of repression has
irreversible consequences.
In an operon coding for metabolic enzymes, failure to repress will merely
allow unnecessary synthesis of enzymes, but failure to repress lambda
prophage will lead to induction of phage and lysis of the cell.
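How cooperativity helps fill the weaker site 2 at low repressor concentration can be sketched with a two-site binding model. The tenfold affinity difference comes from the text; the dissociation constants, the concentration and the cooperativity factor of 100 are assumptions made only for illustration.

```python
def site_occupancies(conc: float, kd1: float, kd2: float,
                     cooperativity: float = 1.0):
    """Fractional occupancy of two adjacent sites by repressor dimers.

    Statistical weights: empty = 1, site 1 only = c/Kd1, site 2 only = c/Kd2,
    both = cooperativity * (c/Kd1) * (c/Kd2).
    """
    w1 = conc / kd1
    w2 = conc / kd2
    w12 = cooperativity * w1 * w2
    z = 1.0 + w1 + w2 + w12                    # sum over all four states
    return (w1 + w12) / z, (w2 + w12) / z


conc = 1.0                                     # arbitrary units
independent = site_occupancies(conc, kd1=1.0, kd2=10.0)
cooperative = site_occupancies(conc, kd1=1.0, kd2=10.0, cooperativity=100.0)
print(f"site 2 occupancy, independent: {independent[1]:.2f}")
print(f"site 2 occupancy, cooperative: {cooperative[1]:.2f}")
```

With a cooperativity factor of 1 the sites behave independently, so site 2 occupancy reduces to c / (c + Kd2). Raising the factor fills site 2 at the same concentration.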
When two dimers are bound at OR1 and OR2, the DNA-binding region/N-terminus of the
dimer (helix 2) at OR2 contacts RNA polymerase and stabilizes its binding to PRM and
activates it.
Repressor binding at OL blocks transcription of gene N from PL, while repressor
binding at OR blocks transcription of cro but is also required for transcription of cI.
Low levels of repressor can positively regulate its own synthesis as long as enough
repressor is available to fill OR2.
Repressor dimers bound at OL1 and OL2 can interact with dimers bound at OR1 and OR2 to
form octamers.
This interaction stabilizes repressor binding and thus makes it possible for
repressor to occupy operators at lower concentrations.
The DNA between OL and OR sites (gene cI) forms a large loop which is held together
by the repressor octamer.
At lower concentrations, lambda repressor forms an octamer and activates PRM in a
positive autogenous regulation. An increase in concentration allows binding to OR3 and
OL3, which turns off transcription in a negative autogenous regulation.
When lambda DNA enters a new host cell, RNA polymerase cannot transcribe cI because
there is no repressor present to aid its binding at PRM. The absence of repressor leads to
the availability of PR and PL.
Thus the first event after lambda DNA infects a bacterium is when genes N and cro
are transcribed and then pN allows transcription to be extended further.
The delayed early gene products cII and cIII are necessary for RNA polymerase to
initiate transcription at the promoter PRE.
The product of cII acts directly at the promoter while the product of cIII protects cII
from degradation.
Transcription from PRE leads to synthesis of repressor and blocks cro synthesis
(promotes lysogeny)
Direct effect is that cI mRNA is translated into repressor protein
Indirect effect is that transcription proceeds through the cro gene in the “wrong”
direction where 5’ part of the RNA corresponds to an antisense transcript of cro
and hybridizes to authentic cro mRNA which inhibits its translation.
The PRE has atypical sequences at -10 and -35 and RNA polymerase binds the PRE promoter
only in the presence of cII
The PRE promoter has a poor fit with the consensus at -10 and lacks a consensus
sequence at -35 and thus is dependent on positive regulator cII.
Lysogeny requires several events:
Presence of cII and cIII causes repressor synthesis to be established and triggers
inhibition of late gene transcription.
cIII protects cII which allows PRE to be used for transcription extending
through cI.
This causes the lambda repressor protein to be synthesized in high amounts
and it immediately binds to OL and OR
Establishment of repressor turns off immediate and delayed early gene expression
Transcription from PL and PR is inhibited and repressor binding turns off the
expression of all phage genes.
Synthesis of cII and cIII halts and the existing proteins decay, so PRE can no
longer be used and repressor synthesis from PRE stops.
Repressor turns on the maintenance circuit for its own synthesis via expression
from PRM by making contact with the RNA polymerase sigma factor.
Repressor continues to be synthesized until at high levels, occupancy of OR3
causes the synthesis to be turned off.
Lambda DNA is integrated into the bacterial genome at the final stage in establishing
lysogeny.
The lytic cascade requires cro protein which directly prevents repressor maintenance via
PRM as well as turning off delayed early gene expression.
Cro is responsible for preventing the synthesis of the lambda repressor protein cI.
Cro binds to the same operators as lambda repressor but with different affinities
where the affinity of cro for OR3 is greater than its affinity for OR2 or OR1
When cro binds to OR3, it prevents RNA polymerase from binding to PRM and this
prevents the maintenance circuit for lysogeny from coming into play.
When cro binds to other operators at OR/OL, it prevents RNA polymerase from
expressing immediate early genes (including cro itself) and any use of PRE is
prevented, indirectly blocking repressor establishment.
The delayed early stage when both cro and repressor are being expressed is common to
both the lysogeny and lytic cycle.
The critical event is whether cII causes sufficient synthesis of repressor to overcome
the action of cro. If cII causes sufficient synthesis of repressor, lysogeny will result
because repressor occupies the operators. Otherwise cro occupies the operators,
resulting in lytic cycle.
In the early stages of the infection, cro is given a head start over the lambda
repressor and so it would seem that the lytic pathway is favoured. However, stability
of the cII protein in the infected cell is a primary determinant of the outcome.
Lecture 10 – DNA Replication and Transfer
An origin usually initiates bidirectional replication where a replicated region appears as a
bubble within non-replicated DNA.
Replicon: unit of genome in which DNA is replicated. Each contains an origin for
initiation of replication
Origin: sequence of DNA at which replication is initiated.
At the origin, two replication forks are created that move in opposite directions.
They usually meet halfway around the circle but there are ter sites that cause
termination if the replication forks go too far.
In E. coli, the origin of replication, oriC, is 245bp in length.
It contains 11 palindromic GATC repeats that are methylated on adenine on both
strands by Dam methylase.
Replication generates hemi-methylated DNA (only one strand is methylated) which
cannot initiate replication. Only fully methylated origins can initiate replication.
There is a 13-minute delay before the repeats in the origin are re-methylated
(other sites <1.5 minutes).
In delaying re-replication, SeqA binds to hemi-methylated DNA and prevents the origin
from being remethylated.
Initiation at oriC requires the sequential assembly of a large protein complex on the
membrane that requires six proteins:
DnaA: ATP-binding protein and licensing factor (factor necessary for replication;
inactivated/destroyed after one round of replication)
DnaB: ATP-hydrolysis dependent 5’ to 3’ helicase which provides the “engine” of
initiation after the origin has been opened.
DnaC: chaperone to repress the helicase activity of DnaB until it is needed.
HU: general DNA-binding protein which stimulates replication. Has the capacity to
bend DNA and is involved in building the structure that leads to formation of open
complex.
Gyrase: type II topoisomerase which binds to the double helix ahead of the replication
fork and relieves the strain placed on the double helix as it unravels.
SSB (Single-strand binding protein): stabilizes the single-stranded DNA as it is
formed and modulates the helicase activity. About 60 per fork.
For initiation to occur, the following conditions must be met:
The oriC must be fully methylated
Protein synthesis is required to synthesize the origin recognition protein
Membrane/cell wall synthesis
Sequence of initiation:
DnaA-ATP binds to short repeated sequences in the fully methylated origin (the 9bp
repeats) and forms an oligomeric complex that melts DNA at the A-T rich 13bp repeats.
Six DnaC monomers bind each hexamer of DnaB and this pre-priming complex binds the
origin.
DnaG (primase) binds to the helicase complex, which releases DnaC, allowing the DnaB
helicase to become active and create the replication fork.
A primase synthesizes an RNA chain that provides the priming end for DNA replication.
Priming is required to start DNA synthesis as no DNA polymerase can initiate synthesis
of a chain of DNA; they can only elongate a chain. Synthesis of the new strand can only
start from a pre-existing 3’–OH end provided by a primer.
DNA polymerase adds nucleotides to the 3’–OH end of the growing chain such that the
new chain grows in the 5’→3’ direction.
DNA polymerases control the fidelity of replication as they often have a 3’→5’
exonuclease activity that is used to excise incorrectly paired bases.
Proofreading – a mechanism for correcting errors in DNA synthesis that
involves scrutiny of individual units after they have been added to the chain
Processivity – the tendency to remain on a single template rather than to
dissociate and re-associate.
Note: DNA polymerase I also has a 5’→3’ exonuclease activity, which removes
nucleotides ahead of the enzyme (e.g. RNA primers). In proofreading (3’→5’
exonuclease), the base is hydrolyzed and expelled if incorrect.
Fidelity of replication is improved by proofreading by a factor of ~100 to ~1000.
Semi-discontinuous replication: the mode of replication in which one new strand is
synthesized continuously while the other is synthesized discontinuously.
For the leading strand (5’→3’), DNA polymerase advances continuously, but for the
lagging strand it makes short fragments (Okazaki fragments, 1000 to 2000 bases)
that are subsequently joined together.
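A rough count of Okazaki fragments per round of replication follows from the fragment length. The sketch assumes an E. coli chromosome of about 4.6 million bp (an approximate figure not given in these notes) and that lagging-strand synthesis across both forks covers one genome length in total.

```python
import math

E_COLI_GENOME_BP = 4_600_000   # approximate size, assumption for illustration


def okazaki_fragment_range(lagging_template_bp: int,
                           min_len: int = 1000, max_len: int = 2000):
    """Return (fewest, most) fragments needed to cover the lagging template."""
    fewest = math.ceil(lagging_template_bp / max_len)   # all fragments long
    most = math.ceil(lagging_template_bp / min_len)     # all fragments short
    return fewest, most


fewest, most = okazaki_fragment_range(E_COLI_GENOME_BP)
print(f"about {fewest} to {most} Okazaki fragments per round of replication")
```

Each of these fragments needs its own RNA primer, which is why primer removal and ligation are such frequent events on the lagging strand.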
All DNA polymerases require a 3’–OH priming end to initiate DNA synthesis
The priming end can be provided by an RNA primer, nick in DNA or a priming
protein.
The E. coli replicase DNA polymerase III Holoenzyme is a 900kD complex with a dimeric
structure where each monomeric unit consists of:
A catalytic core which contains three subunits: a catalytic subunit (α), a
proofreading subunit (ε) and a θ subunit which stimulates the exonuclease.
One catalytic core is associated with each template strand.
A dimerization clamp-loader complex which consists of:
Two copies of the dimerizing subunit (τ) which links the two catalytic cores
together.
A clamp loader which is a five-subunit
protein complex that is responsible for
loading the clamp onto DNA at the
replication fork by placing the processivity
subunits on DNA where they form a
circular β clamp around DNA.
A processivity clamp which is responsible for
holding catalytic cores onto their template strands.
Each clamp consists of a homodimer of β-subunits, the β2 ring, which binds around
the DNA and ensures processivity.
The core on the leading strand is processive because its
clamp keeps it on the DNA.
The clamp associated with the core on the lagging
strand dissociates at the end of each Okazaki
fragment and reassembles for the next fragment
The helicase DnaB is responsible for interacting
with the primase DnaG to initiate each Okazaki fragment.
Each Okazaki fragment starts with a primer and stops before the next fragment, where DNA
polymerase I (with 5’→3’ exonuclease) removes the RNA primer and replaces it with DNA.
DNA ligase makes the bond that connects the 3’ end of one Okazaki fragment to the 5’
beginning of the next fragment.
In eukaryotic replication, separate DNA polymerases undertake initiation and elongation
where a replication fork has one complex of DNA polymerase α/primase and two complexes
of DNA polymerase δ and/or ε.
DNA polymerase α has the ability to initiate a new strand and is used to initiate both
the leading and lagging strand.
DNA polymerase ε elongates the leading strand and a second DNA polymerase, δ,
elongates the lagging strand.
Conserved function of the replication
components extends to the clamp loader and
processivity clamp as well as other functions
of the replisome.
A replication fork stalls when it arrives at damaged
DNA. To avoid death, bacteria can undergo lesion
bypass or homologous recombination.
Lesion bypass: replication by an error-prone
DNA polymerase on a template that contains
a damaged base. E. coli DNA polymerase IV
and V can incorporate a non-complementary
base into the daughter strand. Requires
temporary replacement of DNA Pol III by the error-prone polymerase.
After the damage has been repaired, the replication fork must be restarted and this
may be accomplished by assembly of the primosome which reloads DnaB so that
helicase action can continue.
Semiconservative replication: replication accomplished by separation of the strands of a
parental duplex with each strand then acting as a template for synthesis of a complementary
strand. Double stranded DNA contains one parental and one daughter strand following
replication.
Generation | Conservative model | Semi-conservative model | Dispersive model
Gen 0 | 15-15 | 15-15 | 15-15
Gen 1 | Two bands, 50:50 (15-15, 14-14) | One band (15-14, 15-14) | One band; each strand is 50% heavy and 50% light
Gen 2 | Two bands, 25:75 (15-15, 14-14, 14-14, 14-14) | Two bands, 50:50 (15-14, 14-14, 15-14, 14-14) | One band; each strand is 25% heavy and 75% light
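The semi-conservative column can be reproduced with a short simulation. This is only a sketch: the function name and the "15"/"14" strand labels are illustrative choices, and it assumes fully 15N-labelled DNA at generation 0 followed by growth in 14N medium.

```python
from collections import Counter

def semiconservative(generations):
    """Return the fraction of duplexes in each density band.

    Assumes generation 0 is fully heavy (15-15) and that every
    newly synthesized strand is light (14N).
    """
    duplexes = [("15", "15")]  # each duplex is a pair of strands
    for _ in range(generations):
        next_gen = []
        for strand_a, strand_b in duplexes:
            # Each parental strand templates one new light strand.
            next_gen.append((strand_a, "14"))
            next_gen.append((strand_b, "14"))
        duplexes = next_gen
    counts = Counter("-".join(sorted(d, reverse=True)) for d in duplexes)
    total = len(duplexes)
    return {band: n / total for band, n in counts.items()}

for gen in range(3):
    print(gen, semiconservative(gen))
```

Generation 1 gives a single hybrid band and generation 2 gives hybrid and light bands in equal amounts, matching the semi-conservative predictions.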
Gene transfer in prokaryotes can happen via:
Transformation (naked DNA) – either via CaCl2 + heat shock or electroporation
Conjugation (bacteria-mediated): process in which two bacteria come in contact
and transfer genetic material. The process is mediated by the F plasmid
(Fertility factor).
A free F plasmid is a replicon that is maintained at the level of 1
plasmid per bacterial chromosome and can be integrated into the bacterial
chromosome. The F factor is transferred frequently.
The F plasmid carries tra genes, which encode transfer functions (pilus
synthesis and assembly, cell pairing etc.) and are all located in one operon.
F+ cell and F- cell: results in 2 F+ cells, no change in genetic composition.
When F integrates into a bacterial chromosome, it gives rise to different Hfr
strains through site-specific recombination.
When an Hfr strain mates with an F- cell, the recipient almost never acquires an F+
phenotype, as only the first part of F is transferred:
Mating channel is fragile and easily broken by changes in the environment
Time needed for complete transfer (about 100 min) exceeds the bacterial lifespan
Recipient bacteria may lack space for additional DNA.
Chromosome map can be determined via interrupted mating technique
which is used to map the order of bacterial genes based on their order of
transfer into recipient cell.
Genes nearest to oriT have the highest
frequency of being transferred
Genes transferred early are more
frequently represented in
recombinants
Complete E. coli genetic map is about
90mins (4600kb) and zero point is the
marker thr.
If oriT is pointing to the left, then the gene on its right will be the first
to enter the recipient cell and F factor is at the end of the genome.
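The logic of interrupted mating can be sketched in a few lines: genes are ordered by the time at which they first appear in recombinants. The entry times below are hypothetical values chosen for illustration, not data from the lecture, and `gene_order` is an illustrative helper name.

```python
def gene_order(entry_times):
    """Order genes by time of first entry into the recipient (minutes)."""
    return [gene for gene, _ in sorted(entry_times.items(), key=lambda kv: kv[1])]

def map_distances(entry_times):
    """Distance in minutes between consecutive genes in transfer order."""
    order = gene_order(entry_times)
    return [
        (a, b, entry_times[b] - entry_times[a])
        for a, b in zip(order, order[1:])
    ]

# Hypothetical entry times (minutes after mating starts)
example = {"lac": 18, "thr": 8, "gal": 25, "azi": 10}
print(gene_order(example))
print(map_distances(example))
```

The gene with the smallest entry time lies closest to oriT in the direction of transfer, and differences in entry time serve as map distances in minutes.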
An F’ is formed by improper excision of F from the bacterial chromosome.
It can carry as much as 15% of the E. coli genome and thus provides partial
diploidy when transferred into a recipient strain.
This homologous region can recombine with host chromosome.
Transduction (phage-mediated): Bacteriophage-mediated transfer of host DNA
from one bacterium to another and occurs as the result of reproductive cycle of
bacteriophage.
Lytic cycle: viral reproductive cycle that ends in lysis of the bacterium;
characteristic of virulent phages.
Lysogeny: maintenance of the viral genome (prophage) within the host cell
(integrated into the bacterial chromosome); characteristic of temperate phages.
Two types of transduction:
Generalized transduction: any part of bacterial genome can be
transferred and occurs during lytic cycle.
Randomly sized fragments are packed into phage and
homologous recombination may occur in recipient bacteria.
Specialized (restricted) transduction: transfer of only specific
portions of the bacterial genome; carried out only by temperate
phages that have integrated their DNA into the host chromosome at
a specific site in the chromosome.
Phage particles carry both phage DNA and flanking bacterial
DNA, but only bacterial DNA adjacent to the prophage
insertion site is packaged.
Occurs only when lysogeny is induced to go into lytic phase.
The integration and excision of phage involves site-specific
recombination between attP and attB.
Specialized transducing phage that carry genes located on the
left side of the prophage (λdgal) are proficient for lysogeny
but deficient for lysis and thus they require a helper phage to
lyse a recipient cell. The “d” indicates that the phage is
defective for lytic growth.
Specialized transducing phage that carry genes located on the
right side of the prophage (λbio) are proficient for lysis but
deficient for lysogeny. These phages can infect a recipient cell
and generate a lysate but require a helper phage to form
lysogens in a recipient cell. They have all functions required
for lytic growth.
Selecting for recombinants or transconjugants:
Prototrophs: wild type strain that has minimal requirement for nutrient
supplements
Auxotrophs: mutant strain that has lost its ability to synthesize a nutrient such as
amino acids or lactose
E.g. Bacteria initially growing in complete media, then subsequently grown in a)
minimal media, b) minimal media + Histidine and c) minimal media + Arginine
Those that grow in a) are wild-type prototrophic colonies; those that grow in b) but
not a) are His- auxotrophs and those that grow in c) but not a) are Arg- auxotrophs.
Recombinants are detected using selective and counter-selective techniques:
Counter-selection against parental strains using antibiotics such as
streptomycin/kanamycin
Selection of recombinants using antibiotics or ability to utilize a sugar
(lactose)
Best way to select recombinants: minimal medium + lactose + streptomycin
for a conjugation of donor strain with StrSLac+ and recipient strain StrRLac-
Lac- cannot utilize lactose and thus it cannot grow in the minimal media with
lactose as sole carbon source
StrS strain is sensitive to streptomycin antibiotics and cannot grow in
minimal media with streptomycin
Medium | Donor Strain (StrSLac+) | Recipient Strain (StrRLac-) | Recombinants (StrRLac+)
MM + lactose | Grow | X | Grow
MM + lactose + strep | X | X | Grow
MM + strep | X | X | X
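The growth table follows from two rules: a strain needs a usable carbon source and must tolerate any antibiotic present. A minimal sketch of that logic follows, assuming lactose is the only carbon source that may be added to the minimal medium (MM); the dictionary keys and the `grows` helper are illustrative names.

```python
def grows(strain, medium):
    """Return True if the strain is expected to form colonies on the medium.

    Assumption: lactose is the only carbon source that may be present.
    """
    can_use_carbon = medium["lactose"] and strain["lac_plus"]
    tolerates_drug = strain["str_resistant"] or not medium["streptomycin"]
    return bool(can_use_carbon and tolerates_drug)

STRAINS = {
    "donor": {"str_resistant": False, "lac_plus": True},
    "recipient": {"str_resistant": True, "lac_plus": False},
    "recombinant": {"str_resistant": True, "lac_plus": True},
}
MEDIA = {
    "MM + lactose": {"lactose": True, "streptomycin": False},
    "MM + lactose + strep": {"lactose": True, "streptomycin": True},
    "MM + strep": {"lactose": False, "streptomycin": True},
}

for medium_name, medium in MEDIA.items():
    row = {name: grows(strain, medium) for name, strain in STRAINS.items()}
    print(medium_name, row)
```

Only MM + lactose + streptomycin supports the recombinants while excluding both parental strains, which is why it is the selective medium of choice here.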
Lecture 11 –DNA Recombination
Two types of genetic recombination in bacteria:
General recombination requires long (>50bp) sequence homology and is
RecA-dependent
Site-specific recombination requires very short (<5bp) sequence homology and
has special site recognition. It is RecA-independent but requires specialized proteins.
General Recombination:
Genetic exchange takes place between 2 pieces of homologous DNA sequences and
it may be intra or inter-molecular events.
Recombination may result in insertion, gene amplification, deletions or inversions.
At the site of crossover, there is a heteroduplex
DNA formation (hybrid DNA from the different
parental duplex molecules) during genetic
recombination and new recombinant DNA
molecules are produced.
Single-strand invasion model: recombination is initiated by a nick in one strand.
RecA first binds cooperatively to the invading strand and invades the
homologous duplex.
Once a triplet nucleotide match is found, RecA hydrolyzes ATP and the
strands exchange.
Repair DNA polymerases and DNA ligase complete the repair process.
A Holliday junction is an intermediate structure in homologous
recombination where the two duplexes of DNA are connected by the genetic
material exchanged between two of the four strands, one from each duplex.
Double-strand break model: initiated by a double-strand break (DSB) made by an
endonuclease cleaving one of the partner DNA duplexes.
1. The DSB is enlarged to a gap by 5’→3’ exonuclease action to create
protruding single-stranded 3’ tails.
2. Single-stranded DNA is recognized by RecA protein, which initiates a
homology search in the other chromosome
3. ATP-dependent strand exchange occurs, followed by DNA synthesis and ligation
4. Branch migration (the ability of a DNA strand partially paired with its
complement in a duplex to extend its pairing by displacing the resident strand
with which it is homologous) of Holliday junctions
5. Resolution by strand cutting (by a resolvase such as RuvC), with the nicks
then sealed by DNA ligase
The resolution of a Holliday junction produces “splices” or “patches”
Splice recombinant DNA results from a
Holliday junction being resolved by
cutting the non-exchanged strands.
Both strands of DNA before the
exchange point come from one
chromosome; the DNA after the
exchange point comes from the
homologous chromosome.
Patch recombinant DNA results from a Holliday junction being resolved by
cutting the exchanged strands. The duplex is largely unchanged, except for a
DNA sequence on one strand that came from the homologous chromosome.
Other proteins which participate in general
homologous recombination include:
RuvA: 22kD protein which binds to
RuvB and Holliday junctions
RuvB: 37kD helicase that catalyses
branch migration
RuvC: 19kD nuclease which resolves
Holliday structures (resolvase)
The above 3 proteins form the Ruv complex which acts on recombinant junctions.
DNA ligase
RecBCD is a helicase-nuclease complex that initiates the repair of double-strand breaks.
There are about 1000 chi (crossover hotspot instigator) sites (5’ – GCTGGTGG – 3’)
present in the E. coli chromosome
Nuclease activity on the strand with the 3’ end is suppressed upon reaching a chi
sequence while the other strand continues to be degraded, generating a 3’ terminal
single-stranded end.
Single-stranded DNA generated at chi sites are hotspots for general recombination.
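Because chi is a fixed 8-nucleotide sequence, candidate sites can be located by a simple string search. The sketch below scans one strand for the motif and for its reverse complement (i.e. a chi site on the opposite strand); `find_chi_sites` is an illustrative helper, and the input is assumed to contain only A, C, G and T.

```python
CHI = "GCTGGTGG"  # chi sequence, written 5' to 3'

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_chi_sites(seq):
    """Return sorted (position, strand) pairs for every chi occurrence.

    Positions are 0-based and refer to the given (top) strand.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", CHI), ("-", reverse_complement(CHI))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)

print(find_chi_sites("AAGCTGGTGGTT"))
```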
General recombination and DNA repair mechanisms may result in gene conversion, which
affects only small sections of DNA or part of a gene.
Gene conversion is a non-reciprocal exchange.
Mismatched bases in a heteroduplex are recognized and removed by the DNA
repair enzymes and replaced with a copy of the complementary strand.
Three types of Site-specific recombination:
Transposons – Three major classes:
1. DNA-only transposons – require transposase, move as DNA either by
cut-and-paste or replicative pathways. Have short inverted repeats at each end.
Predominantly found in bacteria and responsible for the spread of antibiotic
resistance in bacterial strains.
Excised from one spot on a genome and inserted into another
Transposons encode a transposase which carries out the
DNA breakage and joining reactions needed for the element to move.
DNA-only transposons can be recognized in chromosomes by
“inverted repeat DNA sequences” present at their ends.
Cut-and-paste movement begins when transposase brings the two
inverted DNA sequences together, forming a DNA loop.
Transposase functions as a dimer, with each monomer
recognizing the same specific DNA sequence at the end of the
transposon.
Insertion occurs at random sites through the creation of staggered
breaks in the target chromosome, catalysed by transposase.
Subsequently, staggered breaks are repaired by DNA polymerase and
ligase.
Insertion site is marked by a short direct repeat of the target DNA
sequence (clues in identifying transposon in genome sequence)
2. Retroviral-like retrotransposons – require reverse transcriptase and
integrase (transposase), move via an RNA intermediate produced by a
promoter in the LTR. Have directly repeated long terminal repeats (LTRs)
at each end
Once the reverse transcriptase has produced a double-stranded DNA,
specific sequences near its two ends are recognized by a virus-encoded
transposase (integrase), which then inserts the viral DNA
into the chromosome using a mechanism similar to the cut-and-paste pathway
of DNA-only transposons.
3. Non-retroviral retrotransposons – requires
reverse transcriptase and endonuclease, move
via an RNA intermediate that is often produced
from a neighbouring promoter (endonuclease-
reverse transcriptase complex).
Have poly A at 3’ end of RNA transcript
and the 5’ end is often truncated.
Occurs as repetitive DNA sequences (L1
element or LINE element)
Transposition begins when an endonuclease attached to the L1 reverse
transcriptase and the L1 RNA nicks the target DNA at the insertion point.
Cleavage releases a 3’–OH DNA end which acts as primer for reverse transcription.
Single-strand DNA copy of the element is
generated and further processing results
in generation of new double-strand DNA
which is inserted at site of initial nick.
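As noted for DNA-only transposons, the ends of the element are inverted repeats: the left end equals the reverse complement of the right end. A minimal check of that property is sketched below; the function name and the example sequence are made up for illustration and do not correspond to a real transposon.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def has_terminal_inverted_repeats(element, length):
    """True if the first `length` bases equal the reverse complement
    of the last `length` bases of the element."""
    element = element.upper()
    if length <= 0 or 2 * length > len(element):
        raise ValueError("repeat length must fit twice inside the element")
    return element[:length] == reverse_complement(element[-length:])

# Hypothetical element: 8-bp end, internal sequence, inverted copy of the end
left_end = "GGCCAGTC"
candidate = left_end + "AAAAATTTCG" + reverse_complement(left_end)
print(has_terminal_inverted_repeats(candidate, 8))
```

A direct repeat (the same sequence in the same orientation at both ends) fails this test, which distinguishes it from the short direct repeats of target DNA that flank the insertion site.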
Phage integration and excision – Specialized transduction where circular phage
lambda DNA is converted to an integrated prophage by a reciprocal recombination
between attP and attB.
Cre-loxP system: Cre is a bacteriophage P1 integrase which catalyses site-specific
recombination between loxP sites (34 bp short direct repeats).
This recombination also works in mammalian cells in vitro and in vivo
Lecture 12 –DNA Repair
DNA repair is a major defence against environmental damage to cells which minimizes cell
killing, mutations, replication errors, persistence of DNA damage and genomic instability.
Abnormalities in DNA repair have been implicated in cancer and aging.
DNA damage can be:
Spontaneous: Depurination, deamination
Mutagen-induced: Pyrimidine dimers, alkylation, substitution, deletions/insertions,
frameshift mutations, double-strand breaks
Point mutations:
Transitions – a purine (A or G) is replaced by the other purine, or a
pyrimidine (C or T) is replaced by the other pyrimidine.
A replaced with G or the reverse
C replaced with T or the reverse
Transversions – a purine (A or G) is replaced by a pyrimidine or vice versa
A replaced by C or T
G replaced by C or T
C replaced by A or G
T replaced by A or G
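The two classes of point mutation can be told apart mechanically: a substitution is a transition when both bases are purines or both are pyrimidines, and a transversion otherwise. A small sketch follows (the function name is illustrative).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(ref, alt):
    """Classify a single-base DNA substitution as transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and altered base are identical")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

print(classify_substitution("A", "G"))
print(classify_substitution("A", "C"))
```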
Hydrolytic attack can cause depurination or deamination. If left uncorrected, such changes
could lead to deletion or substitution of base pairs during DNA replication.
Deamination of bases in DNA yields unnatural nucleotides which can be directly
recognized and removed by specific DNA glycosylases.
Deamination of C produces U which can be repaired by uracil DNA
glycosylase.
Nitrous acid (HNO2) oxidatively deaminates primary amines, producing
transition mutations: Adenine → Hypoxanthine
When methylated C is accidentally converted to T by deamination, DNA
mispairing can occur. G:C base pair → G:T base pair.
Formation of a dimer between 2 pyrimidine bases is possible when cells are exposed to UV
irradiation. This occurs between two adjacent thymine or cytosine bases.
Alkylation of a base may change the normal base pairing, leading to mutation.
Nitrogen mustard can cross-link with DNA at N7 of guanine, resulting in
chromosome breakage.
DNA exposed to EMS and MNNG yields O6-ethylguanine and O6-methylguanine
residues respectively, which can base pair with either C or T. G:C base pair → A:T base
pair.
Insertion/Deletion mutations are generated by intercalating agents
Intercalating agents increase the distance between 2 consecutive base pairs.
Replication of such DNA generates deletion or insertion of one or more nucleotides
in the newly synthesized DNA which results in a frameshift mutation.
E.g. Ethidium Bromide which binds to DNA, used in gel electrophoresis.
Ames test is commonly used to assess mutagenicity of compounds.
Types of repair pathways:
Direct repair: direct reversal of the
damage. Widespread in all except
placental mammals.
Excision repair: initiated by a recognition enzyme that sees an actual damaged
base or a change in the spatial path of DNA.
Base excision repair: removes the damaged base and replaces it in DNA, e.g.
uracil DNA glycosylase
Nucleotide excision repair: removes a sequence that includes the damaged
base(s); a new stretch of DNA is synthesized to replace it.
Mismatch repair: scrutinizes DNA for apposed bases that do not pair properly.
Mismatches arise during DNA replication and are corrected by distinguishing
between the new and old strands.
Recombination-repair: a mode of filling a gap in one strand of duplex DNA by
retrieving a homologous single strand from another duplex.
Nonhomologous end joining: repairs DSB when no homologous strands are
available.
In bacteria: the following types of DNA repair systems are present:
Repair of DNA synthesis errors
1. Proofreading by DNA polymerase (3’→5’) exonuclease – reduces errors
introduced during DNA synthesis by 1000-fold.
2. Mismatch repair by the MutSLH system. Depends on the methylation of selected
A residues in GATC to distinguish between newly synthesized DNA and template DNA.
The MutH endonuclease makes a nick on the 5’ side of the unmethylated GATC; UvrD
(helicase) and an exonuclease then remove the nicked DNA strand.
The unmethylated DNA strand is corrected by DNA polymerase III.
Strand-directed mismatch repair reduces the error rate by a further 100-fold.
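The two reductions quoted above (1000-fold from proofreading, a further 100-fold from mismatch repair) multiply. The arithmetic is sketched below; the starting misincorporation rate is a hypothetical input, not a figure given in these notes.

```python
def final_error_rate(base_error_rate, proofreading_fold=1000, mismatch_repair_fold=100):
    """Error rate per base after proofreading and mismatch repair.

    base_error_rate is an assumed raw misincorporation rate;
    the fold values are those quoted in the notes.
    """
    return base_error_rate / (proofreading_fold * mismatch_repair_fold)

# Hypothetical raw rate of 1 error per 1000 bases, for illustration only
print(final_error_rate(1e-3))
```

Whatever raw rate is assumed, the combined effect of the two systems is a 100,000-fold reduction.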
Repair of DNA modifications
3. Direct reversal of damage – photo-reactivation
repair
Removes pyrimidine dimers in a light-
dependent reaction
Occurs in bacteria but not in placental
mammals.
Involves a photo-reactivation enzyme (PRE)
photolyase.
Non-mutagenic repair system.
4. Excision repair by DNA glycosylase and Apurinic/apyrimidinic (AP)
endonuclease
Base excision repair (BER) – only removes the damaged base. DNA
glycosylase cleaves the glycosidic bond leaving the
apurinic/apyrimidinic site.
Other enzymes such as AP endonuclease, DNA polymerase I and
DNA ligase are involved.
Nucleotide excision repair (NER) – corrects pyrimidine dimers and other
DNA lesions in which the bases are displaced.
In E. coli, NER is an ATP-dependent process involving UvrA, UvrB, UvrC and UvrD
proteins. The Uvr system operates in stages in which UvrAB recognizes damage,
UvrBC nicks the DNA and UvrD unwinds the marked region.
Individuals with Xeroderma
Pigmentosum (XP) and
Cockayne syndrome (CS) are
unable to repair UV-induced
DNA lesions.
Repair of replication fork barriers
5. Translesion synthesis – when lesion is encountered during replication, DNA
Pol III is replaced by error-prone Translesion DNA polymerase, Pol IV or V.
Translesion DNA polymerase extends DNA synthesis beyond thymine
dimer independent of base pairing and has no proofreading exonuclease
activity.
Translesion DNA synthesis is error prone and often has errors in its
sequence. This is invoked as a last resort as part of the SOS response
Repair of breaks in DNA
6. Repair of DSB by homologous recombination (HR) and non-homologous end
joining (NHEJ)
A DSB is generated when the replication fork encounters a single-
strand nick in the template DNA. DSBs can also be induced by ionizing
radiation, replication errors, oxidising agents and certain cellular
metabolites.
DSB repair by NHEJ is common in mammalian somatic cells and Ku is a
key protein in NHEJ.
The NHEJ pathway can ligate blunt ends of duplex DNA directly and may therefore
introduce deletions.
Differentiated cells contain
all the genetic instructions
necessary to direct the
formation of a complete
organism. (A) The nucleus of
a skin cell from adult frog
transplanted into an
enucleated egg can give rise
to an entire tadpole. (B) In
many plants, differentiated
cells retain the ability to de-
differentiate such that a
single cell can form a clone of
progeny cells that later give
rise to an entire plant. (C)
Calves produced from the
differentiated cell donor are
all clones of the donor and
thus are genetically identical.
Lecture 13 –Eukaryotic Gene Expression (Overview)
Gene expression in multi-cellular eukaryotes
Genome constancy → differential gene expression → different proteins → different
cell types
There is physical evidence for genome constancy:
Number of chromosomes is constant among different types of cells
All human cells contain 22 pairs of autosomal chromosomes and one pair of
sex chromosomes
Amount of nuclear DNA is constant among different cells
No gene amplification and rearrangement in majority of cell types (exceptions –
immune cells)
Totipotency of nuclei of differentiated cells – differentiated nuclei retain a complete set of
genes for the whole organism.
John Gurdon’s work in 1958 – nuclear transplantation and induced stem cells.
Differential Expression:
Not all genes are expressed in any single type of cells and different sets of genes are
expressed in different types of cells
Expression of the same gene may be at different levels in different types of cells or
under different circumstances
Due to genes being transcribed with different efficiencies resulting in
different amount of proteins produced.
Cell differentiation:
Cells become different through the synthesis of different sets of mRNAs and proteins
which results in different morphology and physiological function
Each cell type synthesizes a few characteristic proteins at high abundance
Globin in red blood cells
Cell differentiation is usually stable and irreversible.
Differential mRNA expression by DNA microarray analyses (heat map)
Each column
represents one sample. In
the image, multiple liver
samples are shown.
Each row
represents one gene where
in the image, thousands of
genes are shown.
Red and
green colours represent
levels of gene expression
with red for higher levels
and green for lower levels.
Housekeeping genes are expressed in all types of cells for basic cellular functions
E.g. Structural proteins (β-actin, histones and ribosomal proteins etc.) and metabolic
enzymes (glycogen synthase kinase etc.)
Tissue-specific genes give the cell its specific phenotype
E.g. globin, crystallin, insulin
Except for housekeeping genes, most other genes are only expressed in certain cells.
Red = common
(Housekeeping genes)
Blue = specific
(Tissue-specific genes)
Even though
proteins are translated,
many of them require post-
translational modifications
for their proper functions.
Thus proteins have
different isoforms.
Figure shows the differences
in RNA levels for two human genes in
seven different tissues.
RNA sequencing was used to
obtain the data where RNA was
collected from human cell lines
grown in culture derived from the
indicated tissues. The sequence reads
were mapped across the human
genome by matching RNA sequences
to the DNA sequence of the genome.
Number of transcripts can be
counted for quantitative analyses of
RNA expression.
Examples of specialized proteins in differentiated cells for specialized functions:
Cell type | Differentiated cell product | Specialized function
Keratinocyte (skin cell) | Keratin | Protection against abrasion, desiccation
Erythrocyte (red blood cell) | Hemoglobin | Transport of oxygen
Lens cell | Crystallins | Transmission of light
B lymphocyte | Immunoglobulins | Antibody synthesis
T lymphocyte | Cell surface antigens | Destruction of foreign cells; regulation of immune response
Melanocyte | Melanin | Pigment production
Pancreatic islet cells | Insulin | Regulation of carbohydrate metabolism
Osteoblast (bone-forming cell) | Bone matrix | Skeletal support
Myocyte (muscle cell) | Muscle actin and myosin | Contraction
Hepatocyte (liver cell) | Serum albumin; numerous enzymes | Production of serum proteins and numerous enzymatic functions
Neurons | Neurotransmitters (acetylcholine, epinephrine etc.) | Transmission of electrical impulses
Comparison between gene expression in eukaryotes and prokaryotes:
Similarity of central dogma - DNA → RNA → Protein
Differences for eukaryotes:
Transcription occurs in the nucleus
Exons/introns present in DNA
5’ capping, RNA splicing and polyadenylation for mRNA
Exportation out of nucleus
Post-translational modification of proteins
Multiple levels of gene expression regulation in eukaryotic cells:
Pre-transcriptional control
Chromatin structure (heterochromatin and euchromatin)
DNA methylation (widely used and is the major form of epigenetic regulation)
Methylation affects binding of transcription factors to promoter
DNA amplification (for a small number of genes under special conditions and
cancer cells)
E.g. Xenopus ribosomal RNA genes ~1500X during oocyte growth
Drosophila polytene chromosomes in salivary gland ~1000 copies.
DNA rearrangement (only found in specific sets of genes in specialized
immune cells - immunoglobulins)
In an immunoglobulin light chain gene, a randomly chosen V (~35)
gene segment is moved to lie precisely next to one of the J (~5) gene
segments. This results in a total of 35 × 5 = 175 potential light chains.
For the Ig heavy chain, V (40), D (23) and J (6) segments give 40 × 23 × 6 = 5520
variable heavy chains → 1.5 million different combinations.
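The segment arithmetic above is a product of the number of choices at each position. A short sketch using the segment counts quoted in the notes follows (variable names are illustrative). The overall figure of about 1.5 million is not recomputed here, since it depends on which light-chain loci are counted.

```python
from math import prod

# Segment counts as quoted in the notes
light_chain_segments = {"V": 35, "J": 5}
heavy_chain_segments = {"V": 40, "D": 23, "J": 6}

def combinations(segments):
    """Number of distinct joins when one segment of each type is chosen."""
    return prod(segments.values())

light = combinations(light_chain_segments)
heavy = combinations(heavy_chain_segments)
print(light, heavy)
```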
Transcriptional control
Due to presence of DNA regulatory elements e.g. promoter,
enhancer/silencer, locus control region, insulator binding sites etc.
Gene clusters are organised within a loop consisting of enhancer, promoter,
structural gene and terminator.
Post-transcriptional control
Translational control
Post-translational control
The kind of proteins and the concentration can be controlled at multiple levels
In the end, the structure and function of a cell are determined mainly by the functional
proteins
Lecture 14 –Promoters and cis-elements
In comparison with prokaryotes, eukaryotes have more genes and only a portion of them
are transcribed (euchromatin). In addition, the chromosomal DNA is highly packed
(heterochromatin) and those regions are usually inactive.
In the transcription initiation:
Eukaryotic RNA polymerases (RNAP) cannot initiate transcription on their own and
require a large set of proteins known as general transcription factors (GTFs).
GTFs help to position the RNAP and interact with gene-specific TFs. In comparison,
prokaryotes only require a sigma factor.
Eukaryotic RNA polymerases must cope with DNA packaging in the chromatin.
Eukaryotic gene promoters and other regulatory sequences can act over long
distances (>50kb)
Eukaryotes have three RNA polymerase systems
RNA polymerase I – transcribes 5.8S, 18S and 28S rRNA genes
RNA polymerase II – transcribes all protein-coding genes, snoRNA, miRNA, lncRNA
and most snRNA genes
RNA polymerase III – transcribes tRNA, 5S rRNA, snRNA and genes for other small
RNAs
The S values refer to the rate of sedimentation in an ultracentrifuge where the larger
the S value, the larger the rRNA.
Different types of RNA:
rRNA – ribosomal RNA
snoRNA – small nucleolar RNAs for processing rRNA
miRNA – microRNA for degradation of mRNAs
snRNA – small nuclear RNAs for mRNA processing
lncRNA – long non-coding RNA > 200nt which have various functions in
transcription and epigenetics.
The transcription initiation complex by RNA polymerase II consists of:
Regulatory regions (promoter, cis-elements),
Pre-initiation complex (RNA polymerases and GTFs)
Specific transcription factors (for certain genes)
Co-activator and chromatin remodelling proteins
Promoter: binding site of a gene for basal transcriptional machinery for initiation of
transcription.
Cis-elements: binding sites of DNA for specific transcription factors or other regulatory
proteins which affect the rate of transcription.
4 core promoter elements for transcription initiation by RNA polymerase II:
BRE – B recognition element,
in 22% of human genes and
is the binding site of TFIIB
TATA – present in large
portion of genes (~24% in
human), allows for correct
position of polymerase to
start transcription about 30
bases downstream from
TATA box.
INR – Initiator sequence, present in 46% of human genes and is the starting point
for transcription at nucleotide A. In some genes without a TATA box, the initiator
alone is sufficient to initiate gene transcription.
DPE – downstream promoter element, in small number of genes (~12%), functions
to allow cooperative binding to TFIID.
Note that not all these elements are present in the same gene promoter and some
promoters may contain more than one such element.
Transcription rate can be altered by binding of proteins to specific sequences
Enhancers are binding sites for transcriptional activators that increase the rate of
transcription
Silencers are binding sites for transcriptional repressors that decrease the rate of
transcription and in some cases, prevent a region from being transcribed.
Both enhancers/silencers can be located near or far away (>50kb) from the
transcription unit (up or downstream) or in introns.
Each enhancer/silencer generally provides binding sites for several protein factors
Function of enhancer/silencer is generally orientation independent – binding
consensus sequence is frequently palindromic or symmetrical.
There are several promoter types, which can be separated into major and minor promoters
Major promoters are type I (adult, tissue specific - TATA), type II (ubiquitous, broad
expression – no TATA) and type III (developmentally regulated, differentiation).
Type I has sharp transcription start site (TSS) while type II has broad TSS.
Type I has disordered nucleosomes while type II’s nucleosomes are ordered.
Type I has no CpG islands while type II does.
For type III, the TSS is broad but sharper than type II. It has large CpG islands
extending into the body of gene.
Minor promoter is TCT (pyrimidine) promoter for ribosomal protein & TF genes.
Primer extension assay can be used to determine if TSS
is broad or sharp
Isolate RNA, hybridize complementary
oligonucleotide primer, extend to end of mRNA
with RTase and denature cDNA-mRNA hybrid.
Perform sequencing.
E.g. if mRNA transcript with U as first
nucleotide, then extended product should
contain an A for the first nucleotide.
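The base-pairing rule behind the example above can be sketched as follows: reverse transcriptase copies the mRNA from the primer towards the mRNA 5’ end, so the last nucleotide added to the cDNA (its 3’ end) is the complement of the first mRNA nucleotide. The function name and the example sequence are illustrative only.

```python
def cdna_from_mrna(mrna):
    """Return the cDNA (written 5' to 3') complementary to an mRNA (written 5' to 3')."""
    pairing = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in reversed(mrna.upper()))

# Hypothetical transcript whose first nucleotide is U
mrna = "UGGC"
cdna = cdna_from_mrna(mrna)
print(cdna, "last base added:", cdna[-1])
```

For a transcript starting with U, the extension product ends in A at the position paired with the transcription start site.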
For high-level expression of genes within a chromatin loop, the following elements
are present:
Promoter – for expression level, tissue specificity,
temporal expression and inducibility. Promoter affects
the rate of initiation and rate of chain elongation.
Enhancer/silencer
Locus control region (LCR) – present in some gene
clusters and consists of multiple DNase hypersensitive
sites, LCRs are required for correct expression of whole
gene cluster.
LCRs control the transcription of targeted genes in the locus by direct
interactions, forming looped structures. This is done by recruiting
chromatin-modifying complexes, co-activators and transcription complexes.
The deletion of LCR causes condensation to heterochromatin.
Insulator binding sites – prevent enhancer effect to neighbour genes and provides
barrier against the spread of heterochromatin
Have specialized chromatin structures containing hypersensitive sites
(“naked” regions that are easily accessible for DNase digestion, indicating
accessibility by other protein factors such as those for gene transcription).
In transgenic studies, two insulators can protect the region between them
for faithful transgenic expression
Different insulators are bound by different factors and thus have different
mechanisms as barriers
Matrix attachment region (MAR) – defined as the DNA region attaching to nuclear
matrix which can be experimentally isolated.
MARs are A-T rich but do not have consensus sequence. They also contain
gene regulatory sequences and thus it is postulated that they may be
important in regulation of gene transcription within the chromosome loop.
In transgenic studies (transfer of genes), MARs can be used to ensure
transgene expression by counteracting the “chromosome effect”.
The chromosome effect refers to the way the expression of a transgene
may be affected by the local chromosome structure or cis-elements around
the transgene integration site.
To identify MARs, prepare nuclei and extract histones. After that, cleave
the DNA with restriction nucleases; MARs remain attached to the
matrix, and this DNA can then be extracted and analysed (in vivo). Alternatively,
degrade all the DNA with DNase and then add DNA fragments, which can be
tested for their ability to bind to the matrix (in vitro).
Process of transcriptional initiation:
(A) RNA polymerase II requires several general transcription
factors. The promoter contains a DNA sequence called the
TATA box, which is located about 25 nucleotides upstream of the
initiation site.
(B) Through its subunit TBP (TATA-binding protein),
TFIID recognizes and binds the TATA box, which then
enables the adjacent binding of TFIIB
(C) The binding of TFIID produces a distortion in DNA
which helps to attract the other transcription factors
(D) Rest of the general transcription factors and RNA
polymerase assembles at the promoter
(E) TFIIH then uses energy from ATP hydrolysis to pry
open the DNA double helix at the transcription start point,
locally exposing the template strand. TFIIH also
phosphorylates RNA polymerase II at the C-terminal
domain (CTD), changing its conformation so that the
polymerase is released from the general factors and can
begin the elongation phase of transcription.
TFIID has a TBP and 11 TAFs (TBP-associated
factors). TBP binding causes significant bending and
opening of DNA that serves as an important signal for other
binding proteins. TAFs recognise promoter and initiator
elements and interact with gene-specific regulatory
proteins.
TFIIB (1 subunit, 33kDa) binds BRE in the promoter and
enables interaction between TFIID and RNAPII-TFIIF. It
aids in accurately positioning RNAP at the start site of
transcription.
TFIIF (2 subunits, Rap30 & Rap74) functions similarly to the sigma factor in prokaryotes:
it guides specific binding of RNAPII to the complex assembling at the promoter. It may also
be involved in the elongation of nascent RNA.
TFIIE (2 subunits) functions to control TFIIH, enhances promoter melting and
stimulates transcription
TFIIH (9 subunits) is a release factor with an ATP-dependent helicase to melt the promoter
and a kinase to activate RNAPII by phosphorylating the CTD.
RNA polymerase II has 12 subunits:
Three core units: RPB1, RPB2 and RPB3 in the ratio of 1:1:2. Homologous to the
prokaryotic β’, β and α subunits respectively. RPB1 (~200kDa) binds to DNA and has
CTD = (YSPTSPS)n with n=26 in yeast and n=52 in mouse; RPB2 (~100kDa) binds
nucleotides and RPB3 is ~50kDa.
The CTD consists of multiple repeats which can be phosphorylated and is important
in transcription initiation, elongation and RNA processing.
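As a small illustration of the repeat structure, the sketch below builds the CTD from the heptad given above and lists where a chosen heptad position (e.g. Ser2 or Ser5) falls along the tail. The function names are hypothetical, and the model assumes perfect consensus repeats, which real CTDs only approximate.

```python
# Sketch assuming an idealized CTD made of perfect YSPTSPS repeats.
HEPTAD = "YSPTSPS"

def build_ctd(n_repeats):
    """Return an idealized CTD of n_repeats consensus heptads."""
    return HEPTAD * n_repeats

def heptad_sites(tail, position):
    """0-based indices in the tail of the residue at a 1-based heptad position."""
    return list(range(position - 1, len(tail), len(HEPTAD)))

yeast_ctd = build_ctd(26)   # n=26 in yeast
mouse_ctd = build_ctd(52)   # n=52 in mouse
print(len(yeast_ctd), len(mouse_ctd))        # 182 364
print(heptad_sites(build_ctd(2), 5))         # [4, 11]
```

Each repeat contributes one Ser2 and one Ser5, so the mouse tail in this idealized model offers 52 sites of each kind.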
Common subunits: RPB5, 6 and 8 are found in all three RNA polymerases.
Nonessential subunits: RPB4 and 9 – deletion mutants of these two function well at
normal temperature but fail to grow at either higher or lower temperature.
Other subunits: RPB7 and RPB11 – RPB7 is responsible for correct initiation of
transcription.
RNA polymerase I transcribes rDNA into rRNA, which is synthesized in the nucleolus.
(Figure: many molecules of RNA pol I simultaneously transcribe each of two adjacent
genes; nascent transcripts are seen as fine threads.)
These rRNAs contribute to the formation of ribosomes.
rRNAs constitute ~80% of total RNA and it is estimated that 10 million rRNA molecules are
synthesized in each cell generation.
rRNA genes have a bipartite promoter consisting of a core promoter and an upstream
promoter element and require two factors: Upstream binding factor 1 (UBF) and
Selectivity factor 1 (SL1)
UBF: required for high-frequency initiation; maintains an open chromatin structure
by preventing histone H1 binding and assembly into inactive chromatin; stimulates
promoter release of RNAPI and stimulates SL1.
SL1: has 4 subunits including TBP and a TFIIB homolog – primarily used for
recruitment of RNAPI. The TBP is associated with RNAPI but is not used for DNA binding.
Transcription units for RNA polymerase I (14 subunits, 590kDa) have a core
promoter separated by ~70bp from the upstream promoter element (UPE). UBF binding
to the UPE increases the ability of the core-binding factor to bind to the core
promoter. The core-binding factor (SL1) positions RNA polymerase I at the start
point, ensuring proper localization.
There are 4 types of eukaryotic rRNA, each present in one copy per ribosome.
Three of the four rRNAs (18S, 5.8S and 28S) are made by chemically modifying
and cleaving a single large precursor rRNA, and the fourth (5S) is synthesized
from a separate cluster of genes by RNA polymerase III and does not require
chemical modification.
Both cleavage and chemical modifications of rRNA precursors require small
nucleolar RNAs (snoRNAs) as guide RNAs.
Many snoRNAs are encoded in the introns of other genes, especially those
encoding ribosomal proteins. They are synthesized by RNA polymerase II and
processed from excised intron sequences.
RNA polymerase III uses downstream and upstream promoters
Internal promoters have short consensus sequences (box A/B or A/C) located
within the transcription unit (downstream of start site) and cause initiation to
occur at a fixed distance upstream – deletion of 5’ sequence upstream of or
including the start point has no effect.
Upstream promoters contain three short consensus sequences (Oct, PSE, TATA)
upstream of the start point that are bound by TFs.
TFIIIA and TFIIIC bind to the consensus sequences and enable TFIIIB to bind at
the start point
TFIIIB has TBP as one subunit and enables RNA polymerase III to bind.
Type 1 (box A/C) is for 5S rRNA, type 2 (box A/B) is for tRNAs and Type 3 (Oct,
PSE, TATA) is for snRNA.
For type 1 and type 2, the main difference is the requirement for TFIIIA: type 1
requires TFIIIA to bind to box A, while in type 2, 2 molecules of TFIIIC
bind to both box A and box B
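The three promoter types above amount to a simple lookup from the set of consensus elements present. The sketch below encodes that classification; the function name and the element labels are hypothetical conventions for this example only.

```python
# Sketch of the type 1/2/3 classification described in the notes.
def pol3_promoter_type(elements):
    """Classify an RNA polymerase III promoter from its consensus elements."""
    elements = set(elements)
    if {"boxA", "boxC"} <= elements:
        return "type 1 (internal promoter, 5S rRNA)"
    if {"boxA", "boxB"} <= elements:
        return "type 2 (internal promoter, tRNA)"
    if {"Oct", "PSE", "TATA"} <= elements:
        return "type 3 (upstream promoter, snRNA)"
    return "unclassified"

print(pol3_promoter_type(["boxA", "boxB"]))        # type 2 (internal promoter, tRNA)
print(pol3_promoter_type(["Oct", "PSE", "TATA"]))  # type 3 (upstream promoter, snRNA)
```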
Common features of transcription initiation by the three RNA polymerases:
GTFs bind at the promoter before RNAP itself can bind, and GTFs form a
pre-initiation complex to direct the binding of RNAP.
SL1 binds to UBF1, which is bound to the promoter sequence, before recruiting RNAPI
TFIID & TAFs recognize a promoter for RNAPII
TFIIIB binds adjacent to TFIIIC to localize RNAPIII
Positioning of all three types of polymerases requires TBP, which is associated
with other factors (TAFs); TBP is the universal positioning factor for all types of
promoters & their polymerases.
All three RNA polymerases are large proteins (~500kDa) with ~12 subunits; three
subunits are common.
Lecture 15 – Transcription Factors
Basic Features of TFs
Bind to specific DNA sequence through DNA-binding domain (BD)
Interact directly or indirectly with the basal transcriptional machinery through
protein-protein interaction via activation domain (AD)
Often contain other functional domains.
The yeast two-hybrid system can be used to identify protein-protein interactions in
vivo in yeast and then to clone the gene encoding the interacting protein.
Functional yeast Gal4 TF has a BD and an AD, which together activate transcription of
the LacZ reporter, giving a blue colony
A hybrid (BD + protein A or AD + protein B) alone does not lead to
transcription.
If proteins A and B bind each other to bring AD and BD together, transcription is
activated as the AD is brought into position to interact with the
GTFs at TATA
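The logic of a two-hybrid screen can be sketched as a lookup: the reporter is on only when the BD-fused bait and the AD-fused prey physically interact. The protein names and the interaction table below are hypothetical placeholders, not data from the lecture.

```python
# Toy model of a yeast two-hybrid screen; all protein names are made up.
def two_hybrid_screen(bait, preys, interactions):
    """Return the preys whose colonies would turn blue with this bait."""
    blue = []
    for prey in preys:
        # LacZ is transcribed only if BD-bait and AD-prey bind each other,
        # which brings the AD into position next to the promoter.
        if frozenset((bait, prey)) in interactions:
            blue.append(prey)
    return blue

known_interactions = {frozenset(("A", "B"))}   # assumed: A binds B only
print(two_hybrid_screen("A", ["B", "C"], known_interactions))  # ['B']
```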
Classification and structures of TFs
There are 10 superclasses of TFs and they are classified based on their conserved
DNA binding domains.
The most common ones are the zinc-coordinating domain (zinc finger), helix-turn-
helix domain (homeodomain) and the basic domain.
The number of TFs expressed in each tissue varies but the ratio of TFs to total
expressed genes is about the same (~6%) in all tissues
Each TF controls multiple genes and TFs are differentially expressed in different
tissues, i.e. different tissues have different patterns of TF expression.
During development, differential expression provides for hierarchical gene
regulation.
Prominent DNA binding domains of TFs
The molecular recognition between DNA & protein occurs mostly at major grooves
as they have wider space and contain more molecular features.
Protein-DNA interactions can be by H-bonds, Ionic bonds or Hydrophobic
interactions
Zinc Finger – typical finger (C2H2) is ~23aa with 2 cysteines on the β sheet and 2
histidines on the α helix to chelate a zinc atom, although variations are present
Zinc finger domain is formed by the interaction of the Zn atom with an
α helix and an antiparallel β sheet
Each finger recognizes three GC-rich nucleotides.
Multiple zinc fingers are present in each protein
Amino acid residues -1, 2, 3 and 6 on the α helix are critical for recognition of
nucleotides
Zinc fingers can be artificially designed to recognize targeted sequences –
zinc finger nucleases (ZFN) for genome editing.
Homeodomain Proteins – homeodomain is a conserved 60 amino acid domain
found in many TFs and is particularly important in development.
Homeodomain is folded into 3 helices, where helices 2 and 3 are similar to the HTH motif
Bases in both the major and minor grooves are contacted
N-terminal arm lies in the minor groove, helices 1 and 2 lie above the DNA
while helix 3 lies in the major groove.
Recognition site has an ATTA (TAAT) core and the surrounding bases determine
specificity
Residue 50 plays an important role in determining target specificity,
where K50 recognizes GGATTA while Q50 recognizes CCATTA.
Some homeodomain proteins contain two DNA binding domains
POU domain (HTH motif) cooperates with the homeodomain to
increase binding specificity and affinity
Paired domain binds to target DNA independent of homeodomain.
Basic Helix-loop-helix (bHLH) proteins: basic domain (stretch of basic amino
acids) for DNA binding
Contains two helices of unequal length with a long loop allowing flexibility to the
helices, which fold over each other.
Formation of dimer through the HLH domain
E.g. Myogenic bHLH proteins –
MyoD is specifically expressed in muscle cells while E2A is generally
expressed in many cell types. Id (inhibitor of differentiation) has no basic
region.
Formation of an Id dimer with MyoD inhibits the formation of the MyoD-E2A
dimer; in the presence of Id, the E box (CANNTG) is not occupied.
In a proliferating myoblast which expresses MyoD, E2A and Id, the
MyoD binding site in the promoter of muscle creatine kinase (MCK)
is not occupied. When induced to differentiate into muscle, Id
concentration decreases and MyoD-E2A forms and binds to the MCK
promoter which causes the MCK gene to be transcribed.
bHLH proteins are present in all eukaryotes from yeast to human and are
involved in cell differentiation of various other types including heart,
pancreas and skin.
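The E box consensus CANNTG maps directly onto a regular expression. The sketch below scans a DNA string for E boxes; the test sequence and the function name are hypothetical, and only one strand is searched.

```python
import re

# CANNTG with N = any base; the lookahead lets overlapping sites be reported.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_e_boxes(dna):
    """Return (position, sequence) for every E box on the given strand."""
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(dna.upper())]

print(find_e_boxes("GGCACGTGAACAGCTGTT"))
# [(2, 'CACGTG'), (10, 'CAGCTG')]
```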
Leucine zipper proteins (bZIP): a leucine residue is present at every seventh
amino acid and thus the leucines are located on one side of the helix.
Leucine zipper is an amphipathic helix where one face contains side chains that
are hydrophilic and the other face contains side chains that are hydrophobic (leucine)
This is a motif that has a dimerization domain (leucine zipper) and a DNA binding
domain (basic region), and the protein only functions when the dimer is
formed.
E.g. Myc and Max form heterodimers and bind to the E box (CACGTG) of target
genes. Myc is an important transcription factor, regulating the transcription
of ~15% of cellular genes including many growth factor genes.
Myc has multiple functions in cell cycle progression, apoptosis and
stem cell renewal.
Myc is one of the four factors to induce pluripotent stem cells by
combination with other factors (OSKM factors or Yamanaka factors:
Oct4, Sox2, Klf4 and Myc)
Overexpression of Myc frequently causes cancer
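The "leucine at every seventh residue" rule for a leucine zipper can be checked mechanically. The sketch below looks for a run of leucines spaced exactly 7 apart; the threshold of 4 consecutive heptads and the test sequences are assumptions made for illustration, not values from the lecture.

```python
# Sketch: detect a heptad-spaced leucine run (assumed minimum of 4 repeats).
def has_leucine_heptad(seq, min_repeats=4):
    """True if some register has L at every 7th residue for min_repeats heptads."""
    seq = seq.upper()
    for start in range(len(seq)):
        positions = range(start, start + 7 * min_repeats, 7)
        if positions[-1] < len(seq) and all(seq[i] == "L" for i in positions):
            return True
    return False

zipper_like = "LAAAAAA" * 3 + "L"               # L at 0, 7, 14, 21
broken = "LAAAAAA" * 2 + "AAAAAAA" + "L"        # third leucine missing
print(has_leucine_heptad(zipper_like))  # True
print(has_leucine_heptad(broken))       # False
```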
Transcriptional activation domains (AD)
ADs either interact directly with GTFs or with cofactors (in protein-protein
interactions).
There are four kinds of protein domains that are commonly observed to be involved
in transcription activation:
Acidic domains – acidic amino acid side chains (Glu/Asp) e.g. Gal4, VP16
E.g. CREB (cAMP-response element binding protein) has an AD which
exists as unstructured random coils with strong negative charge (Asp rich).
In the presence of cAMP, Ser-133 is phosphorylated and the CREB AD folds into
two amphipathic α-helices and interacts with the co-activator CBP, resulting in
the transcription of genes whose control regions contain a CREB-binding site.
Glutamine-rich domains – about 25% Gln in sequence e.g. SP1
Proline-rich domains e.g. c-Jun, Ap2 and Oct2
Isoleucine-rich domains e.g. NTF-1
Nine-amino-acid transactivation domain (9aaTAD) – loose consensus in a large
superfamily of eukaryotic TFs and has been demonstrated to be essential for
transcription activation.
Multiple domains of TFs: nuclear receptors (steroid hormone receptors)
These are a group of zinc finger proteins that bind steroids in the cytoplasm and as a
result they move into the nucleus where they bind the DNA and dimerize to activate
transcription.
Ligand binding causes release of inhibitory proteins while causing the
receptor to bind co-activator proteins that stimulate transcription.
Nuclear receptor binds to HRE (hormone responsive element) to
activate transcription by enhancing formation of the transcription initiation
complex
Assembly of a multiprotein complex on the HRE enhances transcription by
interaction with GTFs, TAFs (TBP-associated factors) and TIF (Transcription
intermediary factor)
Each receptor has two fingers for DNA binding and each finger contains 4 cysteine
residues – they can form dimers and bind to short palindromic DNA sequences.
For the glucocorticoid receptor, the binding site must contain a 3bp spacer for
correct positioning of the 2 zinc fingers to specifically
activate transcription.
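The requirement for palindromic half-sites separated by a 3bp spacer can be expressed as a reverse-complement check. In the sketch below, the 6bp half-site length and the example sequences are assumptions for illustration (an idealized palindrome, not a sequence given in the lecture).

```python
# Sketch: is a site made of two inverted half-sites around a fixed spacer?
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]

def is_palindromic_site(site, half_len=6, spacer=3):
    """True if the right half-site is the reverse complement of the left one
    and the two halves are separated by exactly `spacer` bases."""
    site = site.upper()
    if len(site) != 2 * half_len + spacer:
        return False
    left, right = site[:half_len], site[half_len + spacer:]
    return right == reverse_complement(left)

print(is_palindromic_site("AGAACATCATGTTCT"))  # True  (3bp spacer)
print(is_palindromic_site("AGAACATCTGTTCT"))   # False (2bp spacer)
```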
In summary, nuclear receptors have a DNA binding domain, activation domain
and a ligand binding domain.
Activity of nuclear receptor depends on expression of receptor in cell, availability
of ligand which acts as a “switch”, interaction with dimerization partner,
presence and accessibility of relevant responsive elements on the gene,
presence of relevant co-factors such as co-activators.
Signal transduction: from extracellular signal to nuclear transcription
E.g. TGFβ (transforming growth factor β) signalling.
Extracellular ligand TGFβ binding results in tetramerization of two
types of receptors and phosphorylation of their intracellular domains
Activation of R-Smad (receptor-regulated Smad)
Formation of Smad trimers with the common Smad4
Translocation of the Smad complex to the nucleus and binding to the
responsive element to activate target genes.
E.g. cAMP signalling
Binding of an extracellular signal molecule to a GPCR activates
adenylyl cyclase via a stimulatory G protein and increases the cAMP
concentration in the cytosol
The rise in cAMP concentration activates PKA (protein kinase A)
and the released catalytic subunits of PKA can then enter the nucleus
where they phosphorylate CREB (Ser-133).
Once phosphorylated, CREB recruits the co-activator CBP
(CREB-binding protein) which stimulates gene transcription.
cAMP is involved in a variety of cellular activities where different
hormones are used for different cell types but the intracellular signals
are similar.
E.g. adrenaline in muscle triggers glycogen breakdown;
vasopressin in kidney triggers water resorption.
E.g. Canonical Wnt signalling – in development and cancers
Wnt binding to Frizzled receptor (GPCR)
Recruitment of two co-receptors, dishevelled and LRP
Dissociation of the inhibitory complex
Release of unphosphorylated β-catenin to translocate to the nucleus
Displacing co-repressor Groucho to activate target genes with TF: LEF1/TCF
Diversity of TFs in transcriptional regulation
TF dimerization helps to increase DNA binding specificity. Typically each BD
recognizes 4-6 nucleotides and thus a dimer should double the recognition length.
Heterodimers of bHLH proteins have a >10-fold increase in affinity. E.g. the affinity
of MyoD-E2A is 10X that of MyoD-MyoD
Increase functional diversity through formation of diversified protein
complexes
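A rough calculation shows why doubling the recognition length matters for specificity. The sketch below estimates how often a site of a given length would occur by chance, assuming a random sequence with equal base frequencies, a single strand, and the genome size of about 3.2 × 10^9 nucleotide pairs quoted earlier in these notes. These are simplifying assumptions, not a model of real genomes.

```python
# Back-of-envelope estimate under a uniform random-sequence assumption.
GENOME_BP = 3.2e9  # human genome size as quoted earlier in the notes

def expected_random_sites(genome_bp, site_len):
    """Expected chance occurrences of one specific site of length site_len."""
    return genome_bp / 4 ** site_len

monomer = expected_random_sites(GENOME_BP, 6)    # single BD, 6 nt
dimer = expected_random_sites(GENOME_BP, 12)     # dimer, 12 nt
print(monomer)          # 781250.0
print(round(dimer))     # 191
```

Under these assumptions a 6-nucleotide site is expected hundreds of thousands of times by chance, while a 12-nucleotide site is expected only a couple of hundred times.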
A single TF can control several genes by interacting with different factors
E.g. glucocorticoid receptor (GR) coordinates expression of many
different genes. The bound proteins are not sufficient on their own to fully
activate transcription; the GR completes the combination of
transcription regulators required for maximal initiation of transcription.
When the hormone is no longer present, the GR dissociates from DNA
and the genes return to their pre-stimulated levels.
For these GR-responsive genes, the effect of GR will depend on the presence
of GR, presence of ligand, presence of other regulatory proteins and the
binding sites on the gene.
In conclusion, specific TFs contain specific domains responsible for DNA binding,
transactivation and interaction with other molecules.
Specificity of interactions results from both protein-DNA and protein-protein interactions
(with cofactors) for controlling target genes.
Transcriptional activation of target genes could be induced by a series of signal transduction
events through extracellular factors, intracellular signalling molecules and finally nuclear
transcription factors.
Lecture 16 – Chromatin Remodeling and Transcriptional Activation
Simplified model of eukaryotic gene transcription: Ubiquitous & cell-specific proteins bind 5’
sequence elements
Core promoter (TATA, INR, DPE)
Proximal cis-elements (GC, BLE, CCAAT)
Distal regulatory regions (enhancers, could be in intron or downstream of gene)
Updated knowledge: Involves co-factors, mediator, nucleosome modification and chromatin
remodelling.
Co-regulators:
Some TFs have both BD and AD and thus can interact with the basal transcriptional
apparatus directly, while some have only a BD and no/weak AD and thus require a
co-activator to interact with the basal transcriptional apparatus.
Co-regulators do not bind to DNA directly but interact with TFs or the transcriptional
initiation complex.
Roles of co-activators:
Bridging TFs and PIC (pre-initiation
complex)
Helping recruitment of GTFs and RNAPII
Chemical modification of nucleosomes – covalent modification of histones
Chromatin remodelling
Eukaryotic transcriptional regulators often work in groups or complexes through the
interaction of specific transcription factors and co-factors in the presence of a
specific cis-element (DNA). In rare cases, RNA can act as a scaffold to bring proteins
together.
The complex could function for activation or for repression.
One co-regulator may interact with different TFs.
Most co-regulators function as either co-activators or co-repressors but some of
them can have dual functions – as an activator in one complex and a repressor in
another complex.
Co-regulators are generally more widely expressed than TFs and are involved in the
regulation of a higher number of genes.
Mediator complex – originally defined as a protein complex tightly associated with RNAPII
CTD in yeast.
It is poorly defined in higher eukaryotes and can be considered a scaffold for
protein-protein interaction.
Mediator complexes also include a collection of co-regulators with different
activities for histone modifications and chromatin remodelling.
The yeast mediator has 25 subunits (1.4MDa) in 4 modules: head, middle, tail and kinase.
In general, co-regulators can be broadly divided into 5 classes:
Class I
Properties: Activator and repressor targets inherent to the core machinery, promoter
recognition and enzymatic functions
Examples: TAFs (TBP-associated factors), TFIIA, NC2, PC4
Class II
Properties: Activator and repressor adaptors; modulate DNA binding, target other
co-regulators and the core machinery (bridging)
Examples: OCA-B/OBF-1, Groucho, Notch, CtBP, HCF, E1A, VP16 (Herpes simplex virus TF
that binds TAF through its AD)
Class III
Properties: Multifunctional, structurally related but highly divergent co-regulators:
some interact with RNAP II and/or multiple types of TFs, some have inherent enzymatic
functions or chromatin-selective properties (mediator)
Examples: Yeast: Mediator SRBs; Human a: CRSP, PC2; Human b: ARC/DRIP/TRAP;
Human c: NAT, SMCC, Srb/Mediator
Class IV
Properties: Chromatin (nucleosome) modifying activator and repressor adaptors;
acetyltransferase or deacetylase activities with multiple substrates: histones,
histone-related proteins, activators, other co-regulators and the core machinery.
Examples: CBP/p300, GCN5, P/CAF, p160s (SRC1, TIF2, p/CIP, etc.), HDAC-1 and HDAC-2
(rpd3), Sir2
Class V
Properties: ATP-dependent chromatin remodelling activities
Examples: SNF2-ATPase (SWI/SNF, RSC) and ISWI-ATPase (NURF, ACF, ChrAC, RSF, etc.)
SWI/SNF: Switch/Sucrose non-fermentable; ISWI: Imitation SWI
Chromatin Remodelling – nucleosome disruption and re-formation
Remodelling complex A disrupts nucleosomes to allow DNA-binding proteins to bind
and initiate gene expression/replication.
Remodelling complex B restores nucleosomes when DNA-binding proteins dissociate
The same remodelling complex could perform both nucleosome disruption and
re-formation.
Nucleosomes are dynamic as they can wrap-unwrap-rewrap in milliseconds, thus
allowing DNA to be accessible most of the time for binding TFs.
Nucleosome positioning & re-positioning is important to influence gene
transcription.
Histone modifying enzymes and chromatin remodelling complexes work in concert
– a particular histone modification attracts a particular type of remodelling complex.
How do transcriptional activators direct local alterations in chromatin structure?
GTFs and RNAP are unable to assemble on a promoter that is packaged in nucleosomes,
and thus activators are needed to trigger changes to the chromatin structure of the
promoters to make the DNA more accessible.
Involves transcription regulators, chromatin remodelling complex and histone
chaperone.
Four mechanisms for locally altering chromatin:
1. Nucleosome remodelling – nucleosome sliding allows access of
transcription machinery to DNA
2. Nucleosome removal – transcription machinery assembles on nucleosome
free DNA
3. Histone replacement – histone variants allow greater access to
nucleosomal DNA
4. Histone modifications – specific patterns of histone modification
destabilize compact forms of chromatin and attract components of
transcription machinery.
Covalent modification: acetylation (A), phosphorylation (P) or
methylation (M).
Only a small number of histone modifications are known for their
function – histone code hypothesis: covalent modifications of histone
tails facilitate the binding of specific proteins to chromatin to
perform distinct functions such as transcription, replication and
repair.
Histone variants are encoded by different histone genes and are
expressed at lower levels than regular histones. Insertion of different
histone variants into nucleosomes may also signal different functions
including transcriptional activation. Variants are recognized by
chromatin remodelling complexes.
Histone acetylation is often associated with transcription activation
and this is performed by histone acetyl transferases (HATs)
Example of CBP (co-activator) and CREB (specific TF)
When cAMP increases, PKA phosphorylates S133 in CREB and allows it to interact
with CBP which has HAT activity
CBP increases transcription rates through acetylating histone tails to remodel
chromatin and increasing recruitment rate of RNAPII to promoter
CBP and closely related p300 are cofactors for many TFs and not just CREB, but they
are not generally associated with all RNAPII genes and seem to be associated with
certain classes of genes only, often in inducible genes and those involved in cell
differentiation.
Example of successive histone modification during transcription initiation in human
interferon gene promoter
Sequential histone modifications
1. Acetylation of H3K9, H4K8
2. Phosphorylation of H3S10
3. Acetylation of H3K14
GTF TFIID and a chromatin remodelling complex bind to the chromatin to
promote the subsequent steps of transcription initiation. TFIID and the
remodelling complex both recognize acetylated histone tails through a
bromodomain – a protein domain specialized to read this particular mark
on histones.
From chromatin remodelling to formation of TIC –
transcription activators can act at different steps.
Transcription activators of different mechanisms often work synergistically where a greater
than additive effect of multiple activators working together is observed.
6 ways in which eukaryotic repressor proteins can operate:
(A) Activator proteins and repressor proteins compete for binding to the same
regulatory DNA sequence
(B) Both proteins bind DNA but the repressor prevents the activator from carrying
out its function
(C) The repressor blocks assembly of the GTFs
(D) Repressor recruits a chromatin remodelling complex which returns the
nucleosomal state of the promoter region to its pre-transcriptional form
(E) Repressor attracts a histone deacetylase to the promoter
(F) Repressor attracts a histone methyl transferase, which methylates histones to
maintain the chromatin in a transcriptionally silent form.
Genes can be permanently switched off via methylation
Cytosine can be methylated when it is located in a CG sequence. Methylated
nucleotides prevent DNA binding for some gene regulatory proteins.
DNA methylation patterns can be faithfully inherited by maintenance methyl
transferase.
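The inheritance of methylation patterns can be sketched as follows: after replication, each methylated CG on the parental strand sits opposite an unmethylated CG on the new strand, and the maintenance methyl transferase methylates the C of that partner. The coordinate convention and function names below are hypothetical choices for this example.

```python
# Sketch of maintenance methylation at symmetric CG sites.
# Positions are 0-based indices on the top (parental) strand.
def cg_positions(top_strand):
    """Indices of every C that is followed by G on the top strand."""
    s = top_strand.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]

def maintenance_methylation(top_strand, methylated_top):
    """Return top-strand coordinates whose paired C on the newly made bottom
    strand gets methylated. The bottom-strand C of a CG site pairs with the
    top-strand G, i.e. one position after the methylated top-strand C."""
    cg_sites = set(cg_positions(top_strand))
    return sorted(i + 1 for i in methylated_top if i in cg_sites)

dna = "ACGTCGA"
print(cg_positions(dna))                  # [1, 4]
print(maintenance_methylation(dna, {1}))  # [2]
```

Only the CG that was methylated on the parental strand is copied; the unmethylated CG at position 4 stays unmethylated, which is how the pattern is inherited faithfully.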
Super enhancers are composed of large clusters of enhancers densely bound with the
mediator complex, TFs and chromatin regulators
Bound proteins are responsible for diverse enhancer-related functions such as
enhancer looping, gene activation, nucleosome remodelling and histone
modification.
Generally marked by H3K27Ac modification.
In summary, co-factors are recruited by DNA binding factors and are required to help
recruit and/or stabilize binding of the PIC. They may be recruited as a result of modification
of the DNA binding factor.
Many of these co-factors contain chromatin modifying activities including the ability to
acetylate, phosphorylate or methylate histone N-terminal tails. These modifications may be
written as “histone codes” which are “read” by interacting proteins, conferring
meaning to the activity.
Histone modifications work with chromatin remodelling activity to allow accessibility of
GTFs and RNAPII to the promoter and thus are integral to specific gene expression.
Lecture 17 – Post transcriptional regulation
Post-transcriptional regulation occurs after RNAPII has begun
RNA synthesis, and the type of post-transcriptional regulation
of gene expression varies from gene to gene.
RNAPII is involved in capping, splicing and polyadenylation
(coupled): its tail (CTD) contains 52 tandem repeats of a
seven-amino-acid sequence (YSPTSPS), and capping factors are recruited
to the tail when Ser5 is phosphorylated by TFIIH.
This ensures that the RNA molecule is efficiently
capped as soon as its 5’ end emerges from the RNAP
As the polymerase continues transcribing, the Ser2
position is phosphorylated by a kinase associated with
the elongating polymerase, and the Ser5 position is eventually
dephosphorylated.
When RNAPII finishes transcribing a gene, it is
released from DNA. Soluble phosphatases remove the
phosphates on its tail and it can reinitiate transcription.
Only the fully dephosphorylated form of RNAPII is
competent to begin RNA synthesis at a promoter.
mRNA Capping: 5’ to 5’ addition of guanosine monophosphate (GMP) to the 5’ end of the
RNA transcript. The capping reaction is started when RNA is synthesized to ~25nt and all
enzymes are associated with the CTD.
Capping signals the translation start site, ensures correct processing and export of
mRNA through a cap binding complex (CBC), and stabilizes and protects the 5’ end
of mRNA from degradation.
Reaction involves a phosphatase removing a phosphate from the 5’ end of the primary
transcript, followed by a guanylyl transferase adding a GMP in reverse linkage (5’ to 5’
instead of 3’ to 5’) and lastly a methyl transferase adding a methyl group to the
guanosine.
RNA Splicing: due to the presence of introns/exons in eukaryotic genes. Both the size
and number of introns vary from gene to gene.
Splicing consensus sequences – 5’ GU at donor site and 3’ CAG at acceptor site.
A branch point in the lariat, which has a loose consensus, YURAC
Reaction: join the 5’ end of the intron to a branch point A to form a lariat loop, cut
the 3’ end of the intron and join the two
exons.
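The consensus signals above can be checked with a short sketch that tests an intron for a 5’ GU, a 3’ AG and a YURAC branch point, then joins the flanking exons. The pre-mRNA sequence is invented for illustration, and the simple rule used here ignores everything a real spliceosome evaluates beyond these three motifs.

```python
import re

# YURAC with Y = C/U and R = A/G; the branch A is the 4th base of the motif.
BRANCH_POINT = re.compile(r"[CU]U[AG]AC")

def intron_signals(intron):
    """Report the consensus splicing signals found in an intron (RNA)."""
    intron = intron.upper()
    return {
        "donor_GU": intron.startswith("GU"),
        "acceptor_AG": intron.endswith("AG"),
        "branch_A": [m.start() + 3 for m in BRANCH_POINT.finditer(intron)],
    }

def splice(pre_mrna, start, end):
    """Remove pre_mrna[start:end] if it carries all three signals."""
    signals = intron_signals(pre_mrna[start:end])
    if not (signals["donor_GU"] and signals["acceptor_AG"] and signals["branch_A"]):
        raise ValueError("intron lacks the expected splicing signals")
    return pre_mrna[:start] + pre_mrna[end:]

intron = "GUAAGUCUGACUUUUCAG"                 # made-up intron
pre_mrna = "AUGGCC" + intron + "GGAUAA"       # exon 1 + intron + exon 2
mature = splice(pre_mrna, 6, 6 + len(intron))
print(intron_signals(intron)["branch_A"])     # [9]
print(mature)                                 # AUGGCCGGAUAA
```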
RNA splicing is performed by RNA molecules – U1, U2, U4, U5, U6 (<200nt each), which
are known as snRNAs. Each of them is complexed with at least 7 protein
subunits to form an snRNP (small nuclear ribonucleoprotein), and these form the core
of the spliceosome, which contains >100 proteins.
The spliceosome recognizes the intron splice sites (branch point and 5’ splice site),
brings the two ends of the intron together and removes the intron.
The snRNAs provide enzyme activities and each snRNA has different roles:
U1: recognize 5’ splice site
U2: initial binding to branch site
U5: bring two exons together
U4: sequesters U6 snRNP
U6: core component; together with U2 it catalyses the two phosphoryl-transfer
reactions (transesterification)
The U1 RNA has several distinct stem-loop domains
Sm binding site is required for interaction with common snRNP proteins
U1 5’ end can base pair with the 5’ splice site
U1 snRNP contains 8 common core Sm proteins and 3 U1-specific proteins
(U1-70k, U1A and U1C)
Rearrangements allow the splicing signals on the pre-mRNA to be examined by
snRNPs several times during the course of splicing. This allows the spliceosome to
check and recheck, increasing the overall accuracy of splicing.
Splicing errors – exon skipping and cryptic splice site selection. Cryptic splicing
signals are nucleotide sequences of RNA that closely resemble true splicing signals
and are sometimes mistakenly used by the spliceosome.
To avoid errors, splicing is coupled with transcription: exon skipping is avoided
because splicing is executed once the first 3’ splice site has been transcribed,
before the next 3’ splice site is available.
Exon size is more or less uniform (~150nt) while intron size is variable.
Exons are marked by the binding of a group of SR (Ser- and Arg-rich) proteins, which
serve as splicing enhancers to recruit U1 and U2 to define the 5’ and 3’ splice sites
Introns are packaged into complexes by hnRNPs (heterogeneous nuclear
ribonucleoproteins)
Splice-site mutations can lead to abnormal proteins and thus diseases and this is a
consequence of deletion/addition of amino acid sequence, change in reading frame
or truncated protein due to premature termination codon.
Differential RNA splicing is used to increase product diversity – estimated that 90%
of human genes produce differentially spliced transcripts.
Different protein variants can be generated by alternative splicing and thus
the “one gene, one polypeptide” rule does not strictly hold.
Transcriptional termination: the polyadenylation signal AAUAAA lies 10 to 30nt before the
site where the poly-A tail is added, and a GU- or U-rich region lies within 30nt of the poly-A site.
1. CstF (cleavage stimulation factor) and CPSF (cleavage
and polyadenylation specificity factor) travel with
RNAPII during transcription
2. They recognize the AAUAAA signal and the additional
cleavage factors create the 3’ end
3. Poly-A polymerase (PAP) adds ~200 A nucleotides to
the 3’ end
4. Poly-A binding proteins aid polyadenylation and
protect the RNA from degradation.
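The sequence layout of the poly-A signal can be sketched as a simple search.
This is a rough illustration only: it looks for AAUAAA, places the expected
cleavage window 10-30nt past the signal, and then checks for a GU/U-rich
stretch further downstream. The "GU/U-rich" test (a run of 5 or more G/U
bases) and all names are assumptions made for this example, not a validated
predictor.

```python
import re


def find_polya_sites(rna, min_gap=10, max_gap=30, downstream=30):
    """List candidate poly-A signals with their expected cleavage window."""
    rna = rna.upper().replace("T", "U")
    hits = []
    # Lookahead so that overlapping AAUAAA occurrences are all reported.
    for m in re.finditer(r"(?=AAUAAA)", rna):
        sig_end = m.start() + 6
        # Cleavage is expected 10-30 nt past the end of the signal.
        lo, hi = sig_end + min_gap, sig_end + max_gap
        # Assumed criterion: a run of >=5 G/U bases between the start of the
        # cleavage window and `downstream` nt past its end.
        region = rna[lo:hi + downstream]
        gu_rich = re.search(r"[GU]{5,}", region) is not None
        hits.append({
            "signal_start": m.start(),
            "cleavage_window": (lo, hi),
            "gu_rich_downstream": gu_rich,
        })
    return hits


if __name__ == "__main__":
    example = "CCCC" + "AAUAAA" + "C" * 15 + "GUGUUU" + "CCCC"
    print(find_polya_sites(example))
```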
Whether an antibody is membrane-bound or secreted can be determined by
differential polyadenylation.
An increase in the concentration of CstF promotes RNA cleavage.
In unstimulated B lymphocytes, the first cleavage site that a transcribing RNA
polymerase encounters is suboptimal and is usually skipped, so a longer RNA
transcript is produced, giving membrane-bound antibodies.
When the cell is activated to produce antibodies, the CstF concentration
increases, cleavage now occurs at the first site, and a shorter transcript is
produced, resulting in secreted antibodies.
RNA Export from the nucleus to the cytoplasm
Each RNA binds multiple proteins including the nuclear export receptor (export-
ready RNA).
Some binding proteins are co-transported and some are not.
The nuclear export receptor guides the RNA to the nuclear pore complex for
export; only a small fraction of RNA (<10%) is transported to the cytoplasm,
as a protein-RNA complex.
SR: spliceosome proteins containing a domain rich in serine and arginine, CBC: cap-
binding complex, PABP: poly-A binding proteins, EJC: exon junction complex.
The failure to correctly splice a pre-mRNA often introduces a premature stop codon
into the reading frame for the protein. These abnormal mRNAs are destroyed by the
nonsense-mediated decay mechanism.
An mRNA molecule bearing EJCs to mark successfully completed splices is
first met by a ribosome that performs a “test” round of translation. As the
mRNA passes through the tight channel of the ribosome, the EJCs are
stripped off and successful mRNAs are released to undergo translation.
However, if an in-frame stop codon is encountered before the final EJC is
reached, the mRNA undergoes nonsense-mediated decay, which is triggered
by Upf proteins that bind to the EJC.
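The decision rule in the "test" round of translation can be written down
directly: find the first in-frame stop codon and compare its position with
the last EJC. A simplified Python sketch, assuming EJC positions are supplied
as nucleotide indices by the caller (the function names and the toy sequence
are hypothetical):

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_stop_index(mrna, start):
    """Index of the first in-frame stop codon at or after `start`, or None."""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def triggers_nmd(mrna, start, ejc_positions):
    """True if the first in-frame stop codon lies upstream of the last EJC."""
    stop = first_stop_index(mrna, start)
    if stop is None or not ejc_positions:
        return False
    # An EJC left downstream of the stop codon marks the mRNA for decay.
    return stop < max(ejc_positions)


if __name__ == "__main__":
    mrna = "AUG" + "GCU" * 3 + "UAA" + "GCU" * 5
    print(triggers_nmd(mrna, 0, [6, 20]))  # an EJC remains past the stop codon
    print(triggers_nmd(mrna, 0, [6]))      # all EJCs are upstream of the stop
```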
mRNA localization – translated immediately in cytosol (most common), directed to ER for
synthesis of membrane and secreted proteins or directed to specific intracellular locations
prior to translation.
Localization occurs by directed transport on the cytoskeleton, by random
diffusion and trapping, or by generalized degradation combined with local
protection by trapping.
RNA editing – two main types, either A to I (inosine) or C to U (minor).
This involves an RNA editing complex with deaminase activity that recognizes a
specific target sequence and/or secondary structure.
Editing can affect the protein sequence, RNA splicing, transport etc.
Lecture 18 – Translational and post-translational control
Translational process: in eukaryotes, proteins are synthesized on ribosomes
(80S); several ribosomes translating the same mRNA make up a polysome
(polyribosome).
The ribosome is made up of a large subunit (60S) and a small subunit (40S) and
consists of 60% RNA and 40% protein.
The large subunit consists of 28S (4718nt), 5.8S (160nt) and 5S (120nt) rRNAs
with 49 ribosomal proteins
The small subunit consists of 18S rRNA (1847nt) with 33 ribosomal proteins.
It takes about 20 seconds to several minutes to synthesize each protein.
Start codon is AUG and stop codons are UAA, UAG and UGA.
eIF4E and eIF4G are eukaryotic translation
initiation factors.
Ribosome scanning model: the small ribosomal subunit scans for the first AUG
codon in a favourable context. In the Kozak consensus (gccRccAUGG, where R is
a purine), the purine at -3 and the G at +4 are the most important bases, and
changes to them reduce translation efficiency by about 10-fold.
Translational initiation occurs from the first AUG codon in ~90% of mRNAs.
Translation may also start from the second or a later AUG to generate
proteins with different N-termini.
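The scanning model can be illustrated with a short search. This sketch
assumes the commonly cited Kozak positions (a purine at -3 and a G at +4,
counting the A of AUG as +1) define a "favourable context"; real start-site
choice is more graded than this yes/no test, and the function name is
hypothetical.

```python
def scan_for_start(mrna):
    """Index of the first AUG in a favourable context.

    Falls back to the first AUG of any kind, or None if there is no AUG.
    """
    mrna = mrna.upper().replace("T", "U")
    first_aug = None
    i = mrna.find("AUG")
    while i != -1:
        if first_aug is None:
            first_aug = i
        minus3 = mrna[i - 3] if i >= 3 else ""
        plus4 = mrna[i + 3] if i + 3 < len(mrna) else ""
        # Assumed favourable context: purine (A/G) at -3 and G at +4.
        if minus3 in ("A", "G") and plus4 == "G":
            return i
        i = mrna.find("AUG", i + 1)
    return first_aug


if __name__ == "__main__":
    # The first AUG (index 5) has U at -3 and C at +4, so scanning continues
    # to the second AUG, which sits in an ACCAUGG context.
    print(scan_for_start("CCUCCAUGCCCACCAUGGCC"))
```

The fallback to the first AUG mirrors the note that most mRNAs initiate at
the first AUG, while a weak context allows some ribosomes to scan past it.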
Translational initiation
1. Binding of Met-initiator tRNA to eIF2 and later to small ribosome subunit
directly at the P site
2. Binding to the cap of mRNA with additional initiation factors (eIF4E and
eIF4G)
3. Scanning for the first AUG
4. Dissociation of eIF2 and binding of large ribosome subunit
5. Addition of second amino acid tRNA
Each ribosome has three binding sites for tRNAs and one binding site for mRNA.
Cycle for amino acid addition has 4 steps:
1. Aminoacyl-tRNA bound to EF-1 enters the A-site by codon-anticodon
pairing. The first proofreading step involves the small-subunit rRNA (16S in
bacteria, 18S in eukaryotes), which recognizes correct pairs and closes
tightly around them to trigger GTP hydrolysis by EF-1.
2. Peptide bond formation
3. Large subunit translocation – binding of EF-2 and GTP hydrolysis cause a
conformational change that moves the tRNA to the P-site after formation of
the peptide bond. EF-2 then dissociates
4. Small subunit translocation, empty tRNA ejected
Translational termination – binding of a release factor to
the A site when a termination codon is encountered. This
results in chain termination and release of nascent
polypeptide and the dissociation of the ribosome.
Eukaryotic release factors recognize all three
termination codons – UAA, UAG and UGA.
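Initiation at AUG, codon-by-codon elongation and termination at any of the
three stop codons can be mimicked in a few lines. A minimal Python sketch
using the standard genetic code; it starts at the first AUG it finds and
ignores context, initiation factors and proofreading (the function name is
illustrative):

```python
BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks the stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the RNA."""
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG or UGA: release factor binds
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    print(translate("GGAUGGCUUGGUAAGG"))  # AUG GCU UGG UAA -> MAW
```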
Post-translational process – involves protein folding, covalent
modifications and formation of complexes.
Incorrectly folded polypeptides with stretches of hydrophobic amino acids on
their surface will eventually be destroyed in proteasomes, as they are toxic
to the cell through the formation of aggregates.
Protein folding is already well under way by the time the ribosome releases
the nascent polypeptide. Many newly released proteins have an open and
flexible structure called a molten globule, which is subject to further folding.
Hsp70 and Hsp60 are the major families of molecular chaperones in
eukaryotes; both have affinity for exposed hydrophobic patches on
incompletely folded proteins.
Hsp70 acts before translation is complete. It binds to a string of 4-5
hydrophobic amino acids, hydrolyses ATP and clamps down very tightly on
the target. It then rebinds ATP and releases the target protein.
Hsp60 acts late. It forms a large barrel-shaped structure and captures a
misfolded protein by hydrophobic interaction. It binds ATP and adds a
cap protein (GroES) to increase the dimension of the barrel rim, and
incubates the protein for ~10 seconds. Ejection of the correctly folded
protein is accomplished by ATP hydrolysis.
For protein degradation, proteasomes are used. A proteasome comprises a 20S
core and two 19S caps; each cap is a complex of ~20 subunits (>6 are ATPases)
that recognizes ubiquitinated and unfolded proteins.
Proteasomes are highly abundant, constituting ~1% of cellular proteins.
Improperly folded proteins are targeted by attachment of ubiquitin (76aa).
Ubiquitinated proteins are translocated to proteasomes, and ubiquitin is
removed by ubiquitin hydrolase for recycling. The targeted proteins are
unfolded in the ring of the cap and threaded into the core for degradation.
Note that a protease cuts once and doesn’t need ATP, while a proteasome
cuts the entire protein multiple times into short peptides.
Regulation at mRNA level – different mRNAs have different half-lives
Many unstable mRNAs have AU-rich sequences in their 3’UTRs
A cap-associated enzyme, deadenylase (DAN), shortens the poly-A tail. Actively
translated mRNAs tend to have longer half-lives
Some mRNAs are regulated by specialized mechanisms – ferritin and transferrin
receptor mRNAs in response to iron.
When iron levels are low, the binding of aconitase blocks translation of
ferritin mRNA. When iron levels are high, iron binds to aconitase, which
then dissociates from the mRNA, and synthesis of ferritin begins.
For transferrin receptor mRNA, the binding of aconitase blocks an
endonuclease cleavage site and thus stabilizes the mRNA, allowing
translation and thus the import of iron across the plasma membrane.
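The two iron responses run in opposite directions from the same switch,
which is easy to see as a small truth table. A toy Python sketch, treating
iron as simply "high" or "low" (the function and key names are made up for
illustration; the iron-import mRNA is taken to be the transferrin receptor
mRNA):

```python
def iron_response(iron_high: bool) -> dict:
    """Toy model of aconitase-mediated control of two mRNAs."""
    # Aconitase binds the mRNAs only when it is not occupied by iron.
    aconitase_bound = not iron_high
    return {
        # Bound aconitase blocks ferritin translation.
        "ferritin_translated": not aconitase_bound,
        # Bound aconitase shields the endonuclease site, stabilizing the mRNA.
        "transferrin_receptor_mrna_stable": aconitase_bound,
    }


if __name__ == "__main__":
    for level in (False, True):
        print("iron high" if level else "iron low", iron_response(level))
```

Low iron therefore favours iron import and blocks iron storage, while high
iron does the reverse.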
RNAi and MicroRNA
RNAi (RNA interference) uses short single-stranded RNAs (20-30nt) and is a
host defence mechanism to destroy foreign RNAs
These short RNAs serve as guide RNAs that selectively recognize and bind
through base pairing to other RNAs in the cell. When the target is a mature
mRNA, they can inhibit its translation or catalyse its destruction by
recruiting Argonaute proteins.
MicroRNAs (miRNAs) are a newly discovered class of small RNAs (21-25nt,
typically 23nt) whose precursors are transcribed by RNAPII and have a cap and
poly-A tail.
>1000 miRNA genes in the human genome are present as independent genes or
in introns.
They appear to regulate at least one-third of all human protein-coding genes.
Upon export into the cytoplasm, Dicer (an RNase) further cleaves/dices the
miRNA precursor to yield a single-stranded mature miRNA. It then forms a
RISC (RNA-induced silencing complex) with Argonaute and other proteins. The
RISC targets specific mRNAs by base pairing and leads to rapid mRNA
degradation.
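Targeting by base pairing can be sketched as a reverse-complement search.
The example below assumes that pairing is driven by a short "seed" near the
5’ end of the miRNA (nucleotides 2-8), which is a common simplification and
not something stated in these notes; the example miRNA sequence and all
function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def find_seed_sites(mirna, mrna, seed_start=1, seed_end=8):
    """Start indices in `mrna` that pair perfectly with the miRNA seed.

    The default slice [1:8] corresponds to miRNA nucleotides 2-8 (assumed seed).
    """
    seed = mirna.upper()[seed_start:seed_end]
    target = reverse_complement(seed)
    mrna = mrna.upper()
    sites = []
    i = mrna.find(target)
    while i != -1:
        sites.append(i)
        i = mrna.find(target, i + 1)
    return sites


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # example 22-nt miRNA sequence
    mrna = "AAAACUACCUCAAAA"          # toy 3' UTR fragment with one seed match
    print(find_seed_sites(mirna, mrna))
```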