
LSM2232 Genes, Genomes & Biomedical Implications

Lecture 1/2/3 (Low BC Part 1)

Humans have 23 pairs of chromosomes (46 in total). The autosomes (pairs 1-22) are numbered roughly by size, from the largest to the smallest, and the sex-determining chromosomes form the 23rd pair, with XX in females and XY in males.

Chromosome painting is the direct visualisation of specific chromosomes, in metaphase spreads and in interphase nuclei, by in situ hybridisation with chromosome-specific probes. Chromosome painting by fluorescence in situ hybridisation (FISH) is now used routinely to identify chromosomal rearrangements, to assign breakpoints and to determine the origin of extra chromosomal material.

The human genome contains about 25,000 genes, and the genomes of any two people are about 99.9% identical. About 50% of the genome consists of repetitive sequences. The human genome has about 3.2 × 10⁹ nucleotide pairs.

Molecular genetics has progressed significantly since the 1950s: from cells and the central dogma to cytogenetics, genome landscapes, gene regulation and, today, genome editing.

Gene silencing is an example of gene regulation in which RNA interference (RNAi) by small interfering RNAs (siRNAs) causes cleavage of targeted mRNA molecules, which suppresses expression of the gene.

Alternatively, morpholino oligos can be used. They bind to complementary sequences of RNA or ssDNA by base pairing and act by “steric blocking”: bound to a target sequence within an RNA, they prevent other molecules (such as ribosomes or splicing factors) from interacting with that RNA.

Genome editing can be done with the CRISPR-Cas9 system.

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) are bacterial loci containing short direct repeats of 24-48 bp. They are part of an acquired prokaryotic immune system that confers resistance to exogenous sequences such as plasmids and phages.

The Cas9 enzyme acts as a pair of molecular scissors that can cut the two strands of DNA at a specific location in the genome so that bits of DNA can then be added or removed.

A guide RNA (gRNA) consists of a small pre-designed sequence, about 20 bases long, located within a longer RNA scaffold. The scaffold part binds to Cas9, while the pre-designed sequence, whose bases are complementary to the target DNA sequence in the genome, ‘guides’ Cas9 to the right part of the genome.

Cas9 follows the guide RNA to that location and makes a cut. When the cell detects the damaged DNA, it tries to repair it, and this repair machinery can be used to introduce mutations at the cut site.
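
The targeting rule can be sketched in Python. This minimal example lists every 20-nt site on one DNA strand that a guide's pre-designed sequence could match. It assumes the widely used Streptococcus pyogenes Cas9, which additionally needs an NGG protospacer-adjacent motif (PAM) immediately after the target; the PAM rule, the function names and the example sequence are illustrative assumptions, not part of the lecture.

```python
import re

def find_targets(dna: str, guide_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (start, target, PAM) for each guide_len-nt site followed by NGG.

    Assumes SpCas9-style targeting (NGG PAM) and scans only the strand given.
    """
    dna = dna.upper()
    # Zero-width lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]

def guide_spacer(target: str) -> str:
    """The guide's pre-designed part has the target's sequence (U for T),
    so it base-pairs with the complementary DNA strand."""
    return target.replace("T", "U")

dna = "ACGT" * 5 + "TGGCC"  # hypothetical 25-nt sequence
for start, target, pam in find_targets(dna):
    print(start, target, pam, guide_spacer(target))
```

Here the only hit starts at position 0, because the sequence's single GG sits right after its first 20 bases. A practical design tool would also scan the opposite strand and check for similar sites elsewhere in the genome.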

Animal models such as mice and rats are used to help understand various biological processes because they are physiologically similar to humans, share 80-85% gene sequence identity with humans and have similar syntenic groups (genes occurring together on the same chromosome).

Conserved chromosomal domains may be important for chromosomal function.

Introns are noncoding nucleotide sequences in a gene that are removed by RNA splicing during maturation of the final RNA product.

Introns are integral to the regulation of gene expression. Some introns themselves encode functional RNAs: after splicing, they are further processed to generate noncoding RNA molecules.

Some introns play essential roles in a wide range of gene regulatory functions, such as nonsense-mediated decay and mRNA export.

Alternative splicing, in which different combinations of exons are joined, increases the variability of protein sequences translated from a single gene, allowing multiple related proteins to be generated from a single gene and a single precursor mRNA transcript.
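
A toy illustration of how alternative splicing multiplies products: if a hypothetical gene has a constitutive first and last exon plus n optional (cassette) internal exons, each of which can be included or skipped, up to 2^n mRNA isoforms can arise. The exon names and the assumption that every combination is used are purely illustrative; real genes use only some combinations and other splicing patterns as well.

```python
from itertools import product

def cassette_isoforms(first: str, optional: list[str], last: str) -> list[list[str]]:
    """Enumerate exon chains when each optional internal exon is kept or skipped.

    Exons always stay in their genomic order; only inclusion varies.
    """
    isoforms = []
    for keep in product((True, False), repeat=len(optional)):
        middle = [exon for exon, kept in zip(optional, keep) if kept]
        isoforms.append([first, *middle, last])
    return isoforms

isoforms = cassette_isoforms("E1", ["E2", "E3", "E4"], "E5")
print(len(isoforms))  # 8, i.e. 2**3
for chain in isoforms[:3]:
    print("-".join(chain))
```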

Comparison between human and Fugu genes:

The Fugu has a compact genome with only 15% repetitive DNA (vs. 50% in humans), and its average intron length is 1/6 that of humans.

The larger size of human introns is due to the presence of retrotransposons (LINEs/SINEs).

Close to 2,000 human and Fugu proteins are 70% similar, suggesting that these genes are highly conserved.

When sequences are highly conserved, the proteins are likely to have similar functions, but even small differences in sequence can cause a difference in function.

Introns and intergenic regions can produce miRNAs that suppress gene expression.

CpG islands (clusters of CG sites) are key signature motifs in DNA that indicate a higher likelihood of nearby genes. GC content and CpG density are more variable in humans than in mice.
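
GC content and CpG density can be computed directly from a sequence. The sketch below calculates the GC fraction and the observed/expected CpG ratio, (CpG count × length) / (C count × G count), and flags a window as CpG-island-like using the widely cited Gardiner-Garden and Frommer thresholds (length ≥ 200 bp, GC > 50%, observed/expected > 0.6). Those thresholds and the function names are assumptions brought in for illustration; the lecture does not give a formal definition.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = seq.count("CG")  # CG occurrences can never overlap, so count() is exact
    return cpg * len(seq) / (c * g)

def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Threshold test modelled on the Gardiner-Garden & Frommer criteria."""
    return (len(seq) >= min_len
            and gc_fraction(seq) > min_gc
            and cpg_obs_exp(seq) > min_ratio)

print(looks_like_cpg_island("CG" * 100))  # True
print(looks_like_cpg_island("CA" * 100))  # False
```

Real island finders slide a window along the genome and merge neighbouring windows that pass; this sketch only scores one given window.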

Lecture 4 (Low BC Part 2)

A diploid cell has two copies of each chromosome, one from the mother and the other from the father. The maternal and paternal chromosomes in a homologous pair have the same genes at the same loci, but possibly different alleles.

Karyotyping (visualising the number and appearance of chromosomes) can be done using dyes, including chromosome painting.

Giemsa stain (a mix of methylene blue and eosin) binds to gene-poor, A-T rich regions after chromosome digestion with trypsin, yielding a series of lightly stained (GC-rich) and darkly stained (AT-rich) bands.

Fluorescent dyes can be used to visualise all pairs of chromosomes simultaneously in different colours.

Chromosomes can undergo recombination (crossing over) and translocation during meiosis, which contributes to genome diversity.

For a translocation to happen, the two chromosomes need to be co-localised (in close physical proximity).

A disease can result when a gene gains a function or loses a function.

Gene insertion, deletion and duplication are other mechanisms that may cause rearrangements in a chromosome.

Human chromosome 22 (one of the smallest) contains about 48 million (48 × 10⁶) nucleotide pairs and makes up approximately 1.5% of the human genome.

About 10% of one chromosome arm contains about 40 genes, and a single gene spans about 34,000 nucleotide pairs.
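
As a quick arithmetic check, a 1.5% share corresponds to a chromosome of about 48 × 10⁶ nucleotide pairs in a genome of about 3.2 × 10⁹ (both rounded figures used in these notes):

```python
GENOME_BP = 3.2e9   # human genome size, nucleotide pairs
CHR22_BP = 48e6     # chromosome 22, nucleotide pairs

share = CHR22_BP / GENOME_BP
print(f"chromosome 22 share of genome: {share:.1%}")  # 1.5%
```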

Genes can be located on either strand of the DNA. The top and bottom strands can code for different genes, but the coding sequence of any one gene is always on a single strand.

In the region shown in the lecture figure, the top strand has fewer genes than the bottom strand; a possible reason is that the top strand has fewer promoter sequences for the initiation of transcription.

Closely related species may have different numbers of chromosomes yet show similar gene expression, because they have a similar number of genes.

The advantage of having fewer chromosomes is that cell division is easier, as less organisation (of microtubules) is required.

However, the disadvantage is that a single mutation event has a greater impact (the “many eggs in one basket” scenario).

A genome can expand through gene duplication (e.g. polyploidy in plants), and this can allow the species to adapt to harsher conditions.

For the human genome:
Largest gene = 2.4 × 10⁶ base pairs
Average gene size = 27,000 base pairs
Average exon size = 145 base pairs
Average cDNA length = 1,000 base pairs (a typical protein is about 300+ amino acids)
Average exons per gene = 8.8
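
Combining these averages shows how little of a typical gene ends up in the mRNA. This is a back-of-envelope sketch only: multiplying two averages merely approximates the true average exonic length, and a cDNA also contains UTRs, so its coding part is somewhat shorter than 1,000 nt.

```python
AVG_GENE_BP = 27_000       # average gene size
AVG_EXON_BP = 145          # average exon size
AVG_EXONS_PER_GENE = 8.8
AVG_CDNA_BP = 1_000        # average cDNA length

exonic_bp = AVG_EXON_BP * AVG_EXONS_PER_GENE   # exon sequence in an "average" gene
non_exon_share = 1 - exonic_bp / AVG_GENE_BP   # the remainder is mostly intron
max_residues = AVG_CDNA_BP // 3                # 3 nucleotides per codon

print(f"exonic sequence per gene ≈ {exonic_bp:.0f} bp")
print(f"share of gene outside exons ≈ {non_exon_share:.0%}")
print(f"1,000 nt of cDNA ≈ at most {max_residues} codons")
```

About 1,276 bp of exon in a 27,000 bp gene leaves roughly 95% of the gene outside exons, and about 333 codons fits the “300+ amino acids” figure.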

What’s in the human genome?

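About 50% of our genome consists of repeats, while protein-coding regions make up only about 1.5% of the entire genome.

Open Reading Frame (ORF): a stretch of DNA sequence within the genome, read in one reading frame without interruption by a stop codon, that could potentially encode a functional element.

Example of both strands of DNA having coding sequences (each strand can be read in three reading frames):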

ORF1: 5’– GGC CTT ACG TTA TTA CCC –3’

ORF1: 3’– CCG GAA TGC AAT AAT GGG –5’

ORF2: 5’– GG CCT TAC GTT ATT ACC C –3’ Stop codon encountered

ORF2: 3’– CC GGA ATG CAA TAA TGG G –5’

ORF3: 5’– G GCC TTA CGT TAT TAC CC –3’

ORF3: 3’– C CGG AAT GCA ATA ATG GG –5’

Stop codons are UAA, UAG and UGA, so the probability that a random codon is a stop codon is 3/64.

In compact viral genomes, both strands at the same locus can code for different proteins, depending on the reading frame.
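
The reading-frame logic can be checked mechanically. Each strand can be read in three frames, six in all. The sketch below reads both strands of the example sequence above 5’→3’ (the bottom strand as the reverse complement of the top), reports in which codon positions stop codons fall, and turns the 3/64 figure into an expected spacing: if all 64 codons were equally likely (a simplifying assumption), the number of codons read up to the first stop would be geometric with mean 64/3 ≈ 21. Function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # DNA spelling of UAA, UAG, UGA
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """The opposite strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def stop_positions(seq: str, offset: int) -> list[int]:
    """Codon indices of stop codons when seq is read starting at `offset`."""
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    return [k for k, codon in enumerate(codons) if codon in STOP_CODONS]

def six_frame_stops(top: str) -> dict[tuple[str, int], list[int]]:
    frames = {}
    for strand, seq in (("top", top), ("bottom", reverse_complement(top))):
        for offset in range(3):
            frames[(strand, offset)] = stop_positions(seq, offset)
    return frames

top = "GGCCTTACGTTATTACCC"  # top strand of the example, 5'->3'
for frame, stops in six_frame_stops(top).items():
    print(frame, stops)

# Random-sequence estimate with P(stop) = 3/64 per codon.
p_stop = 3 / 64
print(f"mean codons up to the first stop ≈ {1 / p_stop:.1f}")
print(f"P(no stop in 100 codons) ≈ {(1 - p_stop) ** 100:.4f}")
```

Read this way, the top strand has no stop codon in any of its three frames, while the bottom strand (GGGTAATAACGTAAGGCC) has TAA at offsets 0 and 2. A long stretch without stops is therefore unlikely by chance, which is why long ORFs are good candidates for real genes.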

Lecture 5/6 (Low BC Part 3)

DNA organisation and packing change during gene expression and DNA replication. The DNA is most tightly packed during cell division.

Origins of replication initiate replication bubbles.

In bacteria and yeast, origins of replication have been identified and show sequence-specific activation.

In mammalian cells, origin sequences are highly variable.

Origins of replication are clustered irregularly in groups of 20-80 called replication units.

Within each replication unit, individual origins are separated by intervals of 30-300 kb.

Replication units are activated during S phase.

DNA-binding proteins (histones) package DNA into a compact, less fragile form called chromatin, which is DNA complexed with proteins.

Eukaryotic chromosomes are linear, with several origins of replication, and have 4 levels of organisation:

1. Primary structure – nucleosomal packets of 11 nm (“beads on a string”), composed of double-stranded DNA (146 bp) wrapped around an octamer of histone proteins (2 each of H2A, H2B, H3 and H4).

Histone proteins are basic and thus positively charged. Amino acids such as lysine and arginine can form H-bonds with the phosphates along the DNA backbone.

All four histones share a structural motif known as the histone fold, formed from 3 α-helices connected by two loops.

The histone folds first bind to each other to form H3-H4 and H2A-H2B dimers; the H3-H4 dimers then combine to form a tetramer, which further combines with two H2A-H2B dimers to form an octamer.

The genes that code for these histone proteins are paralogs that share a conserved domain.

2. Secondary structure – organisation of nucleosomes into the 30 nm fiber (active euchromatin).

A single histone H1 molecule binds to each nucleosome, contacting both the DNA and the protein. The H1 histones package nucleosomes into even tighter arrays by guiding DNA entry into and exit from the complex and by neutralising DNA charge.

Histone tails are largely unstructured and are thought to be involved in the interactions between nucleosomes that help to pack them together.

Histone tails can be modified (by methyl, acetyl, phosphate or ubiquitin groups) for specific purposes.

3. Tertiary structure – radial loops/solenoids (300 nm) formed from the interaction between the 30 nm fiber and the nuclear matrix. Euchromatin in this form is transcriptionally inactive.

4. Global structure – higher-order packing in interphase chromosomes. This is poorly understood, but there are rare visible examples:

Lampbrush chromosomes – highly extended chromosomes found in immature oocytes (e.g. of amphibians). Most of the genes present in the DNA loops are being actively expressed, while the majority of the DNA remains highly condensed on the chromosome axis and is not expressed.

Polytene chromosomes, e.g. in the fruit fly Drosophila, contain many copies of a standard chromosome. They are found in the salivary glands of fly larvae, where the cells undergo multiple cycles of DNA synthesis without cell division.

Multiple copies of the genes are held side by side. When viewed under a microscope, distinct alternating dark bands and light interbands are visible. About 95% of the DNA is in bands and 5% is in interbands. The chromatin in each band appears dark because its DNA is more condensed than the DNA in interbands. Gene expression is likely to be more active in the interbands.

Chromatin Domains:

Heterochromatin (700 nm) – highly condensed chromatin that normally does not harbour genes and is transcriptionally inactive (~10% of DNA). It is found mostly at centromeres and telomeres and provides protection against “parasitic” mobile elements.

Active euchromatin (30 nm) – the least condensed, transcriptionally active chromatin (~10% of DNA).

Inactive euchromatin (300 nm) – intermediate compaction, transcriptionally inactive.

The lecture figure shows an interphase chromosome folded into a series of looped domains, each containing about 50,000 to 200,000 or more nucleotide pairs of double-helical DNA condensed into a chromatin fiber.

The chromatin in each individual loop is further condensed through poorly understood folding processes that are reversed when the cell requires direct access to the DNA packaged in the loop.

In mitotic chromosomes, the bases of the chromosomal loops are enriched in both condensins (which bind to chromosomes and compact the radial loops) and DNA topoisomerase II (which prevents DNA tangling); these form the chromosome axis at metaphase.

Through chromosome breakage and rejoining, a piece of chromosome that is normally euchromatic can be translocated into the neighbourhood of heterochromatin, which can silence (inactivate) normally active genes – this is known as the position effect.

Heterochromatin packaging can also retract to allow expression of newly released genes.

Position effect variegation is the diversity of phenotype among genetically identical cells, which depends on the heterochromatin status of a gene’s neighbourhood.

The white gene in Drosophila controls eye pigment production. Wild-type flies have normal pigment production, which gives them red eyes, but if the white gene is mutated and inactivated, the mutant has white eyes.

In flies in which a normal white gene has been moved near a region of heterochromatin, the eyes have both red and white patches, as the gene has been silenced by the heterochromatin in some cells but not in others.

The centromere contains heterochromatin consisting of short, repeated DNA sequences known as alpha satellite DNA, which are AT-rich. The repeats contain slight sequence variations and are flanked by heterochromatin made of non-satellite repeats.

Regular replication machinery cannot fully replicate the end of a linear chromosome, so telomerase is required to extend the chromosome end such that no crucial gene sequences are lost during replication.

Telomerase recognises the tip of an existing telomere DNA repeat sequence and elongates it in the 5’ to 3’ direction, using an RNA template that is a component of the enzyme itself to synthesise new copies of the repeat.

Telomeres end in t-loops, in which the protruding end of the telomere loops back and tucks itself into the duplex DNA of the telomere repeat sequence.

In heterochromatin, the histone tails are under-acetylated, which allows silent information regulator (Sir) proteins to bind to these histones.

In the yeast telomere model, the telomeric proteins attract the NAD⁺-dependent histone deacetylase Sir2, which silences transcription at the telomeres.

Telomeres are important for protecting chromosomal content. Telomerase activity is found mainly in germ cells (egg/sperm) and some stem cells; most somatic cells lack it and therefore experience telomere shortening with each round of DNA replication.

Genes related by DNA sequence likely arose from gene duplication, shuffling and mutation; they are called gene families. Release from selective pressure on a duplicated copy allowed mutations to accumulate, which could later alter the gene product and its function.

Homologs are genes related to a second gene by descent from a common ancestral DNA sequence. The term homolog may apply to both orthologs and paralogs.

Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution. The identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.

Paralogs are genes related by duplication within a genome. Whereas orthologs retain the same function in the course of evolution, paralogs evolve new functions, even if these are related to the original one.

E.g. the genes coding for myoglobin and haemoglobin are paralogs, but the genes coding for haemoglobin in humans and dogs are orthologs.

Other elements in the human genome include:

Retroviral-like elements (retrotransposons), e.g. with a promoter in the long terminal repeat (LTR); these encode reverse transcriptase.

DNA-only transposons, which encode transposase – an enzyme that binds to the ends of a transposon and catalyses the movement of the transposon to another part of the genome by a cut-and-paste mechanism or by a replicative transposition (copy-and-paste) mechanism.

Duplications, simple repeats and gene regulatory elements.

Non-retroviral retrotransposons:

Long Interspersed Nuclear Elements (LINEs) such as L1 (~1,000-12,000 bp), which encode an endonuclease and reverse transcriptase.

Transposition of the L1 element begins when an endonuclease attached to the L1 reverse transcriptase and the L1 RNA nicks the target DNA at the point where the insertion will occur. This cleavage releases a 3’-OH DNA end in the target DNA, which is then used as a primer for the reverse transcription step. This generates a single-stranded DNA copy of the element that is directly linked to the target DNA. In subsequent reactions, RNase H removes the RNA, and further processing of the single-stranded DNA copy generates a new dsDNA copy of the L1 element, inserted at the site of the initial nick by DNA polymerase.

Short Interspersed Nuclear Elements (SINEs) such as Alu (~300 bp), which do not carry their own endonuclease or reverse transcriptase genes.

The organisation of LINE/SINE elements (lecture figure) uses the following labels: +1, transcription start site; pol II/III, RNA polymerase II and III promoters; R-EN, restriction-like endonuclease; AP-EN, apurinic/apyrimidinic endonuclease; pA, polyadenylation signal lacking a downstream efficiency element; RT, reverse transcriptase.

There are ~850,000 LINEs (21% of the genome) and ~1,500,000 SINEs (13% of the genome), but most are non-functional.
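
Dividing each family's share of the 3.2 × 10⁹ bp genome by its copy number gives the average copy length, which helps explain why most copies are non-functional. This is a back-of-envelope sketch using only the figures in these notes; the function name is illustrative.

```python
GENOME_BP = 3.2e9  # genome size used throughout these notes

def avg_copy_length(copies: int, genome_fraction: float,
                    genome_bp: float = GENOME_BP) -> float:
    """Average length of one copy, given copy number and share of the genome."""
    return genome_fraction * genome_bp / copies

for name, copies, fraction in [("LINE", 850_000, 0.21), ("SINE", 1_500_000, 0.13)]:
    total_mb = fraction * GENOME_BP / 1e6
    print(f"{name}: ~{total_mb:.0f} Mb in total, "
          f"~{avg_copy_length(copies, fraction):.0f} bp per copy")
```

The SINE average (~277 bp) is close to Alu's ~300 bp, whereas the LINE average (~791 bp) falls below even the low end of the L1 length range quoted above (a full-length L1 is about 6 kb, a figure from general references rather than these notes), consistent with most LINE copies being truncated and unable to transpose.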

Because the LINE reverse transcriptase/endonuclease can also bind to and copy normal cellular mRNAs, LINE activity can generate retropseudogenes (processed pseudogenes).

LINEs are strongly biased towards AT-rich regions, while SINEs are strongly biased towards GC-rich regions.

Mitochondria (in animal and plant cells) and plastids (in plant cells) are organelles that contain their own genomes.

They encode genes for their own use but also import products of nuclear genes. Their genetic code differs in some respects from that of nuclear DNA.

Mitochondrial DNA (mtDNA) is mostly circular, but some is linear. Like bacterial DNA, it is not packaged by histones. In mammals, mtDNA is about 16.5 kb and is maternally inherited.

Compared with nuclear, chloroplast and bacterial genomes, the mitochondrial genome has several surprising features:

Dense gene packing: the mitochondrial genome seems to contain almost no noncoding DNA; nearly every nucleotide seems to be part of a coding sequence, either for a protein or for an rRNA/tRNA.

Relaxed codon usage: only 22 tRNAs are required for mitochondrial protein synthesis, compared with 30+ in the cytosol and chloroplasts.

Variant genetic code: 4 of the 64 codons have different “meanings” from those of the same codons in other genomes.
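
The variant code can be illustrated with the vertebrate (human) mitochondrial code, in which UGA encodes Trp instead of stop, AUA encodes Met instead of Ile, and AGA/AGG are treated as stops instead of Arg. This codon table comes from standard genetic-code references, not from the lecture, and the sketch includes only the handful of codons the example needs; the example mRNA and function name are hypothetical.

```python
# Small subset of the standard (nuclear) genetic code, enough for the demo.
STANDARD = {
    "AUG": "Met", "UGG": "Trp", "GCU": "Ala", "AAA": "Lys",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
    "AUA": "Ile", "AGA": "Arg", "AGG": "Arg",
}

# The 4 codons reassigned in the vertebrate mitochondrial code.
MITO_CHANGES = {"UGA": "Trp", "AUA": "Met", "AGA": "Stop", "AGG": "Stop"}
VERTEBRATE_MITO = {**STANDARD, **MITO_CHANGES}

def translate(mrna: str, table: dict[str, str]) -> list[str]:
    """Translate codon by codon from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

mrna = "AUGAUAUGAGCUAGAAAA"
print(translate(mrna, STANDARD))         # ['Met', 'Ile']
print(translate(mrna, VERTEBRATE_MITO))  # ['Met', 'Met', 'Trp', 'Ala']
```

The same message gives a two-residue product under the standard code but a four-residue product under the mitochondrial code, which is why mitochondrial genes must be read with their own table.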

It is thought that eukaryotic cells originated through a symbiotic relationship between an archaeon and an aerobic bacterium, in which the archaeon provided the nucleus and the bacterium served as a respiring, ATP-producing endosymbiont that eventually evolved into the mitochondrion.

Mitochondrial DNA has a higher mutation rate because of free radicals generated by oxidative reactions and a minimal DNA repair system.

Lecture 7 - Prokaryotic Transcription

Enzymes that perform transcription are called RNA polymerases. RNA polymerases catalyse the formation of the phosphodiester bonds that link nucleotides together to form a linear chain.

RNA polymerase moves stepwise along the DNA, unwinding the DNA helix just ahead of the active site for polymerisation to expose a new region of the template strand for complementary base pairing.

Thus RNA is synthesised 5’ to 3’ on a template strand that is read 3’ to 5’.

The coding strand is the DNA strand that has the same sequence as the mRNA (with T in place of U) and is related by the genetic code to the protein sequence that it represents.
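
Base-pairing rules make these strand relationships mechanical. In this sketch (the sequence and function names are hypothetical), the template strand is written 3’→5’, so pairing each template base with its RNA partner yields the mRNA already in 5’→3’ order, and the coding strand is simply the mRNA with T in place of U.

```python
# Template DNA base -> the RNA base that pairs with it.
DNA_TO_RNA_PAIR = str.maketrans("ACGT", "UGCA")

def transcribe(template_3_to_5: str) -> str:
    """Given the template strand written 3'->5', return the mRNA written 5'->3'.

    Each RNA base pairs with the template base opposite it; because the
    template is written 3'->5', the result is already in 5'->3' order.
    """
    return template_3_to_5.translate(DNA_TO_RNA_PAIR)

def coding_strand(mrna: str) -> str:
    """The coding strand (5'->3') has the mRNA sequence with T in place of U."""
    return mrna.replace("U", "T")

template = "TACGGATTC"          # hypothetical template strand, 3'->5'
mrna = transcribe(template)
print(mrna)                     # AUGCCUAAG
print(coding_strand(mrna))      # ATGCCTAAG
```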

The transcription unit is the sequence between the sites of initiation and termination by RNA polymerase, and it may include more than one gene. Its elements are:

Promoter – region of DNA where RNA polymerase binds tightly to initiate transcription.

Terminator – sequence of DNA that causes RNA polymerase to terminate transcription. For most bacterial genes, a termination signal consists of a string of A-T nucleotide pairs preceded by a twofold symmetric DNA sequence, which, when transcribed into RNA, folds into a “hairpin” structure. Formation of the hairpin helps to disengage the RNA transcript from the active site.

Startpoint (+1) – position on the DNA corresponding to the first base incorporated into RNA.

Upon binding to the promoter, RNA polymerase opens up the double helix to expose a short stretch of nucleotides (~10) on each strand in a transient transcription bubble (~12 to 14 bp), and uses the template strand (3’ to 5’) to synthesise a complementary RNA sequence running 5’ to 3’ (an RNA-DNA hybrid of ~8 to 9 bp within the bubble).

As the transcription bubble progresses, the DNA duplex re-forms and displaces the RNA in the form of a single polynucleotide chain.

The transcription rate is about 40 to 50 nucleotides per second, whereas the DNA replication rate is about 800 base pairs per second.

A nascent RNA is an RNA chain that is still being synthesised, such that its 3’ end is paired with DNA where RNA polymerase is elongating.

In bacteria, all RNA molecules are synthesised by a single type of RNA polymerase; this applies to the production of mRNA as well as structural and catalytic RNAs.

Steps of the transcription reaction:

1. RNA polymerase binds to the promoter on the DNA to form a closed complex.
2. RNA polymerase opens the DNA duplex to form a transcription bubble (open complex) and initiates transcription (initiation).
3. During elongation, the transcription bubble moves along the DNA and the RNA chain is extended in the 5’ to 3’ direction.
4. Transcription stops when RNA polymerase encounters a terminator sequence.
5. The DNA duplex re-forms and RNA polymerase dissociates, releasing the newly synthesised RNA.

The bacterial RNA polymerase consists of the core enzyme (~400 kDa), comprising five subunits (α2ββ’ω), and a sigma (σ) factor. The association of the core enzyme and the sigma factor is referred to as the RNA polymerase holoenzyme.

The two α subunits serve as a scaffold for assembly of the holoenzyme and for binding to DNA; through their C-terminal domain (CTD) they interact with the promoter and some regulatory factors.

The β and β’ subunits catalyse the covalent linkages between adjacent ribonucleotides and make up most of the enzyme mass.

The sigma (σ) factor changes the DNA-binding properties of RNA polymerase so that its affinity for general DNA is reduced and its affinity for promoters is increased.

The sigma factor is involved only in the initiation step.

The initiation complex contacts the DNA from the -55 to the +20 region. When initiation succeeds, the initial RNA synthesis (abortive initiation) is relatively inefficient, as short, unproductive transcripts are often released.

However, once the nascent RNA chain reaches 8-9 bases in length, the sigma factor is released and RNA polymerase makes the transition to the elongation ternary complex of core RNA polymerase, DNA and nascent RNA.

Upon dissociation of the sigma factor (-30 region), the core enzyme contracts and the polymerase tightens around the DNA, shifting to the elongation mode of RNA synthesis when the RNA chain extends to 15-20 bases.

The sigma factor and the core enzyme recycle at different points in transcription.

Promoter clearance time (1-2 seconds) is how long it takes the current polymerase to leave the promoter so that another RNA polymerase can initiate there.

The sigma factor changes its structure to expose its DNA-binding regions when it associates with the core enzyme. In free sigma, the N-terminus blocks the DNA-binding regions from binding to DNA.

The sigma factor binds to both the -35 and -10 sequences, which form its interactions with the promoter.

Consensus sequence: an idealised sequence in which each position represents the base most often found when many actual sequences are compared.

The promoter consensus sequences consist of a purine (A/G) at the startpoint (+1), the hexamer TATAAT centred at -10 and another hexamer TTGACA centred at -35.

This consensus sequence was derived from alignment of >300 E. coli promoter regions.

Individual promoters usually differ from the consensus at one or more positions, and promoters are asymmetrical.

The spacing between the two promoter elements (-35 and -10), 15 to 19 bp, is critical for promoter function.
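
A consensus sequence can be computed by taking the most frequent base at each aligned position. The minimal sketch below does this for a few made-up, already-aligned -10 hexamers; they are hypothetical stand-ins, not the >300 real E. coli promoters the consensus was built from.

```python
from collections import Counter

def consensus(aligned: list[str]) -> str:
    """Most common base at each position of equal-length aligned sequences."""
    length = len(aligned[0])
    assert all(len(s) == length for s in aligned), "sequences must be aligned"
    # zip(*aligned) yields one column per position; most_common(1) picks its
    # top base (ties go to the base seen first in that column).
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))

# Hypothetical -10 elements: most differ from TATAAT at one position.
minus10 = ["TATAAT", "TAAAAT", "TATGAT", "CATAAT", "TATACT", "TATAAA"]
print(consensus(minus10))  # TATAAT
```

Only one of the six inputs matches TATAAT exactly, yet the consensus is TATAAT, mirroring how real promoters usually deviate from the consensus at one or more positions.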

Promoter efficiency can be increased or decreased by mutation:

Mutations in the -35 sequence can affect initial binding of RNA polymerase.

Mutations in the -10 sequence usually affect the melting reaction that converts a closed complex to an open complex.

Mutations in the initial transcribed region (+1 to +20) influence the rate at which RNA polymerase clears the promoter.
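
To connect the consensus elements with promoter efficiency, the sketch below scans a coding-strand sequence for the best-matching pair of -35 (TTGACA) and -10 (TATAAT) hexamers separated by a 15-19 bp spacer, scoring each candidate by its number of matches to consensus. Counting matches is a deliberately crude stand-in (an assumption) for real promoter strength, which depends on more than sequence identity; the example sequence and function names are made up.

```python
MINUS35, MINUS10 = "TTGACA", "TATAAT"

def matches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))

def best_promoter(seq: str, spacers=range(15, 20)):
    """Return (score, start_35, start_10, spacer) for the best pair, or None.

    score = matches to TTGACA + matches to TATAAT (maximum 12).
    """
    best = None
    for i in range(len(seq) - 5):
        score_35 = matches(seq[i:i + 6], MINUS35)
        for gap in spacers:
            j = i + 6 + gap              # start of the candidate -10 hexamer
            if j + 6 > len(seq):
                break                    # larger gaps will not fit either
            score = score_35 + matches(seq[j:j + 6], MINUS10)
            if best is None or score > best[0]:
                best = (score, i, j, gap)
    return best

# Hypothetical sequence: perfect -35, 17 bp spacer, -10 with one mismatch (TAAAAT).
seq = "GC" + "TTGACA" + "A" * 17 + "TAAAAT" + "GCGC"
print(best_promoter(seq))  # (11, 2, 25, 17)
```

In this toy score, a mutation in either hexamer lowers the score by one per mismatch, loosely echoing how -35 and -10 mutations weaken a promoter.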

E. coli has 7 sigma factors, each of which causes RNA polymerase to initiate at a set of promoters defined by specific -35 and -10 sequences.

Other sigma factors are activated by special conditions and recognise promoters with different consensus sequences.

Substitution of sigma factors may control initiation:

σ70 is used for general transcription.

A cascade of sigma factors is created when one sigma factor is required to transcribe the gene coding for the next sigma factor.

Substitution of the sigma factor causes the enzyme to recognise a different set of promoters with different consensus sequences.

Termination of transcription may require both recognition of the terminator sequence in DNA and formation of a hairpin structure in the RNA product.

The terminator sequence is located before the point at which the last base is added to the RNA.

Antitermination causes the enzyme to continue transcription past the terminator sequence; this event is called readthrough.

Intrinsic termination: termination at certain sites in the absence of any other factors.

Intrinsic terminators consist of a GC-rich hairpin in the RNA product followed by a U-rich region in which termination occurs.

They also include palindromic regions that can form hairpins varying in length from 7 to 20 base pairs.
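
The description of an intrinsic terminator (a GC-rich hairpin followed by a U-rich stretch) suggests a simple pattern search on the coding strand. The sketch below is a toy heuristic, not a published terminator-prediction method: it looks for a GC-rich stem, its reverse complement after a short loop, and then a run of T’s (U’s in the RNA). The default thresholds (7 bp stem, 3-9 nt loop, at least 4 T’s, ≥60% GC in the stem), the function names and the example sequence are all assumptions chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_intrinsic_terminators(seq: str, stem_len: int = 7, loop_min: int = 3,
                               loop_max: int = 9, min_t: int = 4,
                               min_gc: float = 0.6) -> list[tuple[int, int]]:
    """Toy scan of a coding-strand sequence for stem-loop + T-run candidates.

    Returns (stem_start, t_run_start) pairs; overlapping variants of the
    same hairpin may all be reported.
    """
    hits = []
    for i in range(len(seq) - stem_len + 1):
        left = seq[i:i + stem_len]
        if (left.count("G") + left.count("C")) / stem_len < min_gc:
            continue  # require a GC-rich stem
        partner = reverse_complement(left)  # pairs with `left` in the RNA hairpin
        for loop in range(loop_min, loop_max + 1):
            j = i + stem_len + loop         # where the second stem arm would start
            if seq[j:j + stem_len] == partner:
                tail = j + stem_len
                if seq[tail:tail + min_t] == "T" * min_t:
                    hits.append((i, tail))
    return hits

# Hypothetical terminator: stem GCCCGCC, loop TTCG, partner GGCGGGC, then T's.
hairpin_gene = "AAAA" + "GCCCGCC" + "TTCG" + "GGCGGGC" + "TTTTTTTT" + "AAA"
print(find_intrinsic_terminators(hairpin_gene))
```

On this sequence the scan reports the designed hairpin (stem starting at index 4) plus an overlapping variant whose stem is extended by one extra A-T pair. Real predictors instead score hairpin stability and the U-rich tail rather than applying fixed cut-offs.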

The following are the E. coli consensus sequences for the coding strand:

Rho-dependent termination:

The ρ (rho) protein functions as a helicase. It binds at the rut (rho utilization) site on the RNA, upstream of the terminator, once the rut site has been synthesized.

At the terminator site, the DNA encodes an RNA sequence containing several GC base pairs that form a stem-loop structure. The stem-loop binds to RNA polymerase, causing a conformational change that makes RNA polymerase pause.

The ρ protein is now able to catch up to the stem-loop, pass through it and break the hydrogen bonds between the DNA and RNA within the open complex.

Transcription and translation occur simultaneously in bacteria (coupled transcription/translation), as ribosomes begin translating an mRNA before its synthesis has been completed. In addition, bacterial mRNA begins to be degraded while it is still being synthesized and translated.

Bacterial mRNA is unstable, with a half-life of only a few minutes.

In eukaryotic cells, synthesis and maturation of mRNA occur in the nucleus. The mRNA is then exported to the cytoplasm, where it is translated by ribosomes. A typical eukaryotic mRNA is relatively stable and can continue to be translated for several hours.
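Assuming simple first-order decay, the difference in stability can be made quantitative. The fraction of an mRNA population remaining after time t is 0.5^(t / half-life). The half-lives used below are assumptions for illustration only: 2 min for a bacterial mRNA and 6 h (360 min) for a eukaryotic mRNA. The notes only say "a few minutes" and "several hours".

```python
import math


def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of an mRNA population left after t_min minutes (first-order decay)."""
    return 0.5 ** (t_min / half_life_min)


def time_to_fraction(fraction: float, half_life_min: float) -> float:
    """Minutes until only `fraction` of the original mRNA remains."""
    return half_life_min * math.log(fraction) / math.log(0.5)


if __name__ == "__main__":
    # Assumed half-lives: 2 min (bacterial) vs 360 min (eukaryotic).
    for label, half_life in [("bacterial", 2.0), ("eukaryotic", 360.0)]:
        print(label, round(fraction_remaining(10, half_life), 3))
```

After 10 minutes, about 3% of the assumed bacterial mRNA remains, compared with about 98% of the eukaryotic one.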

Untranslated regions include:

5’ UTR – sequence upstream from the coding region of mRNA

3’ UTR – sequence downstream from coding region of mRNA.
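The sketch below splits an mRNA string into its 5’ UTR, coding region and 3’ UTR, which makes the UTR definitions concrete. As a simplification, it assumes that the first AUG is the start codon and that the first in-frame stop codon ends the coding region. Real start-codon choice also depends on sequence context, for example the Shine-Dalgarno sequence in bacteria. `split_mrna` is a hypothetical helper name.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna: str) -> tuple:
    """Split an mRNA into (5' UTR, coding region, 3' UTR).

    Simplifying assumption: the first AUG is the start codon and the first
    in-frame stop codon ends the coding region.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return mrna, "", ""                     # no start codon: treat all as UTR
    for i in range(start, len(mrna) - 2, 3):    # walk codon by codon in frame
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3                         # include the stop codon in the CDS
            return mrna[:start], mrna[start:end], mrna[end:]
    return mrna[:start], mrna[start:], ""       # no in-frame stop found


if __name__ == "__main__":
    # Hypothetical mRNA: GGAGG | AUG AAA UUU UAA | GCC
    print(split_mrna("GGAGGAUGAAAUUUUAAGCC"))
```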

Bacterial mRNA may be polycistronic (it has several coding regions that represent different cistrons, i.e. code for different proteins).

The intercistronic distance may vary from -1 to +40 bases; a negative value means that adjacent coding regions overlap.

Termination is prevented when antitermination proteins act on RNA polymerase to make it read through a specific terminator.

The location of the antiterminator site varies: it can be in the promoter or within the transcription unit.

The site where an antiterminator protein acts is upstream of the terminator site in the transcription unit.

Phage lambda has two antitermination proteins, pN and pQ, which act on different transcription units.

DNase footprinting can be used to study DNA-protein interactions:

A bound DNA-binding protein protects the phosphodiester bonds it covers from attack by nucleases or chemicals. This reveals the protein's precise recognition site as a protected zone (footprint).
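A footprint can be read from the data by comparing cleavage at each position with and without the protein. Positions whose band intensity drops well below the no-protein control form the protected zone. The sketch below is a minimal version that assumes band intensities have already been quantified per position. The 30% threshold, the minimum run length of 3 and the example intensities are hypothetical.

```python
def find_footprint(no_protein, with_protein, threshold=0.3, min_len=3):
    """Return (start, end) index ranges where cleavage in the presence of protein
    falls below `threshold` times the cleavage without protein (protected zones)."""
    protected = [w < threshold * n for n, w in zip(no_protein, with_protein)]
    regions, start = [], None
    for i, is_protected in enumerate(protected + [False]):  # sentinel closes a final run
        if is_protected and start is None:
            start = i                                        # a protected run begins
        elif not is_protected and start is not None:
            if i - start >= min_len:                         # ignore very short dips
                regions.append((start, i - 1))
            start = None
    return regions


if __name__ == "__main__":
    # Hypothetical band intensities for 10 positions along the probe.
    control = [10.0] * 10
    bound = [10, 9, 11, 1, 0, 1, 2, 10, 9, 10]
    print(find_footprint(control, bound))
```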

Electrophoretic Mobility Shift Assay (EMSA) is used to study protein-DNA or protein-RNA interactions.

A mobility shift assay is an electrophoretic separation of a protein-DNA/RNA mixture on a gel. The speed at which different molecules move is determined by their size and charge.

The control lane (DNA probe without protein) contains a single band corresponding to unbound DNA/RNA.

The larger the bound protein, the greater the retardation of the DNA molecule.

Example: After purification of the ENO1 promoter binding proteins, the authors carried out an

electrophoretic mobility shift assay. Based on the description in the Figure 1 legend, predict the

EMSA result on panel B.

Figure 1 legend: A) Schematic diagram showing the experimental strategy devised for the purification of ENO1 promoter binding proteins. B) Electrophoretic mobility shift assays (EMSA) showing DNA-protein complexes using the biotinylated DNA sequence corresponding to the ENO1 promoter and total nuclear extract of tachyzoites. Lane 1, unbound biotinylated probe alone. Lane 2, gel shift binding assays revealing the biotinylated DNA-protein complexes. Lane 3, specific competitor corresponding to unlabeled ENO1 promoter introduced simultaneously with labelled probe during binding assays.
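The logic of the three lanes can be reasoned through with a simple model. Assume that binding is tight and that the protein has equal affinity for the labelled probe and the unlabelled competitor. A limiting amount of protein is then shared between the two in proportion to their amounts. This is an illustrative assumption, not the authors' analysis, and the amounts below are hypothetical. Only labelled DNA is visible on the blot, so the shifted signal in lane 3 should largely disappear when the competitor is in large excess.

```python
def labelled_shift_fraction(protein: float, probe: float, competitor: float = 0.0) -> float:
    """Fraction of labelled probe found in the shifted (protein-bound) band.

    Assumes tight, equal-affinity binding, so limiting protein is shared between
    labelled probe and unlabelled competitor in proportion to their amounts.
    Amounts are in arbitrary, hypothetical units.
    """
    total_sites = probe + competitor
    bound_sites = min(protein, total_sites)          # protein can fill at most all sites
    bound_labelled = bound_sites * probe / total_sites
    return bound_labelled / probe


if __name__ == "__main__":
    print("lane 1 (probe only):", labelled_shift_fraction(0.0, 1.0))
    print("lane 2 (probe + extract):", labelled_shift_fraction(0.5, 1.0))
    print("lane 3 (+100x competitor):", round(labelled_shift_fraction(0.5, 1.0, 100.0), 4))
```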

Lecture 8 - Operon

Regulator gene is a gene that codes for a product that controls the expression of other genes.

An operon is a unit of bacterial gene expression and regulation which includes structural

genes and control elements in DNA recognized by regulatory gene product(s).

A trans-acting product can function on any copy of its target DNA and is likely to be a

diffusible protein while a cis-acting site affects the activity only of sequences on its own

molecule of DNA.

In negative control, a repressor protein binds to an operator to prevent a gene from being

expressed while for positive control a transcription factor (activator) is required to bind at

the promoter in order to enable RNA polymerase to initiate transcription.

In inducible regulation, the gene is switched on by the presence of an inducer (often a substrate of the pathway). In repressible regulation, the gene is switched off by a repressor that is activated by a corepressor, which is usually the product of its enzyme pathway.

E.g. the tryptophan operon consists of a single promoter and 5 genes which encode the different enzymes needed to synthesize tryptophan from simpler molecules. When tryptophan inside a bacterium is low, RNA polymerase binds to the promoter and transcribes the 5 genes. However, if the tryptophan concentration is high, tryptophan binds to the allosteric repressor protein and activates it. The active repressor binds to the cis-acting operator, which overlaps the promoter, and blocks the binding of RNA polymerase. This is an example of repressible regulation under negative control.
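Both modes of negative control come down to one rule: the operon is transcribed when the repressor is inactive. What differs is whether the small molecule inactivates the repressor (an inducer, as in lac) or activates it (a corepressor, as in trp). The sketch below encodes only that rule. It ignores basal expression and any positive control, and the function name is hypothetical.

```python
def is_transcribed(mode: str, small_molecule_present: bool) -> bool:
    """Negative control only: the operon is on whenever the repressor is inactive.

    mode: 'inducible'   (e.g. lac; the small molecule is an inducer)
          'repressible' (e.g. trp; the small molecule is a corepressor)
    """
    if mode == "inducible":
        repressor_active = not small_molecule_present  # inducer inactivates the repressor
    elif mode == "repressible":
        repressor_active = small_molecule_present      # corepressor activates the repressor
    else:
        raise ValueError(f"unknown mode: {mode}")
    return not repressor_active


if __name__ == "__main__":
    for mode in ("inducible", "repressible"):
        for present in (False, True):
            print(mode, "small molecule present" if present else "absent",
                  "->", "on" if is_transcribed(mode, present) else "off")
```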

The level of response for a system in the absence of a stimulus is its basal level – basal level

of transcription of a gene is the level that occurs in the absence of any specific activation.

The derepressed state describes a gene that is turned on because a small-molecule corepressor is absent. The super-repressed state is a mutant condition in which a repressible operon cannot be derepressed, so it is always turned off.

Genes coding for proteins that function in the same pathway may be located adjacent to one

another (organized into operons) and controlled as a single unit that is transcribed into a

polycistronic mRNA.

The lac operon in E. coli is controlled by both the lac repressor and the cAMP receptor protein (CRP, also called the catabolite activator protein, CAP). CRP is an activator and has to bind cAMP before it can bind to the promoter.

When glucose is no longer available, the intracellular cAMP concentration increases and CAP is activated. CAP then activates the lac operon, allowing the bacteria to digest other sugars.

The lacI gene has its own promoter and terminator while the transcription of the lacZYA

operon is controlled by a repressor protein (lac repressor) that binds to an operator that

overlaps the promoter at the start of the cluster (PO).

The operator O occupies the first 26 bp of the transcription unit. The long lacZ gene starts at base 39 and is followed by the lacY and lacA genes and a terminator.

The repressor protein, which binds to the operator, is a tetramer of identical subunits coded by the lacI gene.

The lac operon is under negative inducible control: a β-galactoside, a substrate of the enzymes encoded by the operon, acts as its inducer.

In the absence of β-galactosides, the lac operon is expressed at a very low (basal) level.

The addition of specific β-galactosides induces transcription of all three genes of the lac operon.

As lac mRNA is extremely unstable, induction can be rapidly reversed.

The transcription level increases upon addition of inducer, so the level of lac mRNA rises rapidly. When the inducer is removed, the mRNA quickly degrades. The level of β-galactosidase remains high, however, because the protein is degraded much more slowly than the mRNA.
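The contrast between unstable lac mRNA and stable β-galactosidase can be seen in a simple two-variable model. In this model, transcription occurs only while inducer is present, mRNA decays quickly, and protein is made from mRNA and decays slowly. All rate constants below are hypothetical. The mRNA decay rate of 0.35 per minute corresponds to a half-life of about 2 minutes, and basal transcription is ignored.

```python
# Sketch: mRNA and protein levels after adding, then removing, inducer.
# Assumed first-order rates (hypothetical, per minute).


def simulate(t_off=20.0, t_end=40.0, dt=0.01,
             k_m=1.0, d_m=0.35, k_p=0.5, d_p=0.01):
    """Euler integration of dm/dt = k_m*I - d_m*m and dp/dt = k_p*m - d_p*p,
    where I = 1 while inducer is present (t < t_off) and 0 afterwards.
    Returns a list of (time, mRNA, protein) tuples."""
    m = p = 0.0
    trace = []
    for step in range(int(round(t_end / dt))):
        t = step * dt
        inducer = 1.0 if t < t_off else 0.0
        dm = k_m * inducer - d_m * m   # synthesis only with inducer; fast decay
        dp = k_p * m - d_p * p         # translation from mRNA; slow decay
        m += dm * dt
        p += dp * dt
        trace.append((t, m, p))
    return trace


if __name__ == "__main__":
    for t, m, p in simulate()[::500]:
        print(f"t={t:5.1f} min  mRNA={m:6.3f}  protein={p:7.2f}")
```

Twenty minutes after the inducer is removed, the modelled mRNA has fallen to well under 1% of its induced level. Most of the protein is still present at that point.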

The lac repressor has two kinds of binding site: one for operator DNA and another for the inducer.

The natural inducer is 1,6-allolactose (formed from lactose), which can be metabolised and does not persist in the cell.

Gratuitous inducers resemble authentic inducers of transcription but are not substrates for the induced enzyme, so they cannot be metabolised. E.g. isopropyl β-D-1-thiogalactopyranoside (IPTG).

Lactose can be hydrolysed into galactose and glucose by β-galactosidase.

The inducer binds to the allosteric lac repressor and converts it into a form with lower operator affinity, thus allowing RNA polymerase to initiate transcription.

Each repressor subunit can be divided into an N-terminal DNA-binding domain, a hinge and the protein core.

The DNA-binding domain contains two short α-helical regions forming a helix-turn-helix (HTH) motif that binds the major groove of DNA.

The inducer-binding site and the regions responsible for multimerization are located in the core.

HTH DNA-binding proteins bind DNA as dimers, with the two DNA-binding domains separated by 3.4 nm (one helical turn of B-DNA), so that both can enter the major groove on the same face of the DNA.

Mutations in the operator (Oc) cause constitutive expression of all three lac structural genes

because the repressor is unable to bind to the mutant operator, thus allowing RNA

polymerase to have unrestrained access to the promoter. As the operator can only control

the lac genes adjacent to it, these mutations are cis-acting as they only affect those genes on

the contiguous stretch of DNA. Oc can be said to be cis-dominant.

Mutations that inactivate the lacI gene (which codes for the repressor) cause the operon to be constitutively expressed, because the mutant repressor protein cannot bind to the operator. The lacI- mutation is recessive, as the introduction of a normal lacI+ gene can restore control even in the presence of a defective lacI- gene.

Mutations in the inducer-binding site of the repressor (lacIs, super-repressor) prevent the inducer from binding. The repressor therefore stays bound to the operator, and lac operon transcription becomes uninducible.

Mutations in the DNA-binding site of the repressor (lacI-d, dominant) cause constitutive expression, as the repressor cannot bind to the operator.

The lacI-d gene makes a monomer that has a damaged DNA-binding site. When it is present in the same cell as the wild-type gene, multimeric repressors are assembled at random from both types of subunit, and the mutant subunits interfere with repressor function.

Only one subunit of the multimer needs to be of the lacI-d type to block repressor function, so the mutation shows dominant negative behaviour.
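The strength of the dominant negative effect can be estimated with a short calculation. It assumes that subunits mix at random, that the wild-type and lacI-d genes are expressed equally, and (as stated above) that a single lacI-d subunit inactivates the whole tetramer. Under these assumptions, the fraction of fully functional repressors is (fraction of wild-type subunits)^4. With equal expression this gives 0.5^4 = 1/16, so only about 6% of tetramers are functional.

```python
def functional_fraction(wild_type_fraction: float, subunits: int = 4) -> float:
    """Probability that a randomly assembled multimer contains only wild-type
    subunits, assuming one mutant subunit is enough to inactivate it."""
    if not 0.0 <= wild_type_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return wild_type_fraction ** subunits


if __name__ == "__main__":
    # Equal expression of lacI+ and lacI-d (assumed).
    print(functional_fraction(0.5))
```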

The lac operator consists of a palindromic sequence of 26 base pairs (a sequence that reads the same on each strand when the strand is read in the 5’ to 3’ direction), made up of adjacent inverted repeats. Each inverted repeat of the operator binds to the DNA-binding site of one repressor subunit.

Inducer binding causes a change in repressor conformation that reduces its affinity for DNA and releases it from the operator.
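A DNA palindrome in this sense is a sequence that equals its own reverse complement. The sketch below tests for that property and also scores partial symmetry, since natural operators are often only approximately palindromic. The example sequences are hypothetical, apart from GAATTC, which is the well-known EcoRI site. For an odd-length site with a central base pair (such as the 17 bp lambda operator sites), the central position can never match its complement, so the maximum score is (n - 1)/n.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindrome(seq: str) -> bool:
    """True if the sequence reads the same 5'->3' on both strands."""
    return seq.upper() == reverse_complement(seq)


def symmetry(seq: str) -> float:
    """Fraction of positions where the sequence matches its reverse complement."""
    rc = reverse_complement(seq)
    return sum(a == b for a, b in zip(seq.upper(), rc)) / len(seq)


if __name__ == "__main__":
    print(is_palindrome("GAATTC"), symmetry("GAATTC"))
    print(is_palindrome("GAATTG"), round(symmetry("GAATTG"), 3))
```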

Two symmetrical half-sites in a regulatory sequence indicate that it is bound by a dimeric regulatory protein.

To identify the bases that contact the repressor (contact sites) and compare them with the sites of constitutive mutations, chemical crosslinking or modification experiments can be performed to see whether modifying a base prevents binding.

Constitutive mutations occur at 8 positions in the operator between +5 and +17.

To examine the lac+ phenotype, E. coli can be grown on a nutrient agar plate containing IPTG and X-gal, while the control plate should contain only nutrients and X-gal.

As β-galactosidase is produced in lac+ cells, X-gal is cleaved to produce a blue reaction product.

Full repression of the lac operon requires the lac repressor to bind to O1 (the highest-affinity operator) and to either the O2 or the O3 operator.

The second layer of control lies in catabolite repression (the ability of glucose to prevent the expression of a number of genes), where cAMP and CRP bind to a target sequence at a promoter.

The second messenger cAMP converts CRP to a form that binds the promoter and assists RNA polymerase in initiating transcription.

When the glucose level is low, cAMP is produced, which activates a dimer of CRP. CRP interacts with the C-terminal domain (CTD) of the α subunit of RNA polymerase to activate it.

The lac operon is under both positive and negative control.

1. In the presence of both glucose and lactose, β-galactosidase is not needed and thus the lac operon is off. The presence of glucose causes low levels of cAMP, so CRP does not bind.

2. If glucose is the sole carbon source, β-galactosidase is not needed and thus the lac operon is off. The repressor binds to the operator and CAP fails to bind.

3. When glucose and lactose are both absent, β-galactosidase is not needed. The operon is off, as the lac repressor bound to the operator prevents CRP from turning the lac operon on.

4. If lactose is the sole carbon source, β-galactosidase is needed. CRP binds and turns the lac operon on, producing β-galactosidase to break down lactose into glucose and galactose.
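The four cases can be summarised as a two-input rule. The operon is on only when the repressor has been released (lactose present) and cAMP-CRP is bound (glucose absent). The sketch below encodes that rule for a wild-type cell. As in the notes, "off" includes the low basal level of expression, and the function name is hypothetical.

```python
def lac_operon_state(glucose: bool, lactose: bool) -> str:
    """Return 'on' or 'off' for the lac operon in a wild-type cell.
    'off' includes the low basal level of expression."""
    repressor_on_operator = not lactose  # allolactose (from lactose) releases the repressor
    crp_bound = not glucose              # low glucose -> high cAMP -> cAMP-CRP binds
    return "on" if (not repressor_on_operator and crp_bound) else "off"


if __name__ == "__main__":
    for glucose in (True, False):
        for lactose in (True, False):
            print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> "
                  f"{lac_operon_state(glucose, lactose)}")
```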

In summary:

Regulatory proteins
lacI: gene for the lac repressor (lacR), which represses the lac operon
CAP/CRP: gene for the catabolite activator protein, which binds cAMP and then activates the lac operon

Regulatory DNA sequences
lacC: CAP/CRP binding site
lacP: lac promoter, which is CAP/CRP dependent
lacO: lac operator, the lacR binding site

The genes of the lac operon
lacZ: gene for β-galactosidase, which cleaves X-gal and stains colonies blue
lacY: gene for lac permease, which transports lactose
lacA: gene for lac transacetylase

Mutations in the lac operon (affected function; expression without IPTG / with IPTG)
lacI-: lacR mutant, inactive, recessive (defective repressor); + / +
lacI-d: lacR mutant that cannot bind DNA, dominant negative (defective repressor); + / +
lacIS: lacR mutant that cannot bind the inducer, super-repressor, uninducible; - / -
CAP-: CAP mutant, inactive, cAMP cannot bind; - / -
CAPC: CAP mutant, constitutive, cAMP-independent (CAP binds without cAMP); - / +
lacC-: CAP binding site mutant that cannot be bound by the cAMP-CAP complex (defective or reduced CAP binding); - / weak (-/+)
lacP-: lac promoter mutant, inactive (defective promoter); - / -
lacOC: lac operator mutant that cannot be bound by lacR, constitutive expression (defective operator); + / +
lacZ-: β-galactosidase mutant, inactive; - / -
Wild type: none; - / +
lacOC lacIS: defective operator + super-repressor; + / +
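The phenotypes of the lac mutants follow from a few rules. Expression requires a working promoter and lacZ. It is blocked whenever a DNA-binding repressor sits on a wild-type operator and has not been released by inducer. It also requires activation by CAP. The sketch below encodes these rules under stated assumptions: a haploid cell (so lacI-d behaves like lacI-), no glucose (so cAMP is high), and hypothetical keyword names for each locus. Under these rules, CAPC gives no expression without IPTG, because the lac repressor is still bound to the operator.

```python
# Sketch: predict lac operon expression for single mutants.
# Assumptions (not from the notes): haploid cell, no glucose (cAMP high),
# genotype described by hypothetical keyword arguments.


def lac_expression(iptg: bool, repressor: str = "wt", operator: str = "wt",
                   cap: str = "wt", cap_site: str = "wt",
                   promoter: str = "wt", lacZ: str = "wt") -> str:
    """Return '+', '-' or 'weak' for beta-galactosidase activity.

    repressor: 'wt', '-' (lacI-), 'd' (lacI-d) or 'S' (lacIS)
    operator:  'wt' or 'c' (lacOC)
    cap:       'wt', '-' (CAP-) or 'c' (CAPC, active without cAMP)
    cap_site:  'wt' or 'reduced' (lacC-)
    promoter, lacZ: 'wt' or '-'
    """
    if promoter == "-" or lacZ == "-":
        return "-"
    repressor_binds_dna = repressor in ("wt", "S")  # lacI- and lacI-d cannot bind
    releases_on_inducer = repressor == "wt"         # lacIS ignores the inducer
    repressed = (operator == "wt" and repressor_binds_dna
                 and not (iptg and releases_on_inducer))
    if repressed:
        return "-"
    if cap == "-":
        return "-"       # no activation by cAMP-CAP
    if cap_site == "reduced":
        return "weak"    # reduced CAP binding at lacC
    return "+"           # cap 'wt' (cAMP high) or 'c' both activate


if __name__ == "__main__":
    print("wild type:", lac_expression(False), "/", lac_expression(True))
    print("CAPC:", lac_expression(False, cap="c"), "/", lac_expression(True, cap="c"))
```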

Apart from measuring the transcript level directly via quantitative PCR (qPCR), a reporter assay or the activity of β-galactosidase can be used to measure the activity of a promoter.

1. In a reporter assay, a reporter gene encoding an easy-to-measure protein such as GFP or luciferase is placed under the control of the gene's promoter, and the amount of reporter protein can then be easily measured.
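The notes do not specify how β-galactosidase activity is quantified. One widely used convention is the Miller unit from the ONPG assay:

Miller units = 1000 × (OD420 − 1.75 × OD550) / (t × v × OD600)

Here, t is the reaction time in minutes and v is the volume of culture assayed in mL. The OD420 term reads the yellow o-nitrophenol product, the OD550 term corrects for light scattering, and the OD600 term normalises to cell density. The sketch below assumes this convention and uses hypothetical readings.

```python
def miller_units(od420: float, od550: float, od600: float,
                 time_min: float, volume_ml: float) -> float:
    """Beta-galactosidase activity in Miller units (ONPG assay).

    od420: absorbance of the yellow o-nitrophenol product
    od550: light-scattering correction from cell debris
    od600: culture density when the sample was taken
    time_min: reaction time in minutes; volume_ml: culture volume assayed in mL
    """
    if time_min <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("time, volume and OD600 must be positive")
    return 1000.0 * (od420 - 1.75 * od550) / (time_min * volume_ml * od600)


if __name__ == "__main__":
    # Hypothetical readings for an induced culture.
    print(round(miller_units(0.8, 0.1, 0.4, 20, 0.5), 2))
```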

Lecture 9 – Phage Lambda

A virus consists of a nucleic acid genome contained in a protein coat. In order to reproduce,

the virus must infect a host cell. The typical pattern of an infection is to subvert the

functions of the host cell for the purpose of producing a large number of progeny viruses.

1. The lytic cycle is the infection of a bacterium by a phage that ends in the destruction of the bacterium with the release of progeny phages.

2. A prophage is a phage genome covalently integrated as a linear part of the bacterial chromosome.

3. The ability of a phage to survive in a bacterium as a stable prophage component of the bacterial genome is known as lysogeny.

Virulent phages undergo only the lytic cycle, whereas temperate phages can choose between the lytic and lysogenic pathways of development.

Induction is the process by which a prophage is freed from the restrictions of lysogeny. It results in the destruction of the lysogenic repressor and the excision of free phage DNA from the bacterial chromosome.

Immunity is the ability of a prophage to prevent another phage of the same type from infecting the same cell.

Lytic development is accomplished by a pathway in which the phage genes are expressed in

a particular order and this ensures that the right amount of each component is present at

the appropriate time. There are two parts to the cycle:

1. Early infection describes the period from entry of the DNA to the start of its replication.

The early phase is devoted to the production of enzymes involved in the replication of DNA.

2. Late infection defines the period from the start of replication to the final step of lysing the bacterial cell to release progeny phage particles.

Protein components of the phage particle, such as the head, tail and assembly proteins, are synthesized.

DNA replication reaches its maximum rate and the DNA is packaged into the heads.

Lytic development is controlled by a regulatory cascade (a sequence of events, each of which is stimulated by the previous one).

The lytic cycle is under positive control, so each group of phage genes can be expressed only when an appropriate signal is given.

The early part of the first stage of gene expression necessarily relies on the transcription apparatus of the host cell, and only a few genes are expressed at this time.

In phage lambda they are called immediate early genes.

One of these genes always codes for a protein that is a gene regulator, necessary for transcription of the next class of genes.

The next class of genes is known as the delayed early or middle gene group. Its expression typically starts as soon as the regulator protein coded by the early gene is available.

If control is at transcription initiation, the two events are independent, and early genes can be switched off when middle genes are transcribed.

If control is at transcription termination, the early genes must continue to be expressed.

Often the expression of host genes is reduced. The two sets of early genes account for all necessary phage functions except those needed to assemble the particle coat itself and those needed to lyse the cell.

When the replication of phage DNA begins, the late genes are expressed. This is arranged

by embedding an additional regulator gene within the previous set of genes. This regulator

may be another antitermination factor or another sigma factor.

The means used to construct each phage cascade are different, but the results are similar.

There are two mechanisms for recognizing new phage promoters:

1. Replace the sigma factor of the host enzyme with another factor that redirects its specificity in initiation, or synthesize a new phage RNA polymerase.

The critical feature that distinguishes the new set of genes is their possession of promoters different from those originally recognized by host RNA polymerase.

2. Antitermination provides an alternative mechanism for phages to control the switch from early genes to the next stage of expression.

The same promoters continue to be recognized by RNA polymerase, but the new genes are expressed only by extending the RNA chain to form molecules that contain the early gene sequences at the 5’ end and the new gene sequences at the 3’ end.

From the genetic point of view, the mechanisms of new initiation and antitermination are similar: both are positive controls in which an early gene product must be made by the phage in order to express the next set of genes.

By employing either sigma factors or antitermination proteins with different specificities, a cascade of gene expression can be constructed.

Genes concerned with related functions are often clustered. In phage T7, the genome contains three classes of genes that are expressed sequentially:

Class I: RNA polymerase + enzymes that interfere with host gene expression.

Class II: enzymes for DNA synthesis, and lysozyme.

Class III: head and tail proteins.

When lambda DNA enters a host, the lytic and lysogenic pathways start off the same way: expression of the immediate early and delayed early genes is required.

Lytic development follows if the late genes are expressed.

Lysogeny ensues if synthesis of a gene regulator called the lambda repressor is established by turning on its gene, cI.

Lambda has two immediate early genes, N and cro, which are transcribed by host RNA polymerase.

The N gene codes for an antitermination factor whose action at nut (N utilization) sites allows transcription to proceed into the delayed early genes.

The cro gene codes for a repressor that prevents expression of the cI gene, which codes for the lambda repressor (a necessary step for the lytic cycle to proceed). Cro also turns off expression of the immediate early genes.

Three of the delayed early gene products are regulators (cII, cIII and Q).

The cII-cIII pair of regulator genes is needed to establish synthesis of the lambda repressor for the lysogenic pathway.

The Q regulator gene codes for an antitermination factor that allows host RNA polymerase to transcribe the late genes and is necessary for the lytic cycle.

The lytic cycle depends on antitermination by pN, which allows RNA polymerase to continue transcription past the ends of the two immediate early genes.

N is transcribed toward the left using PL, while cro is transcribed toward the right using PR.

The synthesis of the N protein (antiterminator pN) allows RNA polymerase to pass the terminators tL1 to the left and tR1 to the right, extending transcription into 7 recombination genes (left) and 2 replication genes (right).

pQ is the product of a delayed early gene and is an antiterminator that allows RNA polymerase initiating at PR′ to transcribe the late genes.

Lambda DNA circularizes after infection, and as a result the late genes form a single transcription unit.

Lysogeny is maintained by the lambda repressor protein, encoded by the cI gene.

The cI gene has two promoters, PRM (promoter right maintenance) and PRE (promoter right establishment). Mutants in this gene cannot maintain lysogeny and always enter the lytic cycle.

The lambda repressor acts at the OL and OR operators to block transcription of the immediate early genes (N and cro).

At OL, the lambda repressor prevents RNA polymerase from initiating transcription at PL. This stops the expression of gene N, which prevents the synthesis of pN, and thus the lytic cycle is blocked.

Lambda repressor binding at OR also stimulates transcription of cI, its own gene, from PRM.

As long as the level of lambda repressor is adequate, there is continued expression of the cI gene, so OL and OR remain occupied indefinitely. Lysogeny is stable and the lytic cascade is repressed.

Immunity in phages refers to the ability of a prophage to prevent another phage of the same type from infecting a cell.

When a second lambda phage DNA enters a lysogenic cell, repressor protein synthesized from the resident prophage genome immediately binds to OL and OR in the new genome, preventing the second phage from entering the lytic cycle.

A lysogenic phage confers immunity to further infection by any other phage with the same immunity region.

In the absence of repressor, RNA polymerase can bind to PL and PR, which starts the lytic cycle. It cannot initiate at PRM in the absence of the repressor.

Virulent mutations prevent the repressor from binding at OL and OR, and thus lysogeny cannot be established.

Counting phages via the serial dilution method:

Starting with a phage suspension of unknown concentration, perform a serial dilution and plate each dilution with host bacteria.

As phages grow and lyse the host, plaques form in the bacterial lawn. Each plaque arises from a single infecting phage, so the original concentration can be determined by counting the plaques on a lawn and correcting for the dilution and the volume plated.
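The arithmetic behind the plaque count is titre (PFU/mL) = plaques / (dilution × volume plated in mL). For example, 150 plaques from 0.1 mL of a 10^-6 dilution gives 1.5 × 10^9 PFU/mL. The sketch below applies this formula and averages only plates in a countable range. The 30 to 300 plaque range, the 0.1 mL plating volume and the example counts are assumptions chosen for illustration.

```python
def titer_pfu_per_ml(plaques: int, dilution: float, volume_plated_ml: float) -> float:
    """Phage titre of the original stock; dilution is e.g. 1e-6 for a 10^-6 dilution."""
    return plaques / (dilution * volume_plated_ml)


def estimate_titer(plates, volume_plated_ml=0.1, low=30, high=300):
    """plates: list of (dilution, plaque_count). Averages only countable plates
    (the countable range is an assumed convention)."""
    titers = [titer_pfu_per_ml(count, dilution, volume_plated_ml)
              for dilution, count in plates if low <= count <= high]
    if not titers:
        raise ValueError("no plate in the countable range")
    return sum(titers) / len(titers)


if __name__ == "__main__":
    # Hypothetical ten-fold dilution series: only the 10^-6 plate is countable.
    series = [(1e-5, 1500), (1e-6, 150), (1e-7, 14)]
    print(f"{estimate_titer(series):.2e} PFU/mL")
```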

With wild-type phages, plaques are turbid or cloudy, as they contain cells that have established lysogeny instead of being lysed.

Virulent mutants are unable to establish lysogeny, so they form clear plaques that contain only lysed cells.

The lambda repressor subunit is a polypeptide of 27 kD with two distinct domains joined by a connector of about 40 residues:

The N-terminal domain (residues 1-92) provides the operator-binding site, but binds with lower affinity than the intact lambda repressor.

The C-terminal domain (residues 132-236) is responsible for dimerization and can form oligomers.

Binding to the operator requires the dimeric form, so that two DNA-binding domains can contact the operator simultaneously.

Induction of a lysogenic prophage into the lytic cycle is caused by cleavage of the repressor subunit in the connector region, which reduces its affinity for the operator.

Induction can be caused by UV irradiation, which leads to degradation of the repressor.

The balance between lysogeny and the lytic cycle depends on the concentration of repressor: intact repressor is present in a lysogenic cell at a concentration sufficient to ensure that the operators are occupied.

In lysogeny, monomers are in equilibrium with dimers, which bind to DNA. Induction causes cleavage of monomers, disturbing the equilibrium so that dimers dissociate.

The lambda operator is a 17 bp palindromic sequence with an axis of symmetry through the central base pair. The sequence on each side of the central base pair is a half-site. Each individual N-terminal region contacts a half-site.

The amino acid sequence of the recognition helix in the helix-turn-helix motif makes contact with particular bases in the operator sequence that it recognises.

Contacts between helix-2 and helix-3 are maintained by interactions between hydrophobic amino acids.

Helix-3 of each monomer lies in the wide (major) groove on the same face of the DNA, and helix-2 lies across the groove.

Each operator contains 3 repressor-binding sites and overlaps with the promoter at which RNA polymerase binds.

Binding sites within each operator are separated by AT-rich spacers of 3 to 7 bp.

The orientation of OL has been reversed from usual to facilitate comparison with OR. Site 1 lies closest to the start point for transcription in the promoter, and sites 2 and 3 lie farther upstream. At each operator, site 1 has a tenfold greater affinity than the other sites, and thus the repressor always binds to OL1 and OR1 first.

Repressor binding to one site increases the affinity for binding a second repressor dimer to the adjacent site.

However, when both sites 1 and 2 are occupied, this interaction does not extend further to site 3. In lysogeny, sites 1 and 2 are filled but not site 3.

When two lambda repressor dimers bind cooperatively, each of the subunits of one dimer contacts a subunit in the other dimer through the C-terminal domain, forming a tetrameric structure.

Cooperative binding allows the repressor to bind the OL2 and OR2 sites at lower concentrations. This is important in a system in which release of repression has irreversible consequences.

In an operon coding for metabolic enzymes, failure to repress merely allows unnecessary synthesis of enzymes, but failure to repress the lambda prophage leads to induction of the phage and lysis of the cell.

When two dimers are bound at OR1 and OR2, the DNA-binding region (N-terminal domain, helix 2) of the dimer at OR2 contacts RNA polymerase, stabilizing its binding to PRM and activating it.

Repressor binding at OL blocks transcription of gene N from PL, while repressor binding at OR blocks transcription of cro but is also required for transcription of cI. Thus low levels of repressor can positively regulate its own synthesis as long as enough repressor is available to fill OR2.

Repressor dimers bound at OL1 and OL2 can interact with dimers bound at OR1 and OR2 to form octamers.

This interaction stabilizes repressor binding, making it possible for repressor to occupy the operators at lower concentrations.

The DNA between the OL and OR sites (including gene cI) forms a large loop, which is held together by the repressor octamer.

At lower concentrations, the lambda repressor forms octamers and activates PRM (positive autogenous regulation). An increase in concentration allows binding to OR3 and OL3, which turns off transcription from PRM (negative autogenous regulation).
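The combination of positive and negative autoregulation means that PRM activity first rises and then falls as repressor concentration increases. The sketch below is a toy model, not a quantitative description of lambda. It treats OR1/OR2 (high affinity, cooperative) and OR3 (low affinity) as single independent sites with simple occupancy c / (c + K). It takes PRM activity to be the probability that OR2 is filled while OR3 is empty. The dissociation constants are hypothetical.

```python
def occupancy(conc: float, kd: float) -> float:
    """Fraction of time a single site is occupied at a given repressor concentration."""
    return conc / (conc + kd)


def prm_activity(repressor: float, k_or12: float = 1.0, k_or3: float = 25.0) -> float:
    """Toy model: PRM is active when OR1/OR2 are filled (activation via OR2)
    and OR3 is empty (repression). Sites are treated as independent;
    both Kd values are hypothetical, in arbitrary concentration units."""
    return occupancy(repressor, k_or12) * (1.0 - occupancy(repressor, k_or3))


if __name__ == "__main__":
    for c in (0.0, 0.1, 1.0, 5.0, 50.0, 500.0):
        print(f"repressor={c:6.1f}  PRM activity={prm_activity(c):.3f}")
```

In this model the activity is zero without repressor, peaks at intermediate levels, and falls again once OR3 starts to fill. This matches the qualitative behaviour described above.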

When a lambda DNA enters a new host cell, RNA polymerase cannot transcribe cI because

there is no repressor present to aid it binding at PRM. The absence of repressor leads to the

availability of PR and PL.

Thus the first event after lambda DNA infects a bacterium is when genes N and cro

are transcribed and then pN allows transcription to be extended further.

The delayed early gene products cII and cIII are necessary for RNA polymerase to

initiate transcription at the promoter PRE.

The product of cII acts directly at the promoter while the product of cIII protects cII

from degradation.

Transcription from PRE leads to synthesis of repressor and blocks cro synthesis (promoting lysogeny):

The direct effect is that cI mRNA is translated into repressor protein.

The indirect effect is that transcription proceeds through the cro gene in the “wrong” direction, so the 5’ part of the RNA is an antisense transcript of cro. It hybridizes to authentic cro mRNA and inhibits its translation.
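The antisense effect depends on the transcript read through cro in the opposite direction being the reverse complement of cro mRNA. A minimal sketch of that pairing rule follows; the 9-nt sequence is a hypothetical stand-in, not the real cro sequence, and the sketch checks only perfect complementarity.

```python
# Minimal sketch: an antisense RNA pairs with its target because it is the
# reverse complement of that target (the two strands pair antiparallel).
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(rna: str) -> str:
    """Return the reverse complement of an RNA written 5'->3'."""
    return rna.upper().translate(RNA_COMPLEMENT)[::-1]


def can_hybridize(rna_a: str, rna_b: str) -> bool:
    """True if the two RNAs (both written 5'->3') are perfectly complementary."""
    return reverse_complement_rna(rna_a) == rna_b.upper()


# Hypothetical 9-nt stretch standing in for part of cro mRNA (not the real sequence).
cro_fragment = "AUGGAACGC"
antisense = reverse_complement_rna(cro_fragment)
print(antisense, can_hybridize(cro_fragment, antisense))
```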

The PRE promoter has atypical sequences: a poor fit with the consensus at -10 and no consensus sequence at -35. RNA polymerase therefore binds PRE only in the presence of the positive regulator cII.
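One way to make "poor fit with the consensus" concrete is to count matches against the E. coli σ70 consensus boxes (TTGACA at -35, TATAAT at -10). In the sketch below the weak boxes are hypothetical and are not the actual PRE sequence; real promoter strength also depends on the spacing between the boxes, which this count ignores.

```python
# E. coli sigma70 consensus boxes, used here as the reference for "fit".
CONSENSUS = {"-35": "TTGACA", "-10": "TATAAT"}


def box_matches(box: str, consensus: str) -> int:
    """Count positions where a 6-bp promoter box matches the consensus."""
    if len(box) != len(consensus):
        raise ValueError("box and consensus must be the same length")
    return sum(a == b for a, b in zip(box.upper(), consensus))


def score_promoter(box_35: str, box_10: str) -> dict:
    """Return the number of consensus matches (out of 6) for each box."""
    return {
        "-35": box_matches(box_35, CONSENSUS["-35"]),
        "-10": box_matches(box_10, CONSENSUS["-10"]),
    }


print(score_promoter("TTGACA", "TATAAT"))  # a perfect consensus promoter
print(score_promoter("GCTCGC", "TAGAGT"))  # hypothetical weak, PRE-like boxes
```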


Lysogeny requires several events:

Presence of cII and cIII causes repressor synthesis to be established and triggers inhibition of late gene transcription.

cIII protects cII, which allows PRE to be used for transcription extending through cI.

This causes the lambda repressor protein to be synthesized in high amounts, and it immediately binds to OL and OR.

Establishment of repressor turns off immediate and delayed early gene expression.

Transcription from PL and PR is inhibited, and repressor binding turns off the expression of all phage genes.

Synthesis of cII and cIII halts and the existing proteins decay, so PRE can no longer be used and synthesis of repressor from PRE stops.

Repressor turns on the maintenance circuit for its own synthesis via expression from PRM, by making contact with the RNA polymerase sigma factor.

Repressor continues to be synthesized until, at high levels, occupancy of OR3 turns synthesis off.

Lambda DNA is integrated into the bacterial genome at the final stage in establishing

lysogeny.


The lytic cascade requires cro protein, which directly prevents repressor maintenance via PRM and also turns off delayed early gene expression.

Cro is responsible for preventing the synthesis of the lambda repressor protein cI.

Cro binds to the same operators as lambda repressor but with different affinities: the affinity of cro for OR3 is greater than its affinity for OR2 or OR1.

When cro binds to OR3, it prevents RNA polymerase from binding to PRM and this

prevents the maintenance circuit for lysogeny from coming into play.

When cro binds to other operators at OR/OL, it prevents RNA polymerase from

expressing immediate early genes (including cro itself) and any use of PRE is

prevented, indirectly blocking repressor establishment.

The delayed early stage, when both cro and repressor are being expressed, is common to both the lysogenic and lytic pathways.

The critical event is whether cII causes sufficient synthesis of repressor to overcome

the action of cro. If cII causes sufficient synthesis of repressor, lysogeny will result

because repressor occupies the operators. Otherwise cro occupies the operators,

resulting in lytic cycle.

In the early stages of the infection, cro is given a head start over the lambda

repressor and so it would seem that the lytic pathway is favoured. However, stability

of the cII protein in the infected cell is a primary determinant of the outcome.
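The outcome described above can be illustrated with a toy race between cII-driven repressor synthesis and a fixed cro head start. Everything numeric here is a hypothetical assumption: cII decays exponentially with a given half-life (longer when cII is protected, e.g. by cIII), repressor synthesis is proportional to cII, and lysogeny is scored when total repressor exceeds the cro level.

```python
def repressor_made(cii0: float, half_life: float, minutes: int, rate: float = 1.0) -> float:
    """Total repressor made from PRE while cII decays exponentially.

    cII(t) = cii0 * 2 ** (-t / half_life); repressor synthesis per minute is
    assumed proportional to the cII present (rate is a hypothetical constant).
    """
    return sum(rate * cii0 * 2 ** (-t / half_life) for t in range(minutes))


def outcome(cii0: float, half_life: float, cro_level: float, minutes: int = 20) -> str:
    """'lysogeny' if repressor outcompetes cro for the operators, else 'lytic'."""
    made = repressor_made(cii0, half_life, minutes)
    return "lysogeny" if made > cro_level else "lytic"


# Same initial cII and the same cro head start; only cII stability differs.
print(outcome(cii0=1.0, half_life=1.0, cro_level=5.0))   # unstable cII
print(outcome(cii0=1.0, half_life=10.0, cro_level=5.0))  # stabilized cII
```

Only the cII half-life changes between the two calls, mirroring the point that cII stability is a primary determinant of the outcome.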


Lecture 10 – DNA Replication and Transfer

An origin usually initiates bidirectional replication where a replicated region appears as a

bubble within non-replicated DNA.

Replicon: unit of the genome in which DNA is replicated. Each replicon contains an origin for initiation of replication.

Origin: sequence of DNA at which replication is initiated.

At the origin, two replication forks are created that move in opposite directions.

They usually meet halfway around the circle but there are ter sites that cause

termination if the replication forks go too far.

In E. coli, the origin of replication, oriC, is 245 bp in length. It contains 11 palindromic repeats that are methylated on adenine on both strands by Dam methylase.

Replication generates hemi-methylated DNA (only one strand is methylated), which cannot initiate replication; only fully methylated origins can initiate replication.

There is a 13-minute delay before the repeats in the origin are re-methylated (other sites are re-methylated in <1.5 minutes).

To delay re-replication, SeqA binds to hemi-methylated DNA at the origin and prevents it from being remethylated.
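The methylation rule for initiation can be sketched in a few lines. This assumes the methylated repeats are GATC sites (the Dam methylase target, which is palindromic, so both strands carry the site); the helper names and the short example sequence are hypothetical.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gatc_sites(seq: str) -> list:
    """Return 0-based start positions of every GATC (Dam methylase target) site."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "GATC"]


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5'->3' on both strands."""
    rev_comp = "".join(DNA_COMPLEMENT[b] for b in reversed(site.upper()))
    return site.upper() == rev_comp


def origin_can_fire(site_states: list) -> bool:
    """Return True only if every site is methylated on both strands.

    site_states holds one (top_methylated, bottom_methylated) pair per GATC site;
    a hemi-methylated site (only one strand True) blocks initiation.
    """
    return all(top and bottom for top, bottom in site_states)


print(gatc_sites("AAGATCTTGATC"), is_palindromic("GATC"))
fully = [(True, True)] * 11             # 11 repeats, as in oriC
just_replicated = [(True, False)] * 11  # parental strand methylated, new strand not
print(origin_can_fire(fully), origin_can_fire(just_replicated))
```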

Initiation at oriC requires the sequential assembly of a large protein complex on the membrane, involving six proteins:

DnaA: ATP-binding protein and licensing factor (factor necessary for replication;

inactivated/destroyed after one round of replication)

DnaB: ATP-hydrolysis dependent 5’ to 3’ helicase which provides the “engine” of

initiation after the origin has been opened.

DnaC: chaperone to repress the helicase activity of DnaB until it is needed.

HU: general DNA-binding protein which stimulates replication. Has the capacity to

bend DNA and is involved in building the structure that leads to formation of open

complex.

Gyrase: type II topoisomerase which binds to the double helix ahead of the replication fork and relieves the strain placed on the double helix as it unwinds.

SSB (single-strand binding protein): stabilizes the single-stranded DNA as it is formed and modulates the helicase activity. About 60 per fork.

For initiation to occur, the following events must happen:

The oriC must be fully methylated

Protein synthesis is required to make the origin recognition protein

Membrane/cell wall synthesis


Sequence of initiation:

DnaA-ATP binds to short, fully methylated repeated sequences (13 bp and 9 bp repeats) and forms an oligomeric complex that melts the DNA at the A-T rich region.

Six DnaC monomers bind each hexamer of DnaB, and this pre-priming complex binds the origin.

DnaG (primase) binds to the helicase complex, which releases DnaC, allowing DnaB helicase to become active and create the replication fork.

The primase synthesizes an RNA chain that provides the priming end for DNA replication.

Priming is required to start DNA synthesis because no DNA polymerase can initiate synthesis of a DNA chain; polymerases can only elongate a chain. Synthesis of the new strand can only start from a pre-existing 3’–OH end known as a primer.

DNA polymerase adds nucleotides to the 3’–OH end of the growing chain such that the new chain grows in the 5’→3’ direction.
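A minimal sketch of primer-dependent, 5’→3’ synthesis: the function refuses to start without a primer and adds complementary bases only at the 3’ end. It assumes the primer is already annealed at the start of the template and, for simplicity, writes the RNA primer with DNA letters; the sequences and the function name are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3to5: str, primer_5to3: str) -> str:
    """Extend a primer base by base along a template.

    The template is written 3'->5' and the primer 5'->3', annealed to the first
    len(primer) template bases. Each new base is added only at the 3'-OH end.
    For simplicity the primer is written with DNA letters.
    """
    if not primer_5to3:
        raise ValueError("no primer: DNA polymerase cannot start a chain de novo")
    for t, p in zip(template_3to5, primer_5to3):
        if PAIR[t] != p:
            raise ValueError("primer is not complementary to the template")
    new_strand = list(primer_5to3)
    for base in template_3to5[len(primer_5to3):]:
        new_strand.append(PAIR[base])  # elongation at the 3' end: 5'->3' growth
    return "".join(new_strand)


print(extend_primer("TACGGA", "AT"))
```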

DNA polymerases control the fidelity of replication; they often have a 3’→5’ exonuclease activity that is used to excise incorrectly paired bases.

Proofreading – a mechanism for correcting errors in DNA synthesis that

involves scrutiny of individual units after they have been added to the chain

Processivity – The tendency to remain in a single template rather than to

dissociate and re-associate.

Note: DNA polymerase I has 5’→3’ exonuclease activity where the base is hydrolyzed and expelled if incorrect.

Fidelity of replication is improved by proofreading by a factor of ~100 to ~1000.

Semi-discontinuous replication: the mode of replication in which one new strand is synthesized continuously while the other is synthesized discontinuously.

For the leading strand (5’→3’), DNA polymerase advances continuously, but on the lagging strand it makes short fragments (Okazaki fragments, 1000 to 2000 bases) that are subsequently joined together.

All DNA polymerases require a 3’–OH priming end to initiate DNA synthesis

The priming end can be provided by an RNA primer, nick in DNA or a priming

protein.
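As a rough worked example of semi-discontinuous replication, the number of Okazaki fragments needed to copy a circular genome can be estimated from the fragment length. The sketch assumes one origin with bidirectional replication (two forks, each copying half the genome) and uses the ~4600 kb E. coli genome size given later in these notes together with the 1000 to 2000 base fragment range above.

```python
import math


def okazaki_fragments(genome_bp: int, fragment_bp: int) -> int:
    """Rough count of Okazaki fragments needed to copy a circular genome once.

    Assumes bidirectional replication from one origin: each of the two forks
    copies half the genome, and at each fork one strand (the lagging strand)
    is made discontinuously, so lagging-strand DNA totals one genome length.
    """
    forks = 2
    lagging_bp = forks * (genome_bp / 2)
    return math.ceil(lagging_bp / fragment_bp)


genome = 4_600_000  # E. coli, ~4600 kb
print(okazaki_fragments(genome, 1000), okazaki_fragments(genome, 2000))
```

With 1000-base fragments this gives about 4600 fragments, and with 2000-base fragments about 2300.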

The E. coli replicase, DNA polymerase III holoenzyme, is a 900 kD complex with a dimeric structure in which each monomeric unit consists of:

A catalytic core containing three subunits: a catalytic subunit (α), a proofreading subunit (ε) and a subunit (θ) which stimulates the exonuclease.

One catalytic core is associated with each template strand.

A dimerization clamp-loader complex which consists of:

Two copies of the dimerizing subunit (τ), which link the two catalytic cores together.


A clamp loader, a five-subunit protein complex responsible for loading the clamp onto DNA at the replication fork by placing the processivity subunits on DNA, where they form a circular β clamp around DNA.

A processivity clamp, responsible for holding the catalytic cores onto their template strands. Each clamp consists of a homodimer of β subunits, the β2 ring, which binds around the DNA and ensures processivity.

The core on the leading strand is processive because its

clamp keeps it on the DNA.

The clamp associated with the core on the lagging

strand dissociates at the end of each Okazaki

fragment and reassembles for the next fragment

The helicase DnaB is responsible for interacting

with the primase DnaG to initiate each Okazaki fragment.

Each Okazaki fragment starts with a primer and stops before the next fragment, where DNA polymerase I (with its 5’→3’ exonuclease) removes the RNA primer and replaces it with DNA.

DNA ligase makes the bond that connects the 3’ end of one Okazaki fragment to the 5’ beginning of the next fragment.

In eukaryotic replication, separate DNA polymerases undertake initiation and elongation: a replication fork has one complex of DNA polymerase α/primase and two complexes of DNA polymerase δ and/or ε.

DNA polymerase α has the ability to initiate a new strand and is used to initiate both the leading and lagging strands.

DNA polymerase elongates the

leading strand and a second

DNA polymerase elongates

the lagging strand.

Conserved function of the replication

components extends to the clamp loader and

processivity clamp as well as other functions

of the replisome.

A replication fork stalls when it arrives at damaged DNA. To avoid death, bacteria can undergo lesion bypass or homologous recombination.

Lesion bypass: replication by an error-prone DNA polymerase on a template that contains a damaged base. E. coli DNA polymerases IV and V can incorporate a non-complementary base into the daughter strand; this requires temporary replacement of DNA pol III.


After the damage has been repaired, the replication fork must be restarted and this

may be accomplished by assembly of the primosome which reloads DnaB so that

helicase action can continue.

Semiconservative replication: replication accomplished by separation of the strands of a parental duplex, with each strand then acting as a template for synthesis of a complementary strand. After replication, each double-stranded daughter molecule contains one parental and one newly synthesized strand.

Generation | Conservative model | Semi-conservative model | Dispersive model
Gen 0 | One band: 15-15 | One band: 15-15 | One band: 15-15
Gen 1 | Two bands (50:50): 15-15, 14-14 | One band: 15-14, 15-14 | One band; each strand is 50% heavy and 50% light
Gen 2 | Two bands (25:75): 15-15, 14-14, 14-14, 14-14 | Two bands (50:50): 15-14, 14-14, 15-14, 14-14 | One band; each strand is 25% heavy and 75% light
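The band patterns in the table can be reproduced with a short simulation of the conservative and semi-conservative models (the dispersive model is not simulated here). Each duplex is written as a pair of strand labels, "15" for a heavy 15N strand and "14" for a light 14N strand; the function names are illustrative.

```python
from collections import Counter


def semiconservative(generations: int) -> Counter:
    """Duplex classes after growth in 14N, starting from one 15N/15N duplex."""
    duplexes = [("15", "15")]
    for _ in range(generations):
        # Strands separate; each parental strand templates a new 14N strand.
        duplexes = [(strand, "14") for duplex in duplexes for strand in duplex]
    return Counter("-".join(sorted(d, reverse=True)) for d in duplexes)


def conservative(generations: int) -> Counter:
    """Parental duplex stays intact; every new duplex is entirely 14N."""
    total = 2 ** generations
    classes = Counter({"15-15": 1})
    if total > 1:
        classes["14-14"] = total - 1
    return classes


for gen in (0, 1, 2):
    print(gen, dict(conservative(gen)), dict(semiconservative(gen)))
```

For generation 2 this gives 1 heavy : 3 light duplexes for the conservative model and 2 hybrid : 2 light for the semi-conservative model, matching the 25:75 and 50:50 ratios in the table.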

Gene transfer in prokaryotes can happen via:

Transformation (naked DNA) – either via CaCl2 + heat shock or electroporation

Conjugation (bacteria-mediated): process in which two bacteria come in contact and transfer genetic material. The process is mediated by the F plasmid (fertility factor).

A free F plasmid is a replicon that is maintained at the level of one plasmid per bacterial chromosome, and it can be integrated into the bacterial chromosome. The F factor is transferred frequently.

The F plasmid contains tra genes, which encode transfer functions (pilus synthesis and assembly, cell pairing, etc.) and are all located in one operon.

F+ × F- mating results in two F+ cells, with no change in chromosomal genetic composition.


When F integrates into a bacterial chromosome, it gives rise to different Hfr

strains through site-specific recombination.

When an Hfr strain mates with an F- cell, the recipient almost never acquires the F+ phenotype, because only the first part of F is transferred and transfer of the whole chromosome is rarely completed:

The mating channel is fragile and easily broken by changes in the environment.

The time needed for complete transfer (about 100 min) is longer than the bacterial lifespan.

Recipient bacteria may lack space for additional DNA.

Chromosome map can be determined via interrupted mating technique

which is used to map the order of bacterial genes based on their order of

transfer into recipient cell.

Genes nearest to oriT have the highest

frequency of being transferred

Genes transferred early are more

frequently represented in

recombinants

Complete E. coli genetic map is about

90mins (4600kb) and zero point is the

marker thr.

If oriT is pointing to the left, then the gene on its right will be the first to enter the recipient cell, and the rest of the F factor is transferred last, at the end of the genome.

An F’ is formed by improper excision of F from the bacterial chromosome; it can carry as much as 15% of the E. coli genome and thus provides partial diploidy when transferred into a recipient strain.

This homologous region can recombine with the host chromosome.
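The logic of interrupted mating mapping can be sketched as sorting genes by their time of entry and keeping those that entered before the mating was broken. The gene names and entry times below are hypothetical, not real E. coli map positions.

```python
def transferred_genes(entry_times: dict, mating_minutes: float) -> list:
    """Genes that entered the recipient before mating was interrupted, in entry order."""
    ordered = sorted(entry_times.items(), key=lambda item: item[1])
    return [gene for gene, t in ordered if t <= mating_minutes]


# Hypothetical entry times (minutes after mixing) for one Hfr strain.
entry = {"geneC": 25.0, "geneA": 8.0, "geneD": 33.0, "geneB": 18.0}
print(transferred_genes(entry, 20))  # mating interrupted at 20 min
print(transferred_genes(entry, 40))  # full gene order for this Hfr
```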

Transduction (phage-mediated): Bacteriophage-mediated transfer of host DNA

from one bacterium to another and occurs as the result of reproductive cycle of

bacteriophage.

Lytic cycle: viral reproductive cycle that ends in lysis of the bacterium (virulent phage).

Lysogeny: maintenance of the viral genome (prophage) within the host cell, integrated into the bacterial chromosome (temperate phage).

Two types of transduction:

Generalized transduction: any part of bacterial genome can be

transferred and occurs during lytic cycle.

Randomly sized fragments are packed into phage and

homologous recombination may occur in recipient bacteria.

Specialized (restricted) transduction: transfer of only specific

portions of the bacterial genome; carried out only by temperate

phages that have integrated their DNA into the host chromosome at

a specific site in the chromosome.

Phage particles carry both phage DNA and flanking bacterial

DNA, but only bacterial DNA adjacent to the prophage

insertion site is packaged.

Occurs only when lysogeny is induced to go into lytic phase.

The integration and excision of phage involves site-specific

recombination between attP and attB.


Specialized transducing phages that carry genes located on the left side of the prophage (λdgal) are proficient for lysogeny but deficient for lysis, so they require a helper phage to lyse a recipient cell. The “d” indicates that the phage is defective for lytic growth.

Specialized transducing phages that carry genes located on the right side of the prophage (λbio) are proficient for lysis but deficient for lysogeny. They have all functions required for lytic growth, so they can infect a recipient cell and generate a lysate, but they require a helper phage to form lysogens in a recipient cell.


Selecting for recombinants or transconjugants:

Prototrophs: wild type strain that has minimal requirement for nutrient

supplements

Auxotrophs: mutant strains that have lost the ability to synthesize a required nutrient, such as an amino acid (Lac- mutants, by contrast, have lost the ability to utilize lactose).

E.g. Bacteria initially grown in complete medium are subsequently grown in (a) minimal medium, (b) minimal medium + histidine and (c) minimal medium + arginine.

Colonies that grow in (a) are wild-type prototrophs; colonies that grow in (b) but not in (a) are His- auxotrophs; colonies that grow in (c) but not in (a) are Arg- auxotrophs.
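The reasoning for identifying auxotrophs from the three plates can be written as a small decision function. The medium labels "MM", "MM+His" and "MM+Arg" are hypothetical names for plates (a), (b) and (c); a colony that grows on none of them is reported as needing some other supplement.

```python
def classify(growth: dict) -> str:
    """Classify a colony from its growth (True/False) on each plate."""
    if growth["MM"]:
        return "prototroph"  # needs no supplement
    if growth["MM+His"]:
        return "His- auxotroph"  # rescued only by histidine
    if growth["MM+Arg"]:
        return "Arg- auxotroph"  # rescued only by arginine
    return "other auxotroph (needs a supplement not tested)"


print(classify({"MM": False, "MM+His": True, "MM+Arg": False}))
```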

Recombinants are detected using selective and counter-selective techniques:

Counter-selection against parental strains using antibiotics such as

streptomycin/kanamycin

Selection of recombinants using antibiotics or ability to utilize a sugar

(lactose)

Best way to select recombinants from a conjugation between a StrS Lac+ donor strain and a StrR Lac- recipient strain: minimal medium + lactose + streptomycin.

Lac- cannot utilize lactose and thus it cannot grow in the minimal media with

lactose as sole carbon source

StrS strain is sensitive to streptomycin antibiotics and cannot grow in

minimal media with streptomycin

Medium | Donor strain (StrS Lac+) | Recipient strain (StrR Lac-) | Recombinants (StrR Lac+)
MM + lactose | Grow | X | Grow
MM + lactose + strep | X | X | Grow
MM + strep | X | X | X
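The growth table can be checked with a small model of each strain and medium. It assumes, as the table implies, that the minimal medium has no carbon source other than the added lactose; the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strain:
    name: str
    str_resistant: bool  # StrR if True, StrS if False
    lac_plus: bool       # Lac+ if True, Lac- if False


@dataclass(frozen=True)
class Medium:
    name: str
    lactose: bool        # assumed to be the only carbon source when present
    streptomycin: bool


def grows(strain: Strain, medium: Medium) -> bool:
    """Growth needs a usable carbon source and, with streptomycin, resistance."""
    if not (medium.lactose and strain.lac_plus):
        return False
    if medium.streptomycin and not strain.str_resistant:
        return False
    return True


strains = [Strain("donor", False, True), Strain("recipient", True, False),
           Strain("recombinant", True, True)]
media = [Medium("MM + lactose", True, False),
         Medium("MM + lactose + strep", True, True),
         Medium("MM + strep", False, True)]
for m in media:
    print(m.name, ["Grow" if grows(s, m) else "X" for s in strains])
```

Only the medium with both lactose and streptomycin separates recombinants from both parents, which is why it is the selective medium of choice.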


Lecture 11 – DNA Recombination

Two types of genetic recombination in bacteria:

General recombination requires long (>50 bp) sequence homology and is RecA-dependent.

Site-specific recombination requires very short (<5 bp) sequence homology and has special site recognition. It is RecA-independent but requires specialized proteins.

General Recombination:

Genetic exchange takes place between 2 pieces of homologous DNA sequence, and it may be an intra- or intermolecular event.

Recombination may result in insertion, gene amplification, deletions or inversions.

At the site of crossover, there is a heteroduplex

DNA formation (hybrid DNA from the different

parental duplex molecules) during genetic

recombination and new recombinant DNA

molecules are produced.

Single-strand invasion model: recombination is initiated by a nick in one strand.

RecA first binds cooperatively to the invading strand and invades the

homologous duplex.

Once a triplet nucleotide match is found, RecA hydrolyzes ATP and the

strands exchange.

Repair DNA polymerases and DNA ligase complete the repair process.

A Holliday junction is an intermediate structure in homologous

recombination where the two duplexes of DNA are connected by the genetic

material exchanged between two of the four strands, one from each duplex.

Double-strand break model: initiated by a double strand break (DSB) by an

endonuclease cleaving one of the partner’s DNA duplexes.

1. The DSB is enlarged to a gap by 5’→3’ exonuclease action to create protruding single-stranded 3’ tails.

2. The single-stranded DNA is recognized by RecA protein, which initiates a homology search in the other chromosome.

3. ATP-dependent strand exchange occurs, followed by DNA synthesis and ligation.

4. Branch migration of Holliday junctions (branch migration is the ability of a DNA strand partially paired with its complement in a duplex to extend its pairing by displacing the resident strand with which it is homologous).

5. Resolution by strand cutting (resolvase), followed by sealing with DNA ligase.


The resolution of a Holliday junction produces “splices” or “patches”

Splice recombinant DNA results from a

Holliday junction being resolved by

cutting the non-exchanged strands.

Both strands of DNA before the

exchange point come from one

chromosome; the DNA after the

exchange point comes from the

homologous chromosome.

Patch recombinant DNA results from a Holliday junction being resolved by cutting the exchanged strands. The duplex is largely unchanged, except for a DNA sequence on one strand that came from the homologous chromosome.

Other proteins which participate in general

homologous recombination include:

RuvA: 22kD protein which binds to

RuvB and Holliday junctions

RuvB: 37kD helicase that catalyses

branch migration

RuvC: 19kD nuclease which resolves

Holliday structures (resolvase)

The above 3 proteins form the Ruv complex which acts on recombinant junctions.

DNA ligase

RecBCD is a helicase-nuclease complex that initiates the repair of double-strand breaks.

There are about 1000 chi (crossover hotspot instigator) sites (5’-GCTGGTGG-3’) present in the E. coli chromosome.

Nuclease activity on the strand with the 3’ end is suppressed upon reaching a chi sequence, while the other strand continues to be degraded, generating a 3’-terminal single-stranded end.

Single-stranded DNA generated at chi sites is a hotspot for general recombination.
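Because the chi sequence 5’-GCTGGTGG-3’ is not palindromic, a search has to look at both strands. A minimal sketch, with a hypothetical test sequence, that reports positions in top-strand coordinates (function names are illustrative):

```python
CHI = "GCTGGTGG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_chi(seq: str) -> dict:
    """Return 0-based positions (top-strand coordinates) of chi on each strand."""
    seq = seq.upper()
    k = len(CHI)
    chi_bottom = reverse_complement(CHI)  # bottom-strand chi as read on the top strand
    starts = range(len(seq) - k + 1)
    return {
        "top": [i for i in starts if seq[i:i + k] == CHI],
        "bottom": [i for i in starts if seq[i:i + k] == chi_bottom],
    }


print(find_chi("AAGCTGGTGGTTCCACCAGCAA"))
```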

General recombination and DNA repair mechanisms may result in gene conversion, in which only a small section of DNA (or part of a gene) is converted to the sequence of the homologous copy.

Gene conversion is a non-reciprocal exchange.

Mismatched bases in a heteroduplex are recognized and removed by DNA repair enzymes and replaced with a copy of the complementary strand.


Three types of Site-specific recombination:

Transposons – Three major classes:

1. DNA-only transposons – require transposase and move as DNA by either cut-and-paste or replicative pathways. They have short inverted repeats at each end.

Predominantly found in bacteria and responsible for the spread of antibiotic resistance among bacterial strains.

Excised from one spot on a genome and inserted into another

Transposons encode a transposase, which carries out the DNA breakage and joining reactions needed for the element to move.

DNA-only transposons can be recognized in chromosomes by

“inverted repeat DNA sequences” present at their ends.

Cut-and-paste movement begins when transposase brings the two inverted repeat DNA sequences together, forming a DNA loop.

o Transposase functions as a dimer, with each monomer recognizing the same specific DNA sequence at the end of the transposon.

Insertion occurs at random sites through the creation of staggered

breaks in the target chromosome, catalysed by transposase.

Subsequently, staggered breaks are repaired by DNA polymerase and

ligase.

The insertion site is marked by a short direct repeat of the target DNA sequence (a clue for identifying transposons in genome sequences).
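The origin of the flanking direct repeat can be sketched as string operations on one strand of the target. The function name, the 4-base stagger and the "[transposon]" placeholder are hypothetical; the model shows only the final sequence, not the strand-by-strand chemistry.

```python
def insert_with_target_duplication(target: str, cut_pos: int, stagger: int, element: str) -> str:
    """Sketch of transposon insertion at a staggered cut.

    The top strand is cut at cut_pos and the bottom strand `stagger` bases
    further along. After the element is joined to the overhanging ends, gap
    filling by DNA polymerase copies the `stagger` target bases onto both
    sides, leaving them as a short direct repeat flanking the element.
    """
    duplicated = target[cut_pos:cut_pos + stagger]
    return target[:cut_pos] + duplicated + element + duplicated + target[cut_pos + stagger:]


result = insert_with_target_duplication("AAAGTCCTTT", cut_pos=3, stagger=4, element="[transposon]")
print(result)
```

The target bases GTCC now appear once on each side of the element in the same orientation, which is the direct repeat used to spot insertions in genome sequences.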


2. Retroviral-like retrotransposons – require reverse transcriptase and integrase (transposase) and move via an RNA intermediate produced by a promoter in the LTR. They have directly repeated long terminal repeats (LTRs) at each end.

Once the reverse transcriptase has produced a double-stranded DNA, specific sequences near its two ends can be recognized by a virus-encoded transposase (integrase), which then inserts the viral DNA into the chromosome using a cut-and-paste mechanism similar to that of DNA-only transposons.

3. Non-retroviral retrotransposons – require reverse transcriptase and endonuclease (an endonuclease-reverse transcriptase complex) and move via an RNA intermediate that is often produced from a neighbouring promoter.

They have poly A at the 3’ end of the RNA transcript, and the 5’ end is often truncated.

They occur as repetitive DNA sequences (L1 elements or LINE elements).

Transposition begins when an endonuclease attached to the L1 reverse transcriptase and the L1 RNA nicks the target DNA at the insertion point. Cleavage releases a 3’–OH DNA end, which acts as a primer for reverse transcription.

A single-strand DNA copy of the element is generated, and further processing results in a new double-strand DNA, which is inserted at the site of the initial nick.


Phage integration and excision – specialized transduction, in which circular phage lambda DNA is converted to an integrated prophage by a reciprocal recombination between attP and attB.

Cre-loxP system: Cre is a bacteriophage P1 integrase which catalyses site-specific recombination between loxP sites (34 bp short direct repeats).

This recombination also works in mammalian cells in vitro and in vivo.
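The excision outcome of Cre acting on two directly repeated loxP sites can be sketched as string editing. "LOXP" is a hypothetical placeholder token rather than the real 34 bp site, the lowercase flanks and "gfp" insert are illustrative, and only the excision case is modelled (the sketch assumes both sites are in the same orientation).

```python
SITE = "LOXP"  # hypothetical placeholder token; a real loxP site is 34 bp


def cre_excise(seq: str) -> str:
    """Excise the DNA between two directly repeated sites, leaving one site.

    Only the excision outcome for two sites in the same orientation is modelled.
    """
    first = seq.find(SITE)
    if first == -1:
        return seq  # no site: nothing to recombine
    second = seq.find(SITE, first + len(SITE))
    if second == -1:
        return seq  # only one site: nothing to recombine
    return seq[:first] + seq[second:]


print(cre_excise("aaaLOXPgfpLOXPbbb"))
print(cre_excise("aaaLOXPbbb"))
```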


Lecture 12 – DNA Repair

DNA repair is a major defence against environmental damage to cells which minimizes cell

killing, mutations, replication errors, persistence of DNA damage and genomic instability.

Abnormalities in DNA repair have been implicated in cancer and aging.

DNA damage can be:

Spontaneous: Depurination, deamination

Mutagen-induced: Pyrimidine dimers, alkylation, substitution, deletions/insertions,

frameshift mutations, double-strand breaks

Point mutations:

Transitions – a purine (A or G)/pyrimidine (C or T) is replaced by other

purine/pyrimidine.

A replaced with G or the reverse

C replaced with T or the reverse

Transversions – a purine (A or G) is replaced by a pyrimidine or vice versa

A replaced by C or T

G replaced by C or T

C replaced by A or G

T replaced by A or G
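The transition/transversion rule above reduces to checking whether the two bases belong to the same chemical class. A minimal sketch (the function name is illustrative):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def mutation_type(original: str, new: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    original, new = original.upper(), new.upper()
    if original not in PURINES | PYRIMIDINES or new not in PURINES | PYRIMIDINES:
        raise ValueError("bases must be A, C, G or T")
    if original == new:
        raise ValueError("not a substitution")
    same_class = {original, new} <= PURINES or {original, new} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for pair in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(pair, mutation_type(*pair))
```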

Hydrolytic attack can cause depurination or deamination. If left uncorrected, such changes

could lead to deletion or substitution of base pairs during DNA replication.

Deamination of bases in DNA yields unnatural nucleotides which can be directly

recognized and removed by specific DNA glycosylases.

Deamination of C produces U which can be repaired by uracil DNA

glycosylase.

Nitrous acid (HNO2) oxidatively deaminates primary amines, producing transition mutations: adenine → hypoxanthine (which pairs with C).

When methylated C is accidentally converted to T by deamination, DNA mispairing can occur: G:C base pair → G:T base pair.


Formation of a dimer between 2 pyrimidine bases is possible when cells are exposed to UV

irradiation. This occurs between two adjacent thymine or cytosine bases.

Alkylation of a base may change the normal base pairing, leading to mutation.

Nitrogen mustard can cross-link with DNA at N7 of guanine, resulting in

chromosome breakage.

DNA exposed to EMS and MNNG yields O6-ethylguanine and O6-methylguanine residues, respectively, which can base pair with either C or T: G:C base pair → T:A base pair.

Insertion/deletion mutations are generated by intercalating agents.

Intercalating agents increase the distance between 2 consecutive base pairs.

Replication of such DNA generates a deletion or insertion of one or more nucleotides in the newly synthesized DNA, which results in a frameshift mutation.

E.g. ethidium bromide, which binds to DNA and is used in gel electrophoresis.
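A short sketch shows why a single inserted base causes a frameshift: every codon downstream of the insertion is read differently. The coding sequence and function names are hypothetical.

```python
def codons(seq: str) -> list:
    """Split a coding sequence into complete codons, reading from the first base."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_base(seq: str, pos: int, base: str) -> str:
    """Insert a single extra base before position pos (0-based)."""
    return seq[:pos] + base + seq[pos:]


original = "ATGAAACCCGGG"  # hypothetical coding sequence
mutant = insert_base(original, 4, "T")
print(codons(original))
print(codons(mutant))
```

The first codon is unchanged, but from the insertion point onward every codon shifts, which is why frameshifts usually destroy the protein sequence downstream.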


Ames test is commonly used to assess mutagenicity of compounds.

Types of repair pathways:

Direct repair: direct reversal of the

damage. Widespread in all except

placental mammals.

Excision repair: initiated by a recognition enzyme that sees an actual damaged base or a change in the spatial path of DNA.

Base excision repair: removes the damaged base and replaces it in DNA, e.g. DNA uracil glycosylase.

Nucleotide excision repair: removes a sequence that includes the damaged base(s), and a new stretch of DNA is synthesized to replace it.

Mismatch repair: scrutinizes DNA for apposed bases that do not pair properly. Mismatches arise during DNA replication and are corrected by distinguishing between the new and old strands.

Recombination-repair: a mode of filling a gap in one strand of duplex DNA by

retrieving a homologous single strand from another duplex.

Nonhomologous end joining: repairs DSB when no homologous strands are

available.

In bacteria, the following types of DNA repair systems are present:

Repair of DNA synthesis errors

1. Proofreading by the DNA polymerase 3’→5’ exonuclease – reduces errors introduced during DNA synthesis by 1000-fold.


2. Mismatch repair by the MutSLH system. Depends on the methylation of selected A residues in GATC to distinguish between newly synthesized DNA and template DNA.

The MutH endonuclease makes a nick on the 5’ side of the unmethylated GATC; UvrD (helicase) and an exonuclease then remove the nicked stretch of the unmethylated strand.

The unmethylated DNA strand is then corrected by DNA polymerase III.

Strand-directed mismatch repair reduces

the error by a further 100-fold.
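The fold-reductions quoted for proofreading (1000-fold) and strand-directed mismatch repair (a further 100-fold) combine multiplicatively, assuming the two steps act independently. In the sketch below the starting error rate of 1e-5 is a hypothetical value chosen only for illustration.

```python
import math


def residual_error_rate(base_rate: float, *fold_reductions: float) -> float:
    """Error rate left after independent correction steps (fold reductions multiply)."""
    return base_rate / math.prod(fold_reductions)


proofreading, mismatch_repair = 1000, 100
print(math.prod([proofreading, mismatch_repair]))                # combined fold improvement
print(residual_error_rate(1e-5, proofreading, mismatch_repair))  # hypothetical base rate
```

Under these assumptions the combined improvement is 10^5-fold.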

Repair of DNA modifications

3. Direct reversal of damage – photo-reactivation

repair

Removes pyrimidine dimers in a light-

dependent reaction

Occurs in bacteria but not in placental

mammals.

Involves a photo-reactivation enzyme (PRE), photolyase.

Non-mutagenic repair system.

4. Excision repair by DNA glycosylase and Apurinic/apyrimidinic (AP)

endonuclease

Base excision repair (BER) – only removes the damaged base. DNA

glycosylase cleaves the glycosidic bond leaving the

apurinic/apyrimidinic site.

Other enzymes such as AP endonuclease, DNA polymerase I and

DNA ligase are involved.

Nucleotide excision repair (NER) – corrects pyrimidine dimers and other

DNA lesions in which the bases are displaced.

In E. coli, NER is an ATP-dependent process involving the UvrA, UvrB, UvrC and UvrD proteins. The Uvr system operates in stages in which UvrAB recognizes damage, UvrBC nicks the DNA and UvrD unwinds the marked region.

Individuals with Xeroderma

Pigmentosum (XP) and

Cockayne syndrome (CS) are

unable to repair UV-induced

DNA lesions.

Repair of replication fork barriers

5. Translesion synthesis – when a lesion is encountered during replication, DNA Pol III is replaced by an error-prone translesion DNA polymerase, Pol IV or Pol V.

Translesion DNA polymerase extends DNA synthesis beyond a thymine dimer independently of base pairing and has no proofreading exonuclease activity.

Translesion DNA synthesis is therefore error-prone and often introduces errors into the sequence. It is invoked as a last resort, as part of the SOS response.

Repair of breaks in DNA

6. Repair of DSBs by homologous recombination (HR) and non-homologous end joining (NHEJ)

A DSB is generated when the replication fork encounters a single-strand nick in the template DNA. DSBs can also be induced by ionizing radiation, replication errors, oxidizing agents and certain cellular metabolites.

DSB repair by NHEJ is common in mammalian somatic cells, and Ku is a key protein in NHEJ.

The NHEJ pathway processes the broken ends into blunt, ligatable ends and joins them without a template. The repaired site therefore often carries a small deletion.

Differentiated cells contain all the genetic instructions necessary to direct the formation of a complete organism.
(A) The nucleus of a skin cell from an adult frog, transplanted into an enucleated egg, can give rise to an entire tadpole.
(B) In many plants, differentiated cells retain the ability to dedifferentiate, so that a single cell can form a clone of progeny cells that later gives rise to an entire plant.
(C) Calves produced from a differentiated donor cell are all clones of the donor and are thus genetically identical.

Lecture 13 – Eukaryotic Gene Expression (Overview)

Gene expression in multicellular eukaryotes

Genome constancy → differential gene expression → different proteins → different cell types

There is physical evidence for genome constancy:
- The number of chromosomes is constant among different types of cells. All human somatic cells contain 22 pairs of autosomes and one pair of sex chromosomes.
- The amount of nuclear DNA is constant among different cells.
- There is no gene amplification or rearrangement in the majority of cell types (exception: immune cells).
- Totipotency of nuclei of differentiated cells – differentiated nuclei retain a complete set of genes for the whole organism.

John Gurdon’s work in 1958 – nuclear transplantation and induced stem cells.

Differential expression:
- Not all genes are expressed in any single type of cell, and different sets of genes are expressed in different types of cells.
- The same gene may be expressed at different levels in different types of cells or under different circumstances. This is because genes are transcribed with different efficiencies, resulting in different amounts of protein being produced.

Cell differentiation:
- Cells become different through the synthesis of different sets of mRNAs and proteins, which results in different morphology and physiological function.
- Each type of cell synthesizes a few characteristic proteins at high abundance, e.g. globin in red blood cells.
- Cell differentiation is usually stable and irreversible.

Differential mRNA expression by DNA microarray analyses (heat map)

Each column represents one sample. In the image, multiple liver samples are shown.

Each row represents one gene. In the image, thousands of genes are shown.

Red and green colours represent levels of gene expression: red for higher levels and green for lower levels.
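Heat-map colours are usually derived from the ratio of each sample's expression to a reference value. The minimal sketch below assumes a log2 ratio against a reference and an illustrative two-fold cut-off. Neither choice is specified in these notes, and the gene and sample values are hypothetical.

```python
import math


def heatmap_colour(value: float, reference: float, cutoff: float = 1.0) -> str:
    """Colour one heat-map cell from log2(value / reference).

    ASSUMPTION: a cut-off of 1.0 (a two-fold change) is illustrative only.
    """
    log_ratio = math.log2(value / reference)
    if log_ratio >= cutoff:
        return "red"    # higher expression than the reference
    if log_ratio <= -cutoff:
        return "green"  # lower expression than the reference
    return "black"      # little change; shown as black in many red/green maps


if __name__ == "__main__":
    # Hypothetical data: rows = genes, columns = samples.
    reference = {"gene_1": 2.0, "gene_2": 4.0}
    samples = {
        "liver_a": {"gene_1": 8.0, "gene_2": 1.0},
        "liver_b": {"gene_1": 2.5, "gene_2": 4.0},
    }
    for gene, ref in reference.items():
        print(gene, [heatmap_colour(samples[s][gene], ref) for s in samples])
```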

Housekeeping genes are expressed in all types of cells for basic cellular functions.
E.g. structural proteins (β-actin, histones, ribosomal proteins etc.) and metabolic enzymes (glycogen synthase kinase etc.)

Tissue-specific genes give the cell its specific phenotype.
E.g. globin, crystallin, insulin

Except for housekeeping genes, most other genes are only expressed in certain cells.

Figure legend: red = common (housekeeping genes); blue = specific (tissue-specific genes).

Even after proteins are translated, many of them require post-translational modifications for their proper functions. Thus proteins can exist in different isoforms.

The figure shows the differences in RNA levels for two human genes in seven different tissues.
- RNA sequencing was used to obtain the data. RNA was collected from cultured human cell lines derived from the indicated tissues.
- The sequence reads were mapped across the human genome by matching RNA sequences to the genomic DNA sequence.
- The number of transcripts (reads) mapped to each gene can be counted for quantitative analysis of RNA expression.
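Raw read counts depend on how deeply each sample was sequenced, so counts are usually scaled before samples are compared. The sketch below uses counts per million (CPM). CPM is one common normalization and is an assumption here; it is not necessarily the method behind the figure. The read data are hypothetical.

```python
from collections import Counter


def counts_per_million(read_assignments: list) -> dict:
    """Count reads per gene and scale by library size (counts per million, CPM).

    `read_assignments` holds the gene name each mapped read was assigned to.
    """
    counts = Counter(read_assignments)
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical mapped reads from two tissues sequenced to different depths.
    tissue_1 = ["GENE_X"] * 30 + ["GENE_Y"] * 70
    tissue_2 = ["GENE_X"] * 60 + ["GENE_Y"] * 140
    print(counts_per_million(tissue_1))
    print(counts_per_million(tissue_2))
```

Tissue 2 has twice the raw counts of both genes but the same CPM values, because it was simply sequenced twice as deeply.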

Examples of specialized proteins in differentiated cells for specialized functions:

Cell type | Differentiated cell product | Specialized function
Keratinocyte (skin cell) | Keratin | Protection against abrasion, desiccation
Erythrocyte (red blood cell) | Hemoglobin | Transport of oxygen
Lens cell | Crystallins | Transmission of light
B lymphocyte | Immunoglobulins | Antibody synthesis
T lymphocyte | Cell surface antigens | Destruction of foreign cells; regulation of immune response
Melanocyte | Melanin | Pigment production
Pancreatic islet cells | Insulin | Regulation of carbohydrate metabolism
Osteoblast (bone-forming cell) | Bone matrix | Skeletal support
Myocyte (muscle cell) | Muscle actin and myosin | Contraction
Hepatocyte (liver cell) | Serum albumin; numerous enzymes | Production of serum proteins and numerous enzymatic functions
Neurons | Neurotransmitters (acetylcholine, epinephrine etc.) | Transmission of electrical impulses

Comparison between gene expression in eukaryotes and prokaryotes:

Similarity: the central dogma, DNA → RNA → protein

Differences for eukaryotes:
- Transcription occurs in the nucleus.
- Exons and introns are present in the DNA.
- mRNA undergoes 5’ capping, RNA splicing and polyadenylation.
- mRNA is exported out of the nucleus.
- Proteins undergo post-translational modification.

Multiple levels of gene expression regulation in eukaryotic cells:

Pre-transcriptional control
- Chromatin structure (heterochromatin and euchromatin)
- DNA methylation (widely used; the major form of epigenetic regulation). Methylation affects the binding of transcription factors to the promoter.
- DNA amplification (for a small number of genes under special conditions, and in cancer cells)
  E.g. Xenopus ribosomal RNA genes are amplified ~1500× during oocyte growth.
  Drosophila polytene chromosomes in the salivary gland contain ~1000 copies.
- DNA rearrangement (found only in specific sets of genes in specialized immune cells – immunoglobulins)
  In an immunoglobulin light chain gene, a randomly chosen V gene segment (~35) is moved to lie precisely next to one of the J gene segments (~5). This gives a total of 35 × 5 = 175 potential light chains.
  For the Ig heavy chain, V (40), D (23) and J (6) segments give 40 × 23 × 6 = 5520 variable heavy chains → ~1.5 million different combinations.
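Because one segment of each type is joined, the number of possible chains is the product of the segment counts. The short Python check below reproduces the arithmetic above, using only the segment numbers given in these notes.

```python
from math import prod

# Gene-segment counts as given in the notes above.
LIGHT_CHAIN_SEGMENTS = {"V": 35, "J": 5}
HEAVY_CHAIN_SEGMENTS = {"V": 40, "D": 23, "J": 6}


def combinations(segments: dict) -> int:
    """One segment of each type is joined, so the possibilities multiply."""
    return prod(segments.values())


if __name__ == "__main__":
    light = combinations(LIGHT_CHAIN_SEGMENTS)
    heavy = combinations(HEAVY_CHAIN_SEGMENTS)
    print("light chains:", light)
    print("heavy chains:", heavy)
    print("heavy x light pairs from these segments only:", heavy * light)
```

With these numbers, pairing every heavy chain with every light chain gives 966,000 combinations. That is fewer than the ~1.5 million quoted above. The larger figure presumably counts additional light-chain segments not listed here, so treat this only as an arithmetic check.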

Transcriptional control
- Due to the presence of DNA regulatory elements, e.g. promoter, enhancer/silencer, locus control region, insulator binding sites etc.

- Gene clusters are organised within a loop consisting of enhancer, promoter, structural gene and terminator.

Post-transcriptional control

Translational control

Post-translational control

The kinds of proteins produced and their concentrations can be controlled at multiple levels.

In the end, the structure and function of a cell are determined mainly by its functional proteins.

Lecture 14 – Promoters and cis-elements

In comparison with prokaryotes, eukaryotes have more genes, and only a portion of them are transcribed (euchromatin). In addition, much of the chromosomal DNA is highly packed (heterochromatin), and those regions are usually inactive.

In transcription initiation:
- Eukaryotic RNA polymerases (RNAPs) cannot initiate transcription on their own. They require a large set of proteins known as general transcription factors (GTFs).
- GTFs help to position the RNAP and interact with gene-specific TFs. In comparison, prokaryotic RNAP only requires a sigma factor.
- Eukaryotic RNA polymerases must cope with DNA packaging in chromatin.
- Eukaryotic regulatory sequences (e.g. enhancers) can act on a gene's promoter over long distances (>50 kb).

Eukaryotes have three RNA polymerase systems:
- RNA polymerase I – transcribes the 5.8S, 18S and 28S rRNA genes.
- RNA polymerase II – transcribes all protein-coding genes, plus snoRNA, miRNA, lncRNA and most snRNA genes.
- RNA polymerase III – transcribes tRNA, 5S rRNA, some snRNA (e.g. U6) and genes for other small RNAs.

The S values (Svedberg units) refer to the rate of sedimentation in an ultracentrifuge: the larger the S value, the larger the rRNA.

Different types of RNA:
- rRNA – ribosomal RNA
- snoRNA – small nucleolar RNAs for processing rRNA
- miRNA – microRNAs for degradation (or translational repression) of target mRNAs
- snRNA – small nuclear RNAs for mRNA processing
- lncRNA – long non-coding RNAs (>200 nt) with various functions in transcription and epigenetics

The transcription initiation complex of RNA polymerase II consists of:
- Regulatory regions (promoter, cis-elements)
- Pre-initiation complex (RNA polymerase and GTFs)
- Specific transcription factors (for certain genes)
- Co-activators and chromatin remodelling proteins

Promoter: the binding site of a gene for the basal transcriptional machinery that initiates transcription.

Cis-elements: DNA binding sites for specific transcription factors or other regulatory proteins that affect the rate of transcription.

Four core promoter elements for RNA polymerase II:
- BRE – B recognition element, present in 22% of human genes. It is the binding site of TFIIB.
- TATA – present in a large portion of genes (~24% in human). It allows correct positioning of the polymerase so that transcription starts about 30 bases downstream of the TATA box.
- INR – initiator sequence, present in 46% of human genes. Transcription starts at its A nucleotide. In some genes without a TATA box, the initiator alone is sufficient to initiate gene transcription.
- DPE – downstream promoter element, present in a small number of genes (~12%). It allows cooperative binding of TFIID.

Note that not all of these elements are present in the same gene promoter, and some promoters may contain more than one such element.
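Because the TATA box sits a fixed distance upstream of the start site, a TATA match roughly predicts where transcription begins. The rough Python sketch below makes two assumptions. First, it uses the consensus TATAWAWR (W = A/T, R = A/G), a commonly cited form that is not given in these notes. Second, it measures the ~30-base offset from the first T of the box. Real promoter prediction needs far more than this.

```python
import re

# ASSUMPTION: TATAWAWR (W = A/T, R = A/G) is used as the TATA-box consensus.
TATA_BOX = re.compile(r"TATA[AT]A[AT][AG]")
# The notes: transcription starts about 30 bases downstream of the TATA box.
TSS_OFFSET = 30


def predict_tss(seq: str) -> list:
    """Return (TATA start, predicted TSS) pairs, 0-based, for the given strand only."""
    seq = seq.upper()
    predictions = []
    for match in TATA_BOX.finditer(seq):
        tss = match.start() + TSS_OFFSET
        if tss < len(seq):  # skip boxes whose predicted TSS falls off the sequence
            predictions.append((match.start(), tss))
    return predictions


if __name__ == "__main__":
    promoter = "G" * 10 + "TATAAAAG" + "C" * 40  # hypothetical test sequence
    print(predict_tss(promoter))
```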

Transcription rate can be altered by the binding of proteins to specific sequences:
- Enhancers are binding sites for transcriptional activators that increase the rate of transcription.
- Silencers are binding sites for transcriptional repressors that decrease the rate of transcription and, in some cases, prevent a region from being transcribed.
- Both enhancers and silencers can be located near or far away (>50 kb) from the transcription unit, either upstream or downstream, or in introns.
- Each enhancer/silencer generally provides binding sites for several protein factors.
- The function of an enhancer/silencer is generally orientation-independent – the binding consensus sequence is frequently palindromic or symmetrical.
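A double-stranded site is palindromic when it equals its own reverse complement, so it reads the same 5’→3’ on both strands. The minimal Python check below tests two example sites: the E box CACGTG (discussed later in these notes) and an arbitrary non-palindrome.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5'->3' on both strands."""
    return site.upper() == reverse_complement(site)


if __name__ == "__main__":
    for site in ("CACGTG", "TATAAA"):
        print(site, reverse_complement(site), is_palindromic(site))
```

A palindromic site looks identical whichever way round it sits in the DNA. This fits the orientation independence described above.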

There are several promoter types, which can be separated into major and minor promoters.

Major promoters:
- Type I (adult, tissue-specific – TATA), type II (ubiquitous, broad expression – no TATA) and type III (developmentally regulated, differentiation).
- Type I has a sharp transcription start site (TSS), while type II has a broad TSS.
- Type I has disordered nucleosomes, while type II’s nucleosomes are ordered.
- Type I has no CpG islands, while type II has them.
- For type III, the TSS is broad but sharper than in type II. It has large CpG islands extending into the body of the gene.

Minor promoter: the TCT (pyrimidine) promoter for ribosomal protein and TF genes.

A primer extension assay can be used to determine whether a TSS is broad or sharp:
- Isolate RNA and hybridize a complementary oligonucleotide primer.
- Extend the primer to the 5’ end of the mRNA with reverse transcriptase (RTase), then denature the cDNA–mRNA hybrid.
- Perform sequencing.

E.g. if the mRNA transcript has U as its first nucleotide, the extended product should end with an A, complementary to that first nucleotide.
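Reverse transcriptase copies the mRNA toward its 5’ end. The last base added to the cDNA therefore pairs with the first base of the transcript, and the product length marks where transcription started. The hypothetical Python sketch below illustrates both points. Positions are on the sense-strand axis, and the primer position and TSS values are made up.

```python
# Base pairing used by reverse transcriptase: RNA template base -> DNA base added.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def cdna_copy(mrna_5_to_3: str) -> str:
    """cDNA (written 5'->3') made by copying this stretch of mRNA to its 5' end."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna_5_to_3.upper()))


def product_lengths(tss_positions: list, primer_5prime_pos: int) -> list:
    """Distinct extension-product lengths for transcripts starting at each TSS.

    `primer_5prime_pos` is the most downstream sense-strand position the primer covers.
    One length -> sharp TSS; a spread of lengths -> broad TSS.
    """
    return sorted({primer_5prime_pos - tss + 1 for tss in tss_positions})


if __name__ == "__main__":
    print(cdna_copy("UACG"))                     # 3'-terminal base pairs with the first U
    print(product_lengths([100], 150))           # sharp TSS: a single product
    print(product_lengths([98, 100, 103], 150))  # broad TSS: several products
```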

Within a loop of highly expressed genes, the following are present:

Promoter – determines expression level, tissue specificity, temporal expression and inducibility. The promoter affects the rate of initiation and the rate of chain elongation.

Enhancer/silencer

Locus control region (LCR) – present in some gene clusters and consists of multiple DNase hypersensitive sites. LCRs are required for correct expression of the whole gene cluster.
- The LCR controls transcription of the targeted gene in the locus by direct interactions, forming looped structures. It does this by recruiting chromatin-modifying complexes, co-activators and transcription complexes.
- Deletion of the LCR causes condensation into heterochromatin.

Insulator binding sites – prevent an enhancer from acting on neighbouring genes and provide a barrier against the spread of heterochromatin.
- They have specialized chromatin structures containing hypersensitive sites. These are “naked” regions that are easily accessible to DNase digestion, indicating accessibility to other protein factors, such as those for gene transcription.
- In transgenic studies, two insulators can protect the region between them, allowing faithful transgene expression.
- Different insulators are bound by different factors and thus act as barriers through different mechanisms.

Matrix attachment region (MAR) – defined as a DNA region attached to the nuclear matrix, which can be experimentally isolated.
- MARs are A-T rich but have no consensus sequence.
- They also contain gene regulatory sequences, so it is postulated that they may be important in regulating gene transcription within the chromosome loop.
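Since MARs lack a consensus sequence, AT-richness is one of the few sequence features that can flag candidates. The hypothetical sliding-window scan below illustrates this. The window size and cut-off are illustrative assumptions, and AT-richness alone cannot prove that a region binds the matrix.

```python
def at_rich_windows(seq: str, window: int = 20, min_at: float = 0.7) -> list:
    """Report (0-based start, A+T fraction) for windows with A+T fraction >= min_at.

    ASSUMPTION: the default window size and cut-off are illustrative, not MAR criteria.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        at_fraction = (chunk.count("A") + chunk.count("T")) / window
        if at_fraction >= min_at:
            hits.append((start, at_fraction))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: a GC-rich stretch followed by an AT-rich stretch.
    print(at_rich_windows("GCGCATATAT", window=4, min_at=1.0))
```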

In transgenic studies (transfer of genes), MARs can be used to ensure transgene expression by counteracting the “chromosome effect”.
- The chromosome effect refers to the expression of a transgene being affected by the local chromosome structure or cis-elements around the transgene integration site.

Identifying MARs:
- In vivo: prepare nuclei and extract histones, then cleave the DNA with restriction nucleases. MARs remain stuck on the matrix, so this DNA can then be extracted and analysed.
- In vitro: degrade all the DNA with DNase and add back DNA fragments. The isolated fragments can then be tested for their ability to bind to the matrix.

Process of transcriptional initiation:

(A) RNA polymerase II requires several general transcription factors. The promoter contains a DNA sequence called the TATA box, which is located about 25 nucleotides upstream of the initiation site.

(B) Through its subunit TBP (TATA-binding protein), TFIID recognizes and binds the TATA box. This then enables the adjacent binding of TFIIB.

(C) The binding of TFIID produces a distortion in the DNA, which helps to attract the other transcription factors.

(D) The rest of the general transcription factors and RNA polymerase assemble at the promoter.

(E) TFIIH then uses energy from ATP hydrolysis to pry open the DNA double helix at the transcription start point, locally exposing the template strand. TFIIH also phosphorylates RNA polymerase II at the C-terminal domain (CTD). This changes the polymerase's conformation so that it is released from the general factors and can begin the elongation phase of transcription.

TFIID has a TBP and 11 TAFs (TBP-associated factors).
- TBP binding causes significant bending and opening of the DNA, which serves as an important signal for other binding proteins.
- TAFs recognize promoter and initiator elements and interact with gene-specific regulatory proteins.

TFIIB (1 subunit, 33 kDa) binds the BRE in the promoter and enables interaction between TFIID and RNAP II–TFIIF. It aids in accurately positioning RNAP at the start site of transcription.

TFIIF (2 subunits, RAP30 and RAP74) functions similarly to the sigma factor in prokaryotes. It guides the specific binding of RNAPII to the complex assembling at the promoter, and it may also be involved in the elongation of nascent RNA.

TFIIE (2 subunits) controls TFIIH, enhances promoter melting and stimulates transcription.

TFIIH (9 subunits) is a release factor. It has an ATP-dependent helicase that melts the promoter and a kinase that activates RNAPII by phosphorylating the CTD.

RNA polymerase II has 12 subunits:
- Three core subunits: RPB1, RPB2 and RPB3, in the ratio 1:1:2, homologous to the prokaryotic β′, β and α subunits respectively.
  RPB1 (~200 kDa) binds DNA and has the CTD = (YSPTSPS)n, with n = 26 in yeast and n = 52 in mouse.
  RPB2 (~100 kDa) binds nucleotides.
  RPB3 is ~50 kDa.
  The CTD consists of multiple repeats that can be phosphorylated. It is important in transcription initiation, elongation and RNA processing.
- Common subunits: RPB5, 6 and 8 are found in all three RNA polymerases.
- Nonessential subunits: RPB4 and RPB9. Deletion mutants lacking these function well at normal temperature but fail to grow at either higher or lower temperatures.
- Other subunits: RPB7 and RPB11. RPB7 is responsible for correct initiation of transcription.

RNA polymerase I transcribes rDNA into rRNA, which is synthesized in the nucleolus.

Figure: many molecules of RNA pol I simultaneously transcribe each of two adjacent genes. The nascent transcripts are seen as fine threads. These rRNAs contribute to the formation of ribosomes.

rRNAs constitute ~80% of total RNA. It is estimated that 10 million rRNA molecules are synthesized in each cell generation.

rRNA genes have a bipartite promoter consisting of a core promoter and an upstream promoter element. They require two factors: upstream binding factor 1 (UBF) and selectivity factor 1 (SL1).

UBF:
- required for high-frequency initiation;
- maintains an open chromatin structure by preventing histone H1 binding and assembly into inactive chromatin;
- stimulates promoter release of RNAPI;
- stimulates SL1.

SL1: has 4 subunits, including TBP and a TFIIB homolog, and is used primarily for recruitment of RNAPI. Here TBP associates with RNAPI rather than being used for DNA binding.

Transcription units for RNA polymerase I (14 subunits, 590 kDa) have a core promoter separated by ~70 bp from the upstream promoter element (UPE).
- UBF binding to the UPE increases the ability of the core-binding factor to bind to the core promoter.
- The core-binding factor (SL1) positions RNA polymerase I at the start point, ensuring its proper localization there.

There are 4 types of eukaryotic rRNA, each present in one copy per ribosome.
- Three of the four rRNAs (18S, 5.8S and 28S) are made by chemically modifying and cleaving a single large precursor rRNA.
- The fourth (5S) is synthesized from a separate cluster of genes by RNA polymerase III and does not require chemical modification.

Both cleavage and chemical modification of rRNA precursors require small nucleolar RNAs (snoRNAs) as guide RNAs.

Many snoRNAs are encoded in the introns of other genes, especially those encoding ribosomal proteins. They are synthesized by RNA polymerase II and processed from excised intron sequences.

RNA polymerase III uses downstream (internal) and upstream promoters.
- Internal promoters have short consensus sequences (box A/B or box A/C) located within the transcription unit, downstream of the start site. They cause initiation to occur at a fixed distance upstream. Deletion of the 5’ sequence upstream of or including the start point has no effect.
- Upstream promoters contain three short consensus sequences (Oct, PSE, TATA) upstream of the start point that are bound by TFs.

TFIIIA and TFIIIC bind to the consensus sequences and enable TFIIIB to bind at the start point.

TFIIIB has TBP as one subunit and enables RNA polymerase III to bind.

Promoter types:
- Type 1 (box A/C) is for 5S rRNA.
- Type 2 (box A/B) is for tRNAs.
- Type 3 (Oct, PSE, TATA) is for snRNA.

Between type 1 and type 2, the main difference is the requirement for TFIIIA. Type 1 requires TFIIIA to bind to box A, while in type 2, two molecules of TFIIIC bind to box A and box B.

Common features of transcription initiation by the three RNA polymerases:
- GTFs bind at the promoter before RNAP itself can bind. The GTFs form a pre-initiation complex that directs the binding of RNAP.
  SL1 binds to UBF1, which is bound to the promoter sequence, before recruiting RNAPI.
  TFIID and its TAFs recognize the promoter for RNAPII.
  TFIIIB binds adjacent to TFIIIC to localize RNAPIII.
- Positioning of all three types of polymerase requires TBP, which is associated with other factors (TAFs). TBP is the universal positioning factor for all types of promoters and their polymerases.
- All three RNA polymerases are large proteins (~500 kDa) with ~12 subunits; three subunits are common to all of them.

Lecture 15 – Transcription Factors

Basic features of TFs
- Bind to a specific DNA sequence through a DNA-binding domain (BD)
- Interact directly or indirectly with the basal transcriptional machinery through protein-protein interactions via an activation domain (AD)
- Often contain other functional domains

The yeast two-hybrid system can be used to identify protein-protein interactions in vivo in yeast and then to clone the gene encoding the interacting protein.
- The functional yeast Gal4 TF has a BD and an AD, which activate transcription of lacZ, giving a blue colony.
- A hybrid (BD + protein A, or AD + protein B) alone does not lead to transcription.
- If proteins A and B bind each other and bring the AD and BD together, transcription is activated, because the AD is brought into position to interact with the GTFs at the TATA box.
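The two-hybrid read-out follows simple logic: the reporter is on only when the bait (fused to the BD) and the prey (fused to the AD) are both present and bind each other. The toy Python model below runs a library screen. The protein names and the interaction set are hypothetical.

```python
from typing import Optional


def lacz_expressed(bait: Optional[str], prey: Optional[str], interactions: set) -> bool:
    """Reporter readout for one yeast colony.

    bait is fused to the Gal4 BD (bound at the promoter); prey is fused to the Gal4 AD.
    `interactions` holds frozensets of protein pairs that bind each other.
    """
    if bait is None or prey is None:
        return False  # a single hybrid alone cannot activate transcription
    return frozenset((bait, prey)) in interactions


def screen(bait: str, prey_library: list, interactions: set) -> list:
    """Preys that give blue colonies with this bait."""
    return [prey for prey in prey_library if lacz_expressed(bait, prey, interactions)]


if __name__ == "__main__":
    known = {frozenset(("A", "B"))}  # hypothetical: protein A binds protein B
    print(screen("A", ["B", "C", "D"], known))
    print(lacz_expressed("A", None, known))
```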

Classification and structures of TFs
- There are 10 superclasses of TFs, classified by their conserved DNA-binding domains.
- The most common ones are the zinc-coordinating domain (zinc finger), the helix-turn-helix domain (homeodomain) and the basic domain.
- The number of TFs expressed in each tissue varies, but the ratio of TFs to total expressed genes is about the same in all tissues (~6%).
- Each TF controls multiple genes, and TFs are differentially expressed in different tissues → different tissues have different patterns of TF expression.
- During development, differential expression provides for hierarchical gene regulation.

Prominent DNA-binding domains of TFs

Molecular recognition between DNA and protein occurs mostly in the major groove, because it is wider and exposes more molecular features.

Protein-DNA interactions can involve H-bonds, ionic bonds or hydrophobic interactions.

Zinc finger – a typical (C2H2) finger is ~23 aa, with 2 cysteines on the β sheet and 2 histidines on the α helix chelating a zinc atom, although variations exist.
- The zinc finger domain is formed by the interaction of the Zn atom with an α helix and an antiparallel β sheet.
- Each finger recognizes three nucleotides (often GC-rich).
- Multiple zinc fingers are present in each protein.
- Amino acid residues −1, 2, 3 and 6 of the helix are critical for recognition of nucleotides.
- Zinc fingers can be artificially designed to recognize targeted sequences, e.g. zinc finger nucleases (ZFNs) for genome editing.
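Because each finger reads about three bases, a target site can be split into triplets to see how many fingers a designed protein would need (e.g. the DNA-binding half of a ZFN). The Python sketch below assumes the usual antiparallel arrangement, in which the N-terminal finger reads the 3’-most triplet of the target strand. It ignores context effects between neighbouring fingers.

```python
def finger_subsites(target: str) -> list:
    """Assign each 3-bp subsite of a target (written 5'->3') to a zinc finger.

    Fingers bind antiparallel to this strand: finger 1 (N-terminal) reads the
    3'-most triplet, finger 2 the next triplet towards the 5' end, and so on.
    """
    target = target.upper()
    if len(target) % 3 != 0:
        raise ValueError("target length must be a multiple of 3 (one triplet per finger)")
    triplets = [target[i:i + 3] for i in range(0, len(target), 3)]
    return list(enumerate(reversed(triplets), start=1))


if __name__ == "__main__":
    for finger, triplet in finger_subsites("GCGTGGGCG"):  # a 9-bp site, 3 fingers
        print(f"finger {finger}: {triplet}")
```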

Homeodomain proteins – the homeodomain is a conserved 60-amino-acid domain found in many TFs and is particularly important in development.
- The homeodomain folds into 3 helices; helices 2 and 3 resemble the HTH motif.
- Bases in both the major and minor grooves are contacted. The N-terminal arm lies in the minor groove, helices 1 and 2 lie above the DNA, and helix 3 lies in the major groove.
- Recognition involves an ATTA (TAAT) core, and the surrounding bases determine specificity.
- Residue 50 plays an important role in determining target specificity: K50 → GGATTA, while Q50 → CCATTA.

Some homeodomain proteins contain two DNA-binding domains:
- POU domain (HTH motif) – cooperates with the homeodomain to increase binding specificity and affinity.
- Paired domain – binds to target DNA independently of the homeodomain.
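The residue-50 rule above can be turned into a simple site scan. The Python sketch below uses only the two preferred sites listed in these notes and searches both strands. Real homeodomain specificity also depends on flanking bases and partner proteins.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Preferred sites from the notes, keyed by the residue at homeodomain position 50.
PREFERRED_SITE = {"K50": "GGATTA", "Q50": "CCATTA"}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_homeodomain_sites(seq: str, residue_50: str) -> list:
    """Return (0-based position, strand) of every match on either strand."""
    seq = seq.upper()
    site = PREFERRED_SITE[residue_50]
    site_rc = reverse_complement(site)  # the same site read on the other strand
    hits = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        elif window == site_rc:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    print(find_homeodomain_sites("TTGGATTAC", "K50"))  # hypothetical sequences
    print(find_homeodomain_sites("ATAATGGA", "Q50"))
```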

Basic helix-loop-helix (bHLH) proteins:
- A basic domain (stretch of basic amino acids) for DNA binding.
- Two helices of unequal length joined by a long loop, which gives the helices the flexibility to fold over each other.
- Dimers form through the HLH domain.

E.g. myogenic bHLH proteins – MyoD is specifically expressed in muscle cells, while E2A is expressed in many cell types. Id (inhibitor of differentiation) has no basic region.
- Id forms dimers with MyoD, inhibiting formation of the MyoD–E2A dimer. In the presence of Id, the E box (CANNTG) is not occupied.
- A proliferating myoblast expresses MyoD, E2A and Id. In this cell, the MyoD binding site in the promoter of muscle creatine kinase (MCK) is not occupied.
- When the cell is induced to differentiate into muscle, Id concentration decreases. MyoD–E2A dimers then form and bind to the MCK promoter, so the MCK gene is transcribed.

bHLH proteins are present in all eukaryotes from yeast to human. They are involved in the differentiation of various other cell types, including heart, pancreas and skin.
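The E box is a degenerate site (N = any base), so a regular expression is a natural way to find candidates. The reverse complement of CANNTG is also of the form CANNTG, so scanning one strand finds sites on both. The Python sketch below does not model whether MyoD–E2A actually occupies a site, which depends on Id levels and chromatin context as described above.

```python
import re

# CANNTG; the zero-width lookahead lets overlapping E boxes all be reported.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_e_boxes(seq: str) -> list:
    """Return (0-based start, site) for every E box in the sequence."""
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_e_boxes("GGCAGCTGTTCACGTG"))  # hypothetical promoter fragment
```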

Leucine zipper proteins (bZIP): a leucine residue occurs at every seventh amino acid, so the leucines are located on one side of the helix.
- The leucine zipper is an amphipathic helix: one face carries hydrophilic side chains and the other face carries hydrophobic side chains (leucine).
- This motif has a dimerization domain (leucine zipper) and a DNA-binding domain (basic region). The protein functions only when the dimer is formed.

E.g. Myc and Max form heterodimers and bind to the E box (CACGTG) of target genes.
- Myc is an important transcription factor, regulating the transcription of ~15% of cellular genes, including many growth factor genes.
- Myc has multiple functions in cell cycle progression, apoptosis and stem cell renewal.
- Myc is one of the four factors that induce pluripotent stem cells, in combination with the others (OSKM or Yamanaka factors: Oct4, Sox2, Klf4 and Myc).
- Overexpression of Myc frequently causes cancer.
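An α-helix has about 3.6 residues per turn, so residues seven apart (about two turns) lie on nearly the same face. This is why heptad leucines line up. The hypothetical Python check below finds the leucine register of a helix. The test sequence is invented for illustration and is not a real protein.

```python
def heptad_leucines(helix: str, offset: int) -> list:
    """Is the residue at offset, offset+7, offset+14, ... a leucine?"""
    return [helix[i] == "L" for i in range(offset, len(helix), 7)]


def best_heptad_register(helix: str) -> int:
    """Offset (0-6) with the most leucines at seven-residue spacing."""
    return max(range(7), key=lambda offset: sum(heptad_leucines(helix.upper(), offset)))


if __name__ == "__main__":
    helix = "AEKLAES" * 4  # invented sequence with a leucine every 7th residue
    register = best_heptad_register(helix)
    print(register, heptad_leucines(helix, register))
```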

Transcriptional activation domains (AD)
- ADs interact either directly with GTFs or with cofactors (through protein-protein interactions).

There are four kinds of protein domains commonly observed to be involved in transcription activation:
- Acidic domains – acidic amino acid side chains (Glu/Asp), e.g. Gal4, VP16
  E.g. CREB (cAMP-response element binding protein) has an AD that exists as unstructured random coils with a strong negative charge (Asp-rich). In the presence of cAMP, Ser-133 is phosphorylated. The CREB AD then folds into two amphipathic α-helices and interacts with the co-activator CBP. This results in transcription of genes whose control regions contain a CREB-binding site.
- Glutamine-rich domains – about 25% Gln in sequence, e.g. SP1
- Proline-rich domains, e.g. c-Jun, AP2 and Oct2
- Isoleucine-rich domains, e.g. NTF-1

Nine-amino-acid transactivation domain (9aaTAD) – a loose consensus found in a large superfamily of eukaryotic TFs. It has been demonstrated to be essential for transcription activation.

Multiple domains of TFs: nuclear receptors (steroid hormone receptors)

These are a group of zinc finger proteins that bind steroids in the cytoplasm. As a result, they move into the nucleus, where they bind DNA and dimerize to activate transcription.
- Ligand binding causes release of inhibitory proteins and allows the receptor to bind co-activator proteins that stimulate transcription.
- The nuclear receptor binds to an HRE (hormone response element) to activate transcription by enhancing formation of the transcription initiation complex.
- Assembly of a multiprotein complex on the HRE enhances transcription through interaction with GTFs, TAFs (TBP-associated factors) and TIFs (transcription intermediary factors).
- Each receptor has two fingers for DNA binding, and each finger contains 4 cysteine residues. The receptors can form dimers and bind to short palindromic DNA sequences.
- For the glucocorticoid receptor, the binding site must contain a 3 bp spacer for correct positioning of the 2 zinc fingers to specifically activate transcription.
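The 3 bp spacer requirement can be expressed as a pattern with two inverted half-sites. The Python sketch below assumes the commonly cited GRE consensus AGAACAnnnTGTTCT, which is not given in these notes. Natural GREs often deviate from this perfect consensus, and the test sequences are hypothetical.

```python
import re

# ASSUMPTION: AGAACAnnnTGTTCT is used as the GRE consensus (inverted half-sites,
# 3 bp spacer). The lookahead allows overlapping matches.
GRE = re.compile(r"(?=(AGAACA[ACGT]{3}TGTTCT))")


def find_gre(seq: str) -> list:
    """Return the 0-based start of every consensus GRE on this strand."""
    return [m.start() for m in GRE.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_gre("CCAGAACAGGGTGTTCTCC"))   # 3 bp spacer: matches
    print(find_gre("CCAGAACAGGGGTGTTCTCC"))  # 4 bp spacer: no match
```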

In summary, nuclear receptors have a DNA-binding domain, an activation domain and a ligand-binding domain.

The activity of a nuclear receptor depends on:
- expression of the receptor in the cell;
- availability of the ligand, which acts as a “switch”;
- interaction with a dimerization partner;
- presence and accessibility of the relevant response elements on the gene;
- presence of relevant co-factors such as co-activators.

Signal transduction: from extracellular signal to nuclear transcription

E.g. TGFβ (transforming growth factor β) signalling:
- Binding of the extracellular ligand TGFβ results in tetramerization of two types of receptors, which phosphorylate their intracellular domains.
- Activation of R-Smad (receptor-regulated Smad)
- Formation of Smad trimers with the common Smad4
- Translocation of the Smad complex to the nucleus and binding to the responsive element to activate target genes

E.g. cAMP signalling:

1. Binding of an extracellular signal molecule to a GPCR activates adenylyl cyclase via a stimulatory G protein and increases the cAMP concentration in the cytosol.

2. The rise in cAMP concentration activates PKA (protein kinase A), and the released catalytic subunits of PKA can then enter the nucleus, where they phosphorylate CREB (Ser-133).

3. Once phosphorylated, CREB recruits the co-activator CBP (CREB-binding protein), which stimulates gene transcription.

cAMP is involved in a variety of cellular activities: different hormones act on different cell types, but the intracellular signals are similar. E.g. adrenaline in muscle causes glycogen breakdown; vasopressin in kidney causes water reabsorption.

E.g. Canonical Wnt signalling – in development and cancers

1. Wnt binds to the Frizzled receptor (GPCR).

2. Recruitment of the co-receptor LRP and the cytoplasmic protein Dishevelled.

3. Dissociation of the inhibitory complex.

4. Release of unphosphorylated β-catenin, which translocates to the nucleus.

5. β-catenin displaces the co-repressor Groucho and activates target genes together with the TF LEF1/TCF.

Diversity of TFs in transcriptional regulation

TF dimerization helps to increase DNA-binding specificity. Typically each DNA-binding domain recognizes 4-6 nucleotides, so a dimer should double the recognition length.

Heterodimers of bHLH proteins can show a ~10-fold or greater increase in affinity, e.g. MyoD-E2A binds with about 10x the affinity of MyoD-MyoD.

Dimerization also increases functional diversity through formation of diverse protein complexes.
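The benefit of doubling the recognition length can be checked with simple arithmetic: in random DNA with equal base frequencies, an exact n-nt site is expected about L × (1/4)^n times in L bp (one strand, edge effects ignored). This sketch assumes a genome of ~3.2 × 10^9 bp and uses the upper 6-nt value from the notes; the function name is hypothetical.

```python
def expected_sites(site_length: int, genome_length: float) -> float:
    """Expected chance occurrences of one exact site on one strand of random
    DNA with equal base frequencies: genome_length * (1/4) ** site_length.
    (Edge effects at the end of the sequence are ignored.)"""
    return genome_length * 0.25 ** site_length


GENOME_BP = 3.2e9  # assumption: approximate human genome size in bp

monomer = expected_sites(6, GENOME_BP)   # one DNA-binding domain, 6 nt
dimer = expected_sites(12, GENOME_BP)    # dimer: recognition length doubled
print(f"6-nt site:  ~{monomer:,.0f} chance matches")
print(f"12-nt site: ~{dimer:,.0f} chance matches")
```

A 6-nt site is expected roughly 781,000 times by chance, whereas a 12-nt dimer site is expected only about 190 times, which is why dimerization sharpens specificity.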

A single TF can control several genes by interacting with different factors.

E.g. the glucocorticoid receptor (GR) coordinates expression of many different genes. At each gene, the regulatory proteins already bound are not sufficient on their own to fully activate transcription; the GR completes the combination of transcription regulators required for maximal initiation of transcription. When the hormone is no longer present, the GR dissociates from DNA and the genes return to their pre-stimulated levels.

For these GR-responsive genes, the effect of GR depends on the presence of GR, the presence of ligand, the presence of other regulatory proteins and the GR binding sites on the gene.

In conclusion, specific TFs contain specific domains responsible for DNA binding, transactivation and interaction with other molecules.

Specificity results from both protein-DNA and protein-protein interactions (with co-factors) that control target genes.

Transcriptional activation of target genes can be induced by a series of signal transduction events through extracellular factors, intracellular signalling molecules and finally nuclear transcription factors.

Lecture 16 – Chromatin Remodeling and Transcriptional Activation

Simplified model of eukaryotic gene transcription: ubiquitous and cell-specific proteins bind 5’ sequence elements:

Core promoter (TATA, INR, DPE)

Proximal cis-elements (GC, BLE, CCAAT)

Distal regulatory regions (enhancers, which can be in introns or downstream of the gene)

Updated knowledge: transcription also involves co-factors, Mediator, nucleosome modification and chromatin remodelling.

Co-regulators:

Some TFs have both a BD and an AD and thus can interact with the basal transcriptional apparatus directly, while others have only a BD and no/weak AD, so they require a co-activator to interact with the basal transcriptional apparatus.

Co-regulators do not bind to DNA directly but interact with TFs or the transcriptional initiation complex.

Roles of co-activators:

Bridging TFs and the PIC (pre-initiation complex)

Helping recruitment of GTFs and RNAPII

Chemical modification of nucleosomes – covalent modification of histones

Chromatin remodelling

Eukaryotic transcriptional regulators often work in groups or complexes, through interactions between specific transcription factors and co-factors at a specific cis-element (DNA). In rare cases, RNA can act as a scaffold to bring proteins together.

The complex can function in activation or in repression.

One co-regulator may interact with different TFs.

Most co-regulators function as either co-activators or co-repressors, but some have dual functions, acting as an activator in one complex and as a repressor in another.

Co-regulators are generally more widely expressed than TFs and are involved in regulating a larger number of genes.

Mediator complex: originally defined as a protein complex tightly associated with the RNAPII CTD in yeast.

It is poorly defined in higher eukaryotes and can be considered a scaffold for protein-protein interaction.

It also includes a collection of co-regulators with different activities for histone modification and chromatin remodelling.

The yeast Mediator has 25 subunits (1.4 MDa) in 4 modules: head, middle, tail and kinase.

In general, co-regulators can be broadly divided into 5 classes:

Class I. Properties: activator and repressor targets inherent to the core machinery; promoter recognition and enzymatic functions. Examples: TAFs (TBP-associated factors), TFIIA, NC2, PC4.

Class II. Properties: activator and repressor adaptors; modulate DNA binding; target other co-regulators and the core machinery (bridging). Examples: OCA-B/OBF-1, Groucho, Notch, CtBP, HCF, E1A, VP16 (herpes simplex virus TF that binds TAFs through its AD).

Class III. Properties: multifunctional, structurally related but highly divergent co-regulators; some interact with RNAPII and/or multiple types of TFs, and some have inherent enzymatic functions or chromatin-selective properties (Mediator). Examples: yeast: Mediator/SRBs; human (a): CRSP, PC2; human (b): ARC/DRIP/TRAP; human (c): NAT, SMCC, Srb/Mediator.

Class IV. Properties: chromatin (nucleosome)-modifying activator and repressor adaptors; acetyltransferase or deacetylase activities with multiple substrates (histones, histone-related proteins, activators, other co-regulators and the core machinery). Examples: CBP/p300, GCN5, P/CAF, p160s (SRC1, TIF2, p/CIP, etc.), HDAC-1 and HDAC-2 (Rpd3), Sir2.

Class V. Properties: ATP-dependent chromatin remodelling activities. Examples: SNF2-ATPase (SWI/SNF, RSC) and ISWI-ATPase (NURF, ACF, ChrAC, RSF, etc.).

SWI/SNF: Switch/Sucrose Non-Fermentable; ISWI: Imitation SWI.

Chromatin Remodelling – nucleosome disruption and re-formation

Remodelling complex A disrupts nucleosomes to allow DNA-binding proteins to bind and initiate gene expression/replication.

Remodelling complex B restores nucleosomes when the DNA-binding proteins dissociate.

The same remodelling complex can perform both nucleosome disruption and re-formation.

Nucleosomes are dynamic: the DNA can unwrap and rewrap within milliseconds, which allows it to become transiently accessible for TF binding.

Nucleosome positioning and re-positioning are important in influencing gene transcription.

Histone-modifying enzymes and chromatin remodelling complexes work in concert: a particular histone modification attracts a particular type of remodelling complex.

How do transcriptional activators direct local alterations in chromatin structure?

GTFs and RNAPII are unable to assemble on a promoter that is packaged in nucleosomes, so activators are needed to trigger changes to the chromatin structure of the promoter that make the DNA more accessible.

This involves transcription regulators, chromatin remodelling complexes and histone chaperones.

Four mechanisms for locally altering chromatin:

1. Nucleosome remodelling: nucleosome sliding allows access of the transcription machinery to DNA.

2. Nucleosome removal: the transcription machinery assembles on nucleosome-free DNA.

3. Histone replacement: histone variants allow greater access to nucleosomal DNA.

4. Histone modifications: specific patterns of histone modification destabilize compact forms of chromatin and attract components of the transcription machinery.

Covalent modifications: acetylation (A), phosphorylation (P) or methylation (M).

The functions of only a small number of histone modifications are known. Histone code hypothesis: covalent modifications of histone tails facilitate the binding of specific proteins to chromatin to perform distinct functions such as transcription, replication and repair.

Histone variants are encoded by different histone genes and are expressed at lower levels than the regular histones. Insertion of different histone variants into nucleosomes may also signal different functions, including transcriptional activation. Variants are recognized by chromatin remodelling complexes.

Histone acetylation is often associated with transcriptional activation and is performed by histone acetyltransferases (HATs).

Example of CBP (co-activator) and CREB (specific TF):

When cAMP increases, PKA phosphorylates S133 in CREB, allowing it to interact with CBP, which has HAT activity.

CBP increases transcription rates by acetylating histone tails to remodel chromatin and by increasing the rate of RNAPII recruitment to the promoter.

CBP and the closely related p300 are co-factors for many TFs, not just CREB. However, they are not associated with all RNAPII-transcribed genes; they seem to be associated only with certain classes of genes, often inducible genes and those involved in cell differentiation.

Example of successive histone modifications during transcription initiation at the human interferon gene promoter:

Sequential histone modifications

1. Acetylation of H3K9 and H4K8

2. Phosphorylation of H3S10

3. Acetylation of H3K14

The GTF TFIID and a chromatin remodelling complex bind to the chromatin to promote the subsequent steps of transcription initiation. TFIID and the remodelling complex both recognize acetylated histone tails through a bromodomain, a protein domain specialized to read this particular mark on histones.

From chromatin remodelling to formation of the TIC (transcription initiation complex), transcription activators can act at different steps.

Transcription activators acting through different mechanisms often work synergistically: the effect of multiple activators working together is greater than the sum of their individual effects.

6 ways in which eukaryotic repressor proteins can operate:

(A) Activator proteins and repressor proteins compete for binding to the same regulatory DNA sequence.

(B) Both proteins bind DNA, but the repressor prevents the activator from carrying out its function.

(C) The repressor blocks assembly of the GTFs.

(D) The repressor recruits a chromatin remodelling complex, which returns the nucleosomal state of the promoter region to its pre-transcriptional form.

(E) The repressor attracts a histone deacetylase to the promoter.

(F) The repressor attracts a histone methyltransferase, which methylates histones, helping to maintain the chromatin in a transcriptionally silent form.

Genes can be permanently switched off via DNA methylation.

Cytosine can be methylated when it is located in a CG sequence. Methylated cytosines prevent some gene regulatory proteins from binding DNA.

DNA methylation patterns can be faithfully inherited through the action of a maintenance methyltransferase.
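Why CG methylation can be copied: CG is its own reverse complement, so a methylated CG on the old strand sits directly opposite a CG on the new strand. The sketch below models one daughter duplex right after replication (old strand methylated, new strand not) and fills in the methyl groups a maintenance methyltransferase would add. It is a toy model; the function names and the test sequence are hypothetical.

```python
def cpg_sites(seq: str) -> list[int]:
    """Top-strand indices of the C in every CG dinucleotide."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def maintenance_methylation(seq: str, parental_methyl_c: set[int]) -> list[int]:
    """Toy model of one daughter duplex after replication.

    `seq` is the parental (old) top strand and `parental_methyl_c` holds the
    indices of its methylated Cs. The bottom strand is newly made and starts
    unmethylated. Because CG is palindromic, the bottom-strand C of each CG
    pairs with the top-strand G at index i + 1, so the enzyme methylates
    exactly those positions opposite an already-methylated CG.
    Returns the new bottom-strand methyl-C positions (top-strand coordinates).
    """
    sites = set(cpg_sites(seq))
    return sorted(i + 1 for i in parental_methyl_c if i in sites)


seq = "ACGTTCGACG"                            # CG at indices 1, 5 and 8
print(cpg_sites(seq))                         # [1, 5, 8]
print(maintenance_methylation(seq, {1, 8}))   # [2, 9]; site 5 stays unmethylated
```

Only the CGs that were methylated on the parental strand are methylated on the new strand, so the pattern (not just the presence of methylation) is inherited.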

Super-enhancers are composed of large clusters of enhancers densely bound by the Mediator complex, TFs and chromatin regulators.

The bound proteins are responsible for diverse enhancer-related functions such as enhancer looping, gene activation, nucleosome remodelling and histone modification.

Super-enhancers are generally marked by the H3K27ac modification.

In summary, co-factors are recruited by DNA-binding factors and are required to help recruit and/or stabilize binding of the PIC. They may be recruited as a result of modification of the DNA-binding factor.

Many of these co-factors contain chromatin-modifying activities, including the ability to acetylate, phosphorylate or methylate histone N-terminal tails. These modifications may be written as a “histone code” that is “read” by interacting proteins; the code determines which proteins interact and confers meaning to the activity.

Histone modifications work together with chromatin remodelling activities to allow GTFs and RNAPII to access the promoter, and are thus integral to specific gene expression.

Lecture 17 – Post transcriptional regulation

Post-transcriptional regulation occurs after RNAPII has begun RNA synthesis, and the type of post-transcriptional regulation of gene expression varies from gene to gene.

RNAPII is coupled to capping, splicing and polyadenylation through its C-terminal domain (CTD), a tail containing 52 tandem repeats of the seven-amino-acid sequence YSPTSPS. When Ser5 of the repeats is phosphorylated by TFIIH, the capping enzymes are recruited to the tail.

This ensures that the RNA molecule is efficiently capped as soon as its 5’ end emerges from the RNAP.

As the polymerase continues transcribing, Ser2 is phosphorylated by a kinase associated with the elongating polymerase, and Ser5 is eventually dephosphorylated.

When RNAPII finishes transcribing a gene, it is released from the DNA. Soluble phosphatases remove the phosphates on its tail, and it can then reinitiate transcription. Only the fully dephosphorylated form of RNAPII is competent to begin RNA synthesis at a promoter.

mRNA capping: 5’-to-5’ addition of guanosine monophosphate (GMP) to the 5’ end of the RNA transcript. The capping reaction starts when ~25 nt of RNA have been synthesized, and all of the capping enzymes are associated with the CTD.

Capping signals the translation start site, ensures correct processing and export of the mRNA through a cap-binding complex (CBC), and stabilizes and protects the 5’ end of the mRNA from degradation.

The reaction involves a phosphatase removing a phosphate from the 5’ end of the primary transcript, followed by a guanylyl transferase adding a GMP in a reverse linkage (5’ to 5’ instead of 3’ to 5’), and lastly a methyltransferase adding a methyl group to the guanosine.

RNA splicing: required because eukaryotic genes contain introns and exons. Both the size and the number of introns vary from gene to gene.

Splicing consensus sequences: 5’ GU at the donor site and 3’ CAG at the acceptor site.

A branch point A, within the loose consensus sequence YURAC (Y = pyrimidine, R = purine), forms the lariat.

Reaction: the 5’ end of the intron is joined to the branch-point A to form a lariat loop; the 3’ end of the intron is then cut and the two exons are joined.
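The consensus rules above can be written as a simple check. This sketch works on RNA written 5’ to 3’ and only tests for GU at the 5’ end, AG at the 3’ end (the notes give CAG; only the terminal AG is required here) and a YURAC motif, taking the last match before the 3’ AG as the branch point. Real splice-site recognition is far more permissive and context-dependent; the function names and sequences are hypothetical.

```python
import re
from typing import Optional

# Loose branch-point consensus YURAC (Y = C/U, R = A/G); the A (4th base) is the branch point.
BRANCH_MOTIF = re.compile(r"[CU]U[AG]A[CU]")


def find_branch_point(intron: str) -> Optional[int]:
    """Index of the branch-point A in the last YURAC match upstream of the 3' AG."""
    matches = list(BRANCH_MOTIF.finditer(intron[:-2]))
    return matches[-1].start() + 3 if matches else None


def is_plausible_intron(intron: str) -> bool:
    """GU at the 5' (donor) end, AG at the 3' (acceptor) end and a branch point."""
    return (intron.startswith("GU") and intron.endswith("AG")
            and find_branch_point(intron) is not None)


def splice(exon1: str, intron: str, exon2: str) -> str:
    """Remove the intron (released as a lariat in the cell) and join the exons."""
    if not is_plausible_intron(intron):
        raise ValueError("intron lacks the GU ... YURAC ... AG signals")
    return exon1 + exon2


intron = "GUAAGUCCUUCUAACUUUUCAG"             # hypothetical intron ending in CAG
print(find_branch_point(intron))              # 13
print(splice("AUGGCC", intron, "GGAUAA"))     # AUGGCCGGAUAA
```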

RNA splicing is performed largely by RNA molecules, U1, U2, U4, U5 and U6 (<200 nt each), known as snRNAs. Each is complexed with at least 7 protein subunits to form an snRNP (small nuclear ribonucleoprotein), and the snRNPs form the core of the spliceosome, which contains >100 proteins.

The spliceosome recognizes the intron splice sites (branch point and 5’ splice site), brings the two ends of the intron together and removes the intron.

The snRNAs provide catalytic activities, and each snRNA has a different role:

U1: recognizes the 5’ splice site

U2: initial binding to the branch site

U5: brings the two exons together

U4: sequesters the U6 snRNP

U6: core component; together with U2, it catalyses the two phosphoryl-transfer reactions (transesterifications)

The U1 RNA has several distinct stem-loop domains.

The Sm binding site is required for interaction with the common snRNP proteins.

The U1 5’ end can base pair with the 5’ splice site.

The U1 snRNP contains 8 common core Sm proteins and 3 U1-specific proteins (U1-70k, U1A and U1C).

Rearrangements allow the splicing signals on the pre-mRNA to be examined by snRNPs several times during the course of splicing. This lets the spliceosome check and recheck, increasing the overall accuracy of splicing.

Splicing errors: exon skipping and cryptic splice site selection. Cryptic splicing signals are RNA nucleotide sequences that closely resemble true splicing signals and are sometimes mistakenly used by the spliceosome.

To avoid errors, splicing is coupled with transcription, which reduces exon skipping: splicing can proceed as soon as the first 3’ splice site is transcribed, before the next 3’ splice site is available.

Exon size is more or less uniform (~150 nt), whereas intron size is variable.

Exons are bound by a group of SR (Ser- and Arg-rich) proteins that serve as splicing enhancers, recruiting U1 and U2 to define the 5’ and 3’ splice sites.

Introns are packaged into complexes by hnRNPs (heterogeneous nuclear ribonucleoproteins).

Splice-site mutations can lead to abnormal proteins, and thus disease, as a consequence of deletion/addition of amino acid sequence, a change in reading frame, or a truncated protein due to a premature termination codon.

Differential RNA splicing is used to increase product diversity; it is estimated that 90% of human genes produce differentially spliced transcripts.

Different protein variants can be generated from one gene by alternative splicing, so the “one gene, one polypeptide” rule does not hold.
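Exon skipping alone multiplies products quickly: with n optional (cassette) exons between a fixed first and last exon, 2^n transcripts are possible. The sketch below enumerates them. It assumes simple cassette exons only (no alternative 5’/3’ splice sites or mutually exclusive exons), and the exon names are hypothetical.

```python
from itertools import combinations


def cassette_isoforms(first: str, cassettes: list[str], last: str) -> list[list[str]]:
    """Every transcript that keeps the first and last exons and any subset of
    the cassette exons, in their genomic order (exon skipping only)."""
    isoforms = []
    for k in range(len(cassettes) + 1):
        for chosen in combinations(cassettes, k):  # combinations keep input order
            isoforms.append([first, *chosen, last])
    return isoforms


forms = cassette_isoforms("E1", ["E2", "E3", "E4"], "E5")  # hypothetical exon names
print(len(forms))   # 8 (= 2 ** 3)
for form in forms:
    print("-".join(form))
```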

Transcriptional termination: the polyadenylation signal AAUAAA lies 10 to 30 nt upstream of the site where the poly-A tail is added, and a GU- or U-rich region lies within 30 nt downstream of that site.

1. CstF (cleavage stimulation factor) and CPSF (cleavage and polyadenylation specificity factor) travel with RNAPII during transcription.

2. They recognize the AAUAAA signal, and additional cleavage factors cut the RNA to create the 3’ end.

3. Poly-A polymerase (PAP) adds ~200 A nucleotides to the 3’ end.

4. Poly-A binding proteins bind the tail to aid polyadenylation and protect the RNA from degradation.
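The positional rule for the cleavage site can be sketched as a search: find each AAUAAA, then mark the window 10-30 nt downstream of it where cleavage and poly-A addition could occur. This toy ignores the GU/U-rich element and all factor binding; the function names are hypothetical, and the ~200-A tail length follows the notes.

```python
def polya_cleavage_windows(rna, signal="AAUAAA", min_dist=10, max_dist=30):
    """For each occurrence of the poly-A signal, return
    (signal_start, earliest_cut, latest_cut). A cut at index c keeps rna[:c],
    and c must lie min_dist..max_dist nt downstream of the end of the signal."""
    windows = []
    start = rna.find(signal)
    while start != -1:
        end = start + len(signal)
        earliest = end + min_dist
        latest = min(end + max_dist, len(rna))
        if earliest <= latest:          # skip signals too close to the 3' end
            windows.append((start, earliest, latest))
        start = rna.find(signal, start + 1)
    return windows


def cleave_and_polyadenylate(rna, cut, tail_length=200):
    """Keep rna[:cut] (the downstream fragment is discarded) and add the poly-A tail."""
    return rna[:cut] + "A" * tail_length


pre_mrna = "GCU" + "AAUAAA" + "C" * 40      # hypothetical transcript
print(polya_cleavage_windows(pre_mrna))     # [(3, 19, 39)]
mature = cleave_and_polyadenylate(pre_mrna, 19)
print(len(mature))                          # 219
```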

Whether an antibody is membrane-bound or secreted can be determined by differential polyadenylation.

An increase in CstF concentration promotes RNA cleavage.

The first cleavage site that a transcribing RNA polymerase encounters is suboptimal and is usually skipped in unstimulated B lymphocytes, producing a longer RNA transcript that encodes the membrane-bound antibody.

When B cells are activated to produce antibodies, the CstF concentration increases, cleavage now occurs at the first site, and a shorter transcript is produced, resulting in secreted antibodies.

RNA Export from the nucleus to the cytoplasm

Each RNA binds multiple proteins, including the nuclear export receptor (export-ready RNA).

Some binding proteins are co-transported and some are not.

The nuclear export receptor guides the RNA to the nuclear pore complex for export; only a small fraction of RNA (<10%) is transported to the cytoplasm, as RNA-protein complexes.

SR: spliceosome proteins containing a domain rich in serine and arginine; CBC: cap-binding complex; PABP: poly-A binding proteins; EJC: exon junction complex.

Failure to correctly splice a pre-mRNA often introduces a premature stop codon into the reading frame for the protein. These abnormal mRNAs are destroyed by the nonsense-mediated decay (NMD) mechanism.

An mRNA molecule bearing EJCs, which mark successfully completed splices, is first met by a ribosome that performs a “test” round of translation. As the mRNA passes through the tight channel of the ribosome, the EJCs are stripped off, and mRNAs that pass the test are released to undergo translation. However, if an in-frame stop codon is encountered before the final EJC is reached, the mRNA undergoes nonsense-mediated decay, which is triggered by Upf proteins that bind to the remaining EJC.
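The test-round logic reduces to a comparison of positions. The sketch below finds the first in-frame stop codon and flags the mRNA for NMD if any EJC lies downstream of it. It is a simplification of the rule in the notes (real EJCs sit a short distance upstream of each exon junction, and more detailed models add a distance threshold); the function names and the sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_inframe_stop(mrna: str, start: int):
    """Index of the first stop codon in frame with `start`, or None."""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def triggers_nmd(mrna: str, start: int, ejc_positions: list[int]) -> bool:
    """Toy rule: decay if the test-round ribosome stops before reaching the
    last EJC, i.e. at least one EJC would be left on the mRNA."""
    stop = first_inframe_stop(mrna, start)
    if stop is None or not ejc_positions:
        return False
    return stop < max(ejc_positions)


mrna = "AUG" + "GCC" * 3 + "UAA" + "GGG" * 10   # hypothetical; stop codon at index 12
print(triggers_nmd(mrna, 0, [30]))   # True: an EJC remains downstream of the stop
print(triggers_nmd(mrna, 0, [6]))    # False: the only EJC is stripped before the stop
```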

mRNA localization: an mRNA may be translated immediately in the cytosol (most common), directed to the ER for synthesis of membrane and secreted proteins, or directed to a specific intracellular location prior to translation.

Localization occurs by directed transport along the cytoskeleton, by random diffusion and trapping, or by generalized degradation combined with local protection by trapping.

RNA editing: two main types, A to I (inosine) and C to U (minor).

This involves an RNA editing complex with deaminase activity that recognizes a specific target sequence and/or secondary structure.

Editing can affect the protein sequence, RNA splicing, transport, etc.
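A short sketch of how a single deamination changes the message. It assumes that inosine is read as G during translation; the CAG (Gln) to CGG (Arg) and CAA (Gln) to UAA (stop) changes are illustrative examples, the latter mirroring the often-cited apolipoprotein B case. The function names are hypothetical.

```python
# Deaminase edits: A -> I (main type) and C -> U (minor type).
DEAMINATION = {"A": "I", "C": "U"}


def edit_base(rna: str, pos: int) -> str:
    """Return a copy of `rna` with the base at `pos` deaminated."""
    base = rna[pos]
    if base not in DEAMINATION:
        raise ValueError(f"no deamination edit defined for {base!r} at {pos}")
    return rna[:pos] + DEAMINATION[base] + rna[pos + 1:]


def as_read_by_ribosome(rna: str) -> str:
    """Inosine base-pairs like G, so during decoding I is read as G."""
    return rna.replace("I", "G")


print(as_read_by_ribosome(edit_base("CAG", 1)))  # CGG: Gln codon now read as Arg
print(edit_base("CAA", 0))                       # UAA: Gln codon becomes a stop
```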

Lecture 18 – Translational and post-translational control

Translational process: in eukaryotes, proteins are synthesized on ribosomes (80S), and several ribosomes translating the same mRNA make up a polysome (polyribosome).

The ribosome is made up of a large subunit (60S) and a small subunit (40S) and consists of 60% RNA and 40% protein.

The large subunit consists of 28S (4718 nt), 5.8S (160 nt) and 5S (120 nt) rRNAs with 49 ribosomal proteins.

The small subunit consists of 18S rRNA (1847 nt) with 33 ribosomal proteins.

Synthesis of each protein takes about 20 seconds to several minutes.

The start codon is AUG; the stop codons are UAA, UAG and UGA.

eIF4E and eIF4G are eukaryotic translation initiation factors.

Ribosome scanning model: the small ribosomal subunit scans for the first AUG codon in a favourable context (Kozak consensus ( )); the bolded bases are the most important, and changes to them reduce translation efficiency about 10-fold.

Translation initiates from the first AUG codon in ~90% of mRNAs. Translation may also start from the second or a later AUG, generating proteins with different N-termini.
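The scanning model can be sketched as a 5’-to-3’ search that skips AUGs in a poor context. Because the consensus did not survive in these notes, this sketch assumes the commonly cited key positions: a purine (A/G) at -3 and a G at +4, counting the A of AUG as +1. It is deliberately simplified (real scanning is leaky rather than all-or-nothing); the function names and the leader sequence are hypothetical.

```python
def strong_kozak(mrna: str, i: int) -> bool:
    """True if the AUG starting at index i has a purine (A/G) at -3 and a G at +4,
    counting the A of AUG as +1 (there is no position 0)."""
    if i < 3 or i + 3 >= len(mrna):
        return False
    return mrna[i - 3] in "AG" and mrna[i + 3] == "G"


def scan_for_start(mrna: str, require_strong: bool = True):
    """Toy 5'->3' scan: index of the first AUG accepted by the small subunit, or None.
    With require_strong=False, the first AUG in any context is used."""
    i = mrna.find("AUG")
    while i != -1:
        if not require_strong or strong_kozak(mrna, i):
            return i
        i = mrna.find("AUG", i + 1)
    return None


leader = "UUUAUGCUGCCACCAUGGCU"   # hypothetical: weak AUG at 3, strong AUG at 14
print(scan_for_start(leader))                        # 14
print(scan_for_start(leader, require_strong=False))  # 3
```

Skipping the weak first AUG and starting at a later one is one way the same mRNA can give proteins with different N-termini.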

Translational initiation

1. The Met-initiator tRNA binds to eIF2 and is then loaded onto the small ribosomal subunit, directly at the P site.

2. The small subunit binds to the 5’ cap of the mRNA with additional initiation factors (eIF4E and eIF4G).

3. Scanning for the first AUG.

4. Dissociation of eIF2 and binding of the large ribosomal subunit.

5. Addition of the second aminoacyl-tRNA.

Each ribosome has three binding sites for tRNAs (A, P and E sites) and one binding site for mRNA.

The cycle for amino acid addition has 4 steps:

1. An aminoacyl-tRNA bound to EF-1 enters the A site by codon-anticodon pairing. The first proofreading step involves the small-subunit rRNA (16S in bacteria; 18S in eukaryotes), which recognizes correct pairs and closes tightly around them, triggering GTP hydrolysis by EF-1.

2. Peptide bond formation.

3. Large subunit translocation: after the peptide bond forms, binding of EF-2 and GTP hydrolysis cause a conformational change that moves the tRNA to the P site. EF-2 then dissociates.

4. Small subunit translocation; the empty tRNA is ejected.

Translational termination: a release factor binds to the A site when a termination codon is encountered. This results in chain termination, release of the nascent polypeptide and dissociation of the ribosome.

Eukaryotic release factors recognize all three termination codons: UAA, UAG and UGA.
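A minimal translation loop shows termination at any of the three stop codons. The codon table is the standard genetic code built from the usual U, C, A, G ordering; the function name is hypothetical, and the sketch ignores initiation (reading simply starts at the given index).

```python
from itertools import product

# Standard genetic code, codons ordered U, C, A, G at each position; "*" = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")


def translate(mrna: str, start: int = 0) -> str:
    """Read codons from `start` until a stop codon (release-factor step) or the end."""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":          # termination: the chain is released here
            break
        peptide.append(aa)
    return "".join(peptide)


print(STOP_CODONS)                    # ['UAA', 'UAG', 'UGA']
print(translate("AUGUUUGGCUGAAAA"))   # MFG
```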

Post-translational processes involve protein folding, covalent modifications and formation of complexes.

Incorrectly folded polypeptides with stretches of hydrophobic amino acids on their surface are eventually destroyed in proteasomes, because they are toxic to the cell through the formation of aggregates.

Protein folding is already partly complete by the time the ribosome releases the nascent polypeptide. Many newly released proteins have an open and flexible structure, called a molten globule, which undergoes further folding.

In eukaryotes, Hsp70 and Hsp60 are the major families of molecular chaperones; they have affinity for exposed hydrophobic patches on incompletely folded proteins.

Hsp70 acts early, before translation is complete. It binds to a string of 4-5 hydrophobic amino acids, hydrolyses ATP and clamps down very tightly on the target. It then rebinds ATP and releases the target protein.

Hsp60 acts later. It forms a large barrel-shaped structure and captures a misfolded protein by hydrophobic interaction. It hydrolyses ATP and adds a cap protein (GroES), which enlarges the barrel chamber, and the protein is incubated inside for ~10 seconds. Ejection of the correctly folded protein is accomplished by ATP hydrolysis.

Proteins are degraded by proteasomes, each comprising a 20S core and two 19S caps. Each cap is a complex of ~20 subunits (>6 of which are ATPases) that recognizes ubiquitinated and unfolded proteins.

Proteasomes are highly abundant, constituting ~1% of cellular proteins.

Improperly folded proteins are targeted by attachment of ubiquitin (76 aa).

Ubiquitinated proteins are translocated to proteasomes, and the ubiquitin is removed by a ubiquitin hydrolase for recycling. The targeted proteins are unfolded in the ring of the cap and threaded into the core for degradation.

Note that a protease cuts once and doesn’t need ATP, whereas a proteasome cuts the entire protein multiple times into short peptides.

Regulation at the mRNA level: different mRNAs have different half-lives.

Many unstable mRNAs have AU-rich sequences in their 3’ UTRs.

A cap-associated enzyme, deadenylase (DAN), shortens the poly-A tail. Actively translated mRNAs tend to have longer half-lives.

Some mRNAs are regulated by specialized mechanisms, e.g. the ferritin and transferrin receptor mRNAs respond to the iron level.

When iron levels are low, the binding of aconitase blocks translation of ferritin mRNA. When iron levels are high, iron binds to aconitase, which dissociates from the mRNA, and synthesis of ferritin begins.

In the transferrin receptor mRNA, the binding of aconitase blocks an endonuclease cleavage site and thus stabilizes the mRNA, allowing translation and thus the import of iron across the plasma membrane.
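The aconitase switch can be summarized as a two-state toy model: the iron level decides whether aconitase sits on its mRNA targets, and that single fact determines both outcomes (ferritin translation and stability of the iron-import, i.e. transferrin receptor, mRNA). This is a sketch with hypothetical names, not a quantitative model.

```python
def iron_response(iron_high: bool) -> dict[str, bool]:
    """Toy on/off model of the aconitase switch."""
    # At high iron, iron binds aconitase, which then leaves its mRNA targets.
    aconitase_on_mrna = not iron_high
    return {
        "aconitase_bound_to_mrna": aconitase_on_mrna,
        # Bound aconitase blocks translation of ferritin mRNA.
        "ferritin_translated": not aconitase_on_mrna,
        # Bound aconitase hides an endonuclease site, stabilizing this mRNA.
        "transferrin_receptor_mrna_stable": aconitase_on_mrna,
    }


print(iron_response(iron_high=False))  # low iron: store less, import more
print(iron_response(iron_high=True))   # high iron: store more, import less
```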

RNAi and MicroRNA

RNAi (RNA interference) uses short single-stranded RNAs (20-30 nt) and is a host defence mechanism to destroy foreign RNAs.

These small RNAs serve as guide RNAs that selectively recognize and bind, through base pairing, to other RNAs in the cell. When the target is a mature mRNA, the guide RNA can inhibit its translation or catalyse its destruction by recruiting Argonaute proteins.

MicroRNAs (miRNAs) are a newly discovered class of small RNAs (21-25 nt, typically 23 nt) that are transcribed by RNAPII and have a cap and a poly-A tail.

There are >1000 miRNA genes in the human genome, present as independent genes or in introns.

They appear to regulate at least one-third of all human protein-coding genes.

Upon export into the cytoplasm, Dicer (an RNase) further cleaves (“dices”) the miRNA, resulting in a single-stranded mature miRNA. This then forms a RISC (RNA-induced silencing complex) with Argonaute and other proteins, which targets specific mRNAs by base pairing and leads to rapid mRNA degradation.
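Target recognition by base pairing can be sketched as a search for the reverse complement of the guide RNA in the mRNA. This toy only accepts perfect Watson-Crick pairing, whereas real miRNA targeting tolerates mismatches. The optional seed-only mode, with the seed taken as nucleotides 2-8 of the miRNA, is an assumption not stated in these notes, and the miRNA sequence is made up.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Sequence that pairs antiparallel with `rna` (A-U and G-C only)."""
    return rna.translate(COMPLEMENT)[::-1]


def target_sites(mirna: str, mrna: str, seed_only: bool = False) -> list[int]:
    """mRNA positions that base-pair perfectly with the whole miRNA, or only
    with its seed (assumed here to be nucleotides 2-8)."""
    guide = mirna[1:8] if seed_only else mirna
    site = reverse_complement(guide)
    return [i for i in range(len(mrna) - len(site) + 1)
            if mrna[i:i + len(site)] == site]


mirna = "ACGUACGUUAGCCAUGGAUCCA"          # hypothetical miRNA, not a real sequence
mrna = "GGG" + reverse_complement(mirna) + "CCC"
print(target_sites(mirna, mrna))                  # [3]
print(target_sites(mirna, mrna, seed_only=True))  # [17]
```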