The Genetic Code

Math-CS Camp, 19.07.06, Singapore

Mikhail S. Gelfand

Research and Training Center of Bioinformatics, Institute for Information Transmission Problems, Moscow, Russia

and Department of Bioengineering and Bioinformatics, Moscow State University

The Biological Code by Martynas Yčas (London, 1969)

The Biological Code, Russian edition (Moscow, 1971)

[Figure: number of references (refs) per year; x-axis 1947 to 1971, with earlier years binned as 18XX, 190X, 191X, 192X, 193X, 1941-45, 1946-50, 1951-55, 1956; y-axis 0 to 140]

To apply mathematics in biology, a mathematician has to understand biology. Israel Gelfand

Plan

• Pre-history
– Genetics
– Evolutionary theory
– Chemistry

• Cracking the Code

• Update

Genetics: Gregor Mendel (1822-1884)

• Attended the Philosophical Institute in Olomouc

• Since 1843 – at the Augustinian Abbey of St. Thomas in Brno

• 1851-1853 – studied in the University of Vienna

• 1856-1863 – cultivated 28 thousand pea plants

• The Three Laws of Genetics (“Experiments on Plant Hybridization”)
– Read to the Natural History Society of Brünn (Brno), Moravia (1865)
– Published in Proceedings of the Natural History Society (1866)
• Since 1868 – abbot, stopped working in science

The seven traits of pea plants studied by Mendel

The first law

Crossing two pure lines different in some trait (e.g. yellow / green seeds), one gets only one variant (allele) in the first generation (the dominant allele)

[Figure: generations F0 and F1]

The second law

Crossing two pure lines different in some trait (e.g. yellow / green seeds), one gets only one variant (allele) in the first generation (the dominant allele), and the distribution 3:1 of the dominant and recessive alleles in the second generation.

[Figure: generations F0, F1 and F2]

(Law of large numbers)

[Figure: generations F0, F1 and F2]

The 3:1 ratio is seen only when the number of observations is sufficiently high.
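The point about sample size can be checked numerically. The following is a minimal sketch, assuming a single trait with a dominant allele `A` and a recessive allele `a` (both names are illustrative). It enumerates the exact F2 expectation and then draws random samples of different sizes with a fixed seed.

```python
import random
from collections import Counter
from itertools import product

def f2_phenotype_counts():
    """Enumerate the four equally likely F2 genotypes of the cross Aa x Aa."""
    counts = Counter()
    for a, b in product("Aa", repeat=2):   # one allele from each F1 parent
        phenotype = "dominant" if "A" in (a, b) else "recessive"
        counts[phenotype] += 1
    return counts

def simulate_dominant_fraction(n_plants, seed=0):
    """Draw n_plants random F2 offspring and return the observed dominant fraction."""
    rng = random.Random(seed)
    dominant = sum(
        1 for _ in range(n_plants)
        if "A" in (rng.choice("Aa"), rng.choice("Aa"))
    )
    return dominant / n_plants

print(f2_phenotype_counts())      # exact expectation: 3 dominant to 1 recessive
for n in (8, 80, 8000):
    print(n, simulate_dominant_fraction(n))
```

With only a handful of plants the observed fraction can be far from 0.75. With thousands it settles close to it.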

The third law

Two different traits are inherited independently (in the second generation the ratio is 9:3:3:1)

[Figure: generations F0, F1 and F2]

What if we take a pair with a different assortment of the same traits?

[Figure: two F0 pairs with different assortments of the same traits, followed through F1 and F2]

Same F1

Same F2… regardless of the initial assortment
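The claim that the initial assortment does not matter can be verified by enumeration. This is a small sketch under the assumption of two independently assorting loci with alleles `A`/`a` and `B`/`b`, upper case dominant. The genotype encoding and function names are illustrative.

```python
from collections import Counter
from itertools import product

def gametes(genotype):
    """All gametes of a two-locus genotype such as ('Aa', 'Bb'), one allele per locus."""
    return [a + b for a, b in product(genotype[0], genotype[1])]

def cross(parent1, parent2):
    """Return a Counter of offspring genotypes for independently assorting loci."""
    offspring = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        # Sort alleles within each locus so that 'aA' and 'Aa' are the same genotype.
        locus_a = "".join(sorted(g1[0] + g2[0]))
        locus_b = "".join(sorted(g1[1] + g2[1]))
        offspring[(locus_a, locus_b)] += 1
    return offspring

def phenotype_ratio(offspring):
    """Collapse genotypes to phenotypes, with upper case dominant at both loci."""
    ratio = Counter()
    for (locus_a, locus_b), n in offspring.items():
        key = ("A" if "A" in locus_a else "a") + ("B" if "B" in locus_b else "b")
        ratio[key] += n
    return ratio

for p1, p2 in [(("AA", "BB"), ("aa", "bb")), (("AA", "bb"), ("aa", "BB"))]:
    f1 = cross(p1, p2)
    (f1_genotype,) = f1          # both assortments give a single F1 genotype
    f2 = phenotype_ratio(cross(f1_genotype, f1_genotype))
    print(p1, p2, f1_genotype, dict(f2))
```

Both starting pairs produce the same doubly heterozygous F1, so the F2 ratio of 9:3:3:1 is the same as well.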

Incomplete dominance


Charles Darwin (1809-1882)

• 1825-27 in Edinburgh University and 1827-31 in University of Cambridge – natural history, geology, botany

• 1831-1836 – Voyage of the Beagle

• Journal of Researches into the Geology and Natural History of the various countries visited by H.M.S. Beagle (1839)

The Law of Natural Selection

• Species make more offspring than can grow to adulthood.
• Populations remain roughly the same size.
• Food resources are limited, but are relatively constant most of the time.

• In such an environment there will be a struggle for survival among individuals.

• In sexually reproducing species, generally no two individuals are identical.

• Much of the variation is heritable.

• Individuals with the "best" characteristics will be more likely to survive …

• … those desirable traits will be passed to their offspring …
• … and then inherited by following generations, becoming prevalent and then fixed among the population through time.

Re-discovery of the Mendel laws and emergence of modern genetics

• Hugo de Vries (1900)
• William Bateson
– genetics, gene, allele
• Walter Sutton
– Link between genes and chromosomes (1902)
• Archibald Garrod
– Genetic cause of some human disease (1902-08-23)
• Thomas Morgan, work on Drosophila
– Mutants: spontaneous appearance of new alleles (a fly with white eyes in a population of flies with red eyes) (1908)
– Universal acceptance of chromosomes (1915)

Gene = a set of non-complementing mutations

Edward Lewis: Do two recessive mutations occur in the same gene?

[Figure: complementation test for two recessive mutations]

• Same gene: F1 has the mutant phenotype; F2: all mutant phenotypes
• Different genes: F1 has the wild-type phenotype; F2: wild type to mutant = 9:7

Mutant phenotypes persist in cis (same gene). Mutant phenotypes reappear in trans (different genes).

F2 genotype classes for different genes:
WT: 1 + 2 + 2 + 4 = 9
Mut: 1 + 2 + 1 + 2 + 1 = 7
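The 9:7 ratio follows from counting the 16 equally likely F2 combinations. The sketch below assumes two unlinked genes, written `A`/`a` and `B`/`b` for illustration, where lower case is the recessive mutation and the wild-type phenotype needs at least one functional copy of both genes.

```python
from collections import Counter
from itertools import product

# Each F1 parent is heterozygous at both genes: A/a and B/b (lower case = recessive mutation).
gametes = ["".join(g) for g in product("Aa", "Bb")]

by_phenotype = Counter()
by_genotype = Counter()
for g1, g2 in product(gametes, repeat=2):
    gene_a = "".join(sorted(g1[0] + g2[0]))
    gene_b = "".join(sorted(g1[1] + g2[1]))
    # Wild type needs at least one functional copy of BOTH genes.
    phenotype = "WT" if "A" in gene_a and "B" in gene_b else "Mut"
    by_phenotype[phenotype] += 1
    by_genotype[(phenotype, gene_a + gene_b)] += 1

print(dict(by_phenotype))
for (phenotype, genotype), n in sorted(by_genotype.items()):
    print(phenotype, genotype, n)
```

The wild-type genotype classes occur 1, 2, 2 and 4 times and the mutant ones 1, 2, 1, 2 and 1 times, which gives the 9:7 split.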

DNA

• Friedrich Miescher (1869)
– Nuclein
– Richard Altmann: nucleic acid (1889). Only in chromosomes
• Phoebus Levene (1929)
– Components (four bases, the sugar-phosphate chain)
– Nucleotide: phosphate + sugar + base unit
• Hammarsten and Caspersson (1930s)
– DNA is a long polymer; crystals
• Astbury (1938)
– X-ray photographs
• Chargaff rules (1947)
– In many organisms, #A = #T, #C = #G

Transforming factor (Frederick Griffith, 1928)

… = DNA (Oswald Avery, Colin MacLeod, Maclyn McCarty, 1944)

DNA is the genetic medium of phages (Alfred Hershey and Martha Chase, 1952)

32P – radioactive DNA
35S – radioactive proteins

Only DNA enters the cell

… and only DNA is inherited by progeny phages

Erwin Schrödinger

“What is Life?”, 1944: The gene is an aperiodic crystal

The structure of DNA …

• Maurice Wilkins and Rosalind Franklin: high-resolution crystals (1950-1953)

… is the double helix

James Watson and Francis Crick (1953)

The Nature paper: a few lines more than one page

The DNA chain

Complementary pairs of nucleotides

[Figure labels: C, T, G, A]

Figures from the second Watson-Crick paper

The main distances are the same
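Complementary pairing also accounts for the Chargaff rules: if every A faces a T and every C faces a G, the counts over the whole duplex must match. A minimal sketch, using a made-up sequence purely for illustration:

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def duplex_base_counts(strand):
    """Count bases over both strands of a duplex built from one given strand."""
    partner = strand.translate(COMPLEMENT)[::-1]   # reverse complement
    return Counter(strand + partner)

counts = duplex_base_counts("ATGGCGTACGCTTTAA")    # hypothetical sequence
print(counts["A"], counts["T"], counts["C"], counts["G"])
```

The equalities hold for the duplex as a whole, not for a single strand taken alone.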

One base-pair in the double helix (axial view)

The double helix, stick and ball models, axial view

The double helix, stick and ball models, side view

Three models for the replication of DNA

The semi-conservative one is correct (Matthew Meselson and Franklin Stahl, 1958)

Q: What would be the outcome if one of the two other models were correct?

Cells are grown on the 15N (heavy) medium for several generations, then transferred to 14N (light) medium
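One way to approach the question is to follow the 15N label through a few rounds of replication under each model. The sketch below is a simplified bookkeeping model, not a reconstruction of the original analysis. Each duplex is a pair of numbers giving the heavy fraction of its two strands, and the dispersive model is assumed to spread old material evenly over all four daughter strands.

```python
from collections import Counter

def replicate(duplex, model):
    """Return the two daughter duplexes of one duplex replicated in light (14N) medium.

    A duplex is a pair (s1, s2) giving the heavy (15N) fraction of each strand.
    """
    s1, s2 = duplex
    if model == "semi-conservative":      # each daughter keeps one old strand
        return [(s1, 0.0), (s2, 0.0)]
    if model == "conservative":           # the old duplex stays intact
        return [(s1, s2), (0.0, 0.0)]
    if model == "dispersive":             # old material is spread evenly over all strands
        mixed = (s1 + s2) / 4
        return [(mixed, mixed), (mixed, mixed)]
    raise ValueError(model)

def density_bands(model, generations):
    """Map heavy fraction of a duplex to its share of all duplexes after some generations."""
    population = [(1.0, 1.0)]             # cells start fully labelled with 15N
    for _ in range(generations):
        population = [d for duplex in population for d in replicate(duplex, model)]
    bands = Counter((s1 + s2) / 2 for s1, s2 in population)
    return {density: n / len(population) for density, n in sorted(bands.items())}

for model in ("semi-conservative", "conservative", "dispersive"):
    print(model, density_bands(model, 1), density_bands(model, 2))
```

In this model, the conservative scheme would never produce a hybrid band, and the dispersive scheme would give a single band that gets lighter each generation instead of splitting into hybrid and light bands.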

Electron micrograph of replicating DNA

The Central Dogma (F. Crick)

DNA → RNA → protein

Crossing over and recombination

• Genes from one chromosome are not inherited independently

• Recombination allows for relative mapping of gene positions on the chromosome: if two genes are close, the frequency of recombination between them will be lower
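As a toy illustration of relative mapping, the order of three linked genes can be read off their pairwise recombination frequencies: the pair that recombines most often lies at the two ends. The gene names and frequencies below are invented for the example.

```python
def gene_order(freq):
    """Infer the linear order of three linked genes from pairwise recombination frequencies."""
    outer = max(freq, key=freq.get)              # most frequent recombination = farthest apart
    genes = {g for pair in freq for g in pair}
    (middle,) = genes - set(outer)               # the remaining gene lies between them
    return outer[0], middle, outer[1]

# Hypothetical frequencies for three genes a, b and c.
freq = {("a", "b"): 0.05, ("b", "c"): 0.12, ("a", "c"): 0.16}
print(gene_order(freq))
```

Here the a-c frequency is slightly less than the sum of the other two, which is what double crossovers would be expected to cause.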

Collinearity of the gene and the protein (Charles Yanofsky, 1967)

The Genetic Code

• The genetic code: correspondence between DNA and protein (George Gamow, 1954) (Георгий Гамов)
• Crick and co-authors (1961):
– Non-overlapping (one mutation affects one amino acid)
– Degenerate (many codons for one amino acid)
– Comma-less (no specific markers between codons)
– Periodic

The codon is a triplet

• Mutations caused by acridine
– Non-leaky (instead of weakened function, simply no function)
– Mechanism: insertions and deletions of nucleotides (the downstream part of the gene is completely scrambled, so the code is comma-less)

wild type
CUACUACUACUACUACUACUACUACUACUACUACUACUA
LeuLeuLeuLeuLeuLeuLeuLeuLeuLeuLeuLeuLeu

insertion (G)
CUACUACUACGUACUACUACUACUACUACUACUACUACU
LeuLeuLeuArgThrThrThrThrThrThrThrThrThr

deletion (U)
CUACUACUACUACUACUACUACUACUACACUACUACUAC
LeuLeuLeuLeuLeuLeuLeuLeuLeuHisTyrTyrTyr
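The frameshift effect can be reproduced with a few lines of code. This is a minimal sketch: the codon table is the standard one written in the compact UCAG order, `translate` simply reads complete triplets from the first base with no start or stop handling, and the mutation positions are chosen to match the example sequences.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
# Three-letter names only for the residues that occur in this example.
THREE_LETTER = {"L": "Leu", "R": "Arg", "T": "Thr", "H": "His", "Y": "Tyr", "V": "Val"}

def translate(rna):
    """Translate complete codons from position 0; no start or stop handling."""
    codons = [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]
    return "".join(THREE_LETTER.get(CODON_TABLE[c], CODON_TABLE[c]) for c in codons)

wild_type = "CUA" * 13
insertion = wild_type[:10] + "G" + wild_type[10:]   # extra G after the tenth base
deletion = wild_type[:28] + wild_type[29:]          # the U at index 28 removed

for name, rna in [("wild type", wild_type), ("insertion", insertion), ("deletion", deletion)]:
    print(f"{name:10} {translate(rna)}")
```

Every codon downstream of the change is read in a shifted frame, which is why a single inserted or deleted base destroys the function and does not merely weaken it.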

Double mutants and revertants

• Two classes of mutations: (+) and (–)
• Double mutants (+)×(+) and (–)×(–) still produce loss-of-function phenotypes
• Double mutants (+)×(–) and (–)×(+) produce leaky phenotypes

(+)
CUACUACUACGUACUACUACUACUACUACUACUACUACU
LeuLeuLeuArgThrThrThrThrThrThrThrThrThr
×
(–)
CUACUACUACUACUACUACUACUACUACACUACUACUAC
LeuLeuLeuLeuLeuLeuLeuLeuLeuHisTyrTyrTyr

(+)×(–)
CUACUACUACGUACUACUACUACUACUACACUACUACUA
LeuLeuLeuArgThrThrThrThrThrThrLeuLeuLeu

Triple mutants are revertants!

• Triple mutants of the same class, (+)×(+)×(+) and (–)×(–)×(–), produce leaky phenotypes

CUACUACUACGUACUACUACUACUACUACUACUACUACUACU
LeuLeuLeuArgThrThrThrThrThrThrThrThrThrThr
×
CUACUACUACUACUACUACGUACUACUACUACUACUACUACU
LeuLeuLeuLeuLeuLeuArgThrThrThrThrThrThrThr

double mutant – loss-of-function phenotype
CUACUACUACGUACUACUACGUACUACUACUACUACUACUAC
LeuLeuLeuArgThrThrThrTyrTyrTyrTyrTyrTyrTyr
×
CUACUACUACUACUACUACUACUACUACGUACUACUACUACU
LeuLeuLeuLeuLeuLeuLeuLeuLeuArgThrThrThrThr

triple mutant – leaky phenotype
CUACUACUACGUACUACUACGUACUACUACGUACUACUACUA
LeuLeuLeuArgThrThrThrTyrTyrTyrValLeuLeuLeu
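The pattern across single, double and triple mutants reduces to arithmetic modulo 3. The sketch below assumes that (+) stands for one inserted base and (–) for one deleted base, as in the example sequences, and ignores the few altered residues between the mutation sites.

```python
SHIFT = {"+": 1, "-": -1}     # assumed: (+) = one inserted base, (-) = one deleted base

def frame_restored(mutations):
    """True if the combined insertions and deletions shift the frame by a multiple of 3."""
    return sum(SHIFT[m] for m in mutations) % 3 == 0

for combo in ("+", "++", "+-", "+++", "---"):
    verdict = "leaky (frame restored)" if frame_restored(combo) else "loss of function"
    print(f"{combo:4} {verdict}")
```

That three mutations of the same class restore the frame while two do not is the observation that points to a codon length of three.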

Cracking the Code (F. Crick, M. Nirenberg, J. Matthaei, S. Ochoa, G. Khorana, … and you)

• Regular oligonucleotides
– … UUUUUUUUUU …
– … UCUCUCUCUC …
– … UCAUCAUCAU …
• Random oligonucleotides with known composition
• Changes in proteins caused by deamination-induced mutations: C→U, A→G
• Changes in proteins caused by random mutations
• (tRNA binding in the presence of trinucleotides)
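To see why regular polymers are informative, it helps to list the codons that a triplet reader would encounter in each frame. This is a small sketch with an illustrative helper name. It only slices the repeating sequence and does not translate anything.

```python
def codons_read(repeat_unit, frame, n_codons=6):
    """Codons read from a long repeating polymer, starting at the given frame offset (0, 1 or 2)."""
    polymer = repeat_unit * (3 * n_codons + 3)     # long enough for any offset 0..2
    return [polymer[frame + 3 * i: frame + 3 * i + 3] for i in range(n_codons)]

for unit in ("U", "UC", "UCA"):
    for frame in range(3):
        print(f"poly({unit}) frame {frame}: {' '.join(codons_read(unit, frame))}")
```

A homopolymer contains a single codon, a dinucleotide repeat alternates between two codons in every frame, and a trinucleotide repeat gives one codon per frame, so each experiment narrows down the possible assignments in a different way.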

20 amino acids and 64 codons

• Alanine
• Cysteine
• Aspartate
• Glutamate
• Phenylalanine
• Glycine
• Histidine
• Isoleucine
• Lysine
• Leucine
• Methionine
• Asparagine
• Proline
• Glutamine
• Arginine
• Serine
• Threonine
• Valine
• Tryptophan
• Tyrosine

UUU Phe   UCU       UAU       UGU
UUC       UCC       UAC       UGC
UUA       UCA       UAA       UGA
UUG       UCG       UAG       UGG

CUU       CCU       CAU       CGU
CUC       CCC Pro   CAC       CGC
CUA       CCA       CAA       CGA
CUG       CCG       CAG       CGG

AUU       ACU       AAU       AGU
AUC       ACC       AAC       AGC
AUA       ACA       AAA Lys   AGA
AUG       ACG       AAG       AGG

GUU       GCU       GAU       GGU
GUC       GCC       GAC       GGC
GUA       GCA       GAA       GGA
GUG       GCG       GAG       GGG
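The numbers 20 and 64 are linked by a simple count: words of length n over four bases give 4^n combinations, and n = 3 is the shortest length that covers 20 amino acids. A short sketch of that count, with illustrative names:

```python
from itertools import product

BASES = "UCAG"             # the four RNA bases
N_AMINO_ACIDS = 20

def min_codon_length(n_letters=4, n_targets=N_AMINO_ACIDS):
    """Smallest word length whose number of words covers all targets."""
    length = 1
    while n_letters ** length < n_targets:
        length += 1
    return length

codons = ["".join(c) for c in product(BASES, repeat=3)]
print(min_codon_length(), len(codons), len(codons) - N_AMINO_ACIDS)
```

Doublets would give only 16 words, while triplets leave 44 more codons than amino acids, which is why a triplet code has to be degenerate or contain unused words.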

Triplet binding data (from Crick’s Croonian lecture, 1966)

Reading the code: The ribosome

Translation

Polysomes

Adaptors (F.Crick and S.Brenner)

tRNA: secondary structure

tRNA: three-dimensional structure

tRNA and aminoacyl-tRNA synthetase

Initiation of translation

Translation start sites

dnaN ACATTATCCGTTAGGAGGATAAAAATG

gyrA GTGATACTTCAGGGAGGTTTTTTAATG

serS TCAATAAAAAAAGGAGTGTTTCGCATG

bofA CAAGCGAAGGAGATGAGAAGATTCATG

csfB GCTAACTGTACGGAGGTGGAGAAGATG

xpaC ATAGACACAGGAGTCGATTATCTCATG

metS ACATTCTGATTAGGAGGTTTCAAGATG

gcaD AAAAGGGATATTGGAGGCCAATAAATG

spoVC TATGTGACTAAGGGAGGATTCGCCATG

ftsH GCTTACTGTGGGAGGAGGTAAGGAATG

pabB AAAGAAAATAGAGGAATGATACAAATG

rplJ CAAGAATCTACAGGAGGTGTAACCATG

tufA AAAGCTCTTAAGGAGGATTTTAGAATG

rpsJ TGTAGGCGAAAAGGAGGGAAAATAATG

rpoA CGTTTTGAAGGAGGGTTTTAAGTAATG

rplM AGATCATTTAGGAGGGGAAATTCAATG

Translation start sites aligned

[Figure: the same 16 sequences as above, aligned]
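An alignment of these start sites can be approximated by sliding a short motif along each upstream region. The sketch below assumes the motif `GGAGG`, which is visible in many of the sequences. The motif choice, the mismatch scoring and the tie-breaking rule are assumptions made for illustration, not the method used for the slide.

```python
SITES = dict(line.split() for line in """
dnaN ACATTATCCGTTAGGAGGATAAAAATG
gyrA GTGATACTTCAGGGAGGTTTTTTAATG
serS TCAATAAAAAAAGGAGTGTTTCGCATG
bofA CAAGCGAAGGAGATGAGAAGATTCATG
csfB GCTAACTGTACGGAGGTGGAGAAGATG
xpaC ATAGACACAGGAGTCGATTATCTCATG
metS ACATTCTGATTAGGAGGTTTCAAGATG
gcaD AAAAGGGATATTGGAGGCCAATAAATG
spoVC TATGTGACTAAGGGAGGATTCGCCATG
ftsH GCTTACTGTGGGAGGAGGTAAGGAATG
pabB AAAGAAAATAGAGGAATGATACAAATG
rplJ CAAGAATCTACAGGAGGTGTAACCATG
tufA AAAGCTCTTAAGGAGGATTTTAGAATG
rpsJ TGTAGGCGAAAAGGAGGGAAAATAATG
rpoA CGTTTTGAAGGAGGGTTTTAAGTAATG
rplM AGATCATTTAGGAGGGGAAATTCAATG
""".strip().splitlines())

def best_motif_hit(seq, motif="GGAGG"):
    """Return (offset, mismatches) of the closest match to motif upstream of the start codon."""
    upstream = seq[:-3]                     # drop the ATG itself
    hits = []
    for i in range(len(upstream) - len(motif) + 1):
        mismatches = sum(a != b for a, b in zip(upstream[i:i + len(motif)], motif))
        hits.append((mismatches, i))
    mismatches, offset = min(hits)          # fewest mismatches, then leftmost
    return offset, mismatches

widest = max(best_motif_hit(s)[0] for s in SITES.values())
for gene, seq in SITES.items():
    offset, mismatches = best_motif_hit(seq)
    # Pad on the left so that the best motif hits line up in one column.
    print(f"{gene:6}{' ' * (widest - offset)}{seq}  ({mismatches} mismatches)")
```

Once the motif hits are lined up, the start codons no longer share a column, which shows that the spacing between the motif and ATG varies from gene to gene.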

Elongation

Termination of translation

Dialects

• The genetic code is not universal
• … but the differences are relatively minor
• … occur mainly in small genomes of organelles
• … and involve specific codon families.
• In many cases symmetry is increased, or entire families reassigned.
• Many changes involve stop codons

Reassignment

CUN (=CUU, CUC, CUA, CUG): Leu → Thr

Possible initiation codons in addition to AUG (Met): NUG (=GUG, UUG, CUG), AUN (=AUU, AUC, AUA)

UAA, UAG: stop → Gln

More symmetry

AUU Ile
AUC Ile
AUA Ile → Met
AUG Met

AGU Ser
AGC Ser
AGA Arg → Ser
AGG Arg → Ser

UGU Cys
UGC Cys
UGA stop → Trp
UGG Trp

Vulnerable codon families

CGU Arg
CGC Arg
CGA Arg → none
CGG Arg → none

AGU Ser
AGC Ser
AGA Arg → Ser, Gly, stop
AGG Arg → Ser, Gly, stop, none

GGU Gly
GGC Gly
GGA Gly
GGG Gly

Stop-containing families

UGU Cys
UGC Cys
UGA stop → Trp, Cys, Sec
UGG Trp

UAU Tyr
UAC Tyr
UAA stop → Tyr, Gln
UAG stop → Gln (Pyl)
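A dialect can be modelled as a small set of overrides applied to the standard table. In the sketch below the three override sets are given as recalled from the NCBI translation tables for vertebrate mitochondria, yeast mitochondria and ciliate nuclei; they should be treated as illustrative and checked against that reference before use. The test sequence is made up.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Reassignments relative to the standard code; "*" marks a stop codon.
DIALECTS = {
    "vertebrate mitochondria": {"UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"},
    "yeast mitochondria": {"UGA": "W", "AUA": "M",
                           "CUU": "T", "CUC": "T", "CUA": "T", "CUG": "T"},
    "ciliate nuclei": {"UAA": "Q", "UAG": "Q"},
}

def dialect_table(name):
    """Standard table with the reassignments of one dialect applied on top."""
    table = dict(STANDARD)
    table.update(DIALECTS[name])
    return table

def translate(rna, table):
    """Translate from position 0 in one-letter code, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = table[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

rna = "AUGAUAUGACUAUAA"    # hypothetical message: AUG AUA UGA CUA UAA
print("standard", translate(rna, STANDARD))
for name in DIALECTS:
    print(name, translate(rna, dialect_table(name)))
```

The same message is cut short at UGA under the standard code but read through in both mitochondrial dialects, where AUA and CUA are also read differently.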

How many letters are there in the English alphabet?

• 26 (everybody knows) …

• … but we are discussing the book by Yčas …

• … so everybody is naïve

How many amino acids?

• Chemists: hundreds
– many occur in proteins: post-translational modifications

• How many amino acids are encoded by DNA?

Crick:

Is formyl-methionine a “standard” amino acid?

• Occurs in bacteria at N-termini of all newly synthesized proteins (may be enzymatically removed later on)
• Has three codons: AUG, GUG, UUG
– unlike “internal” methionine encoded only by AUG
– by the way, internal GUG encodes Valine and internal UUG encodes Leucine

Selenocysteine

• In all three domains of life (bacteria, eukaryotes, archaea)
• Encoded by UGA followed by a special hairpin structure (SECIS)
– without this hairpin UGA is a stop codon
– several genes for selenoproteins per genome (or none)
– corresponds to cysteine in homologs (more efficient in enzymes)
• Complicated mechanism of incorporation (specific tRNA, seryl-tRNA synthetase, conversion to SeCys on tRNA, specific elongation factor)

Alignment of SECIS elements

The consensus

SECIS structure

SECIS elements: examples

Pyrrolysine

• In methanogenic archaea
• A derivative of lysine
• Directly encoded (unlike selenocysteine). Standard mechanism:
– UAG codon
– specific tRNA
– aminoacyl-tRNA
• UAG rarely used as a stop codon
– never as the only stop of a gene

Thanks

• Wikipedia
• Ergito
• Authors of papers, photographs and Internet resources
• Professor Leong Hon Wai
• The organizers
• The assistants
• The students
