Bioinformatics Practical for Biochemists
Andrei Lupas, Birte Höcker, Steffen Schmidt
WS 2012/2013
01. DNA & Genomics
Description
• Lectures about general topics in Bioinformatics & History
• Tutorials will provide you with a toolbox of bioinformatics programs to analyze data
• Hands-On sessions will give you the opportunity to use these tools
Course Outline
• Mon – DNA & Genomics
• Tue – Introduction to Proteins
• Wed – Annotation of Sequence Features
• Thu – Protein Classification
• Fri – Evolution & Design
Course Material:
eb.mpg.de/research/departments/protein-evolution/teaching
Course Outline
• 13:00-14:00 Presentation
• 14:15-17:30 Tutorial (2 x 30min) & hands-on practical
• You will need to keep an electronic lab notebook
• Fri afternoon: Test Exercises
Software Requirements
• Browser (e.g. Firefox)
• “Advanced” Word Processor
• PyMOL (www.pymol.org – free for teaching)
DNA & Genomics
1953 Model of DNA (F. Crick)
wikipedia.org
What is the “genetic material”?
• 1865 Gregor Mendel
• basic rules of heredity
• 1869 Friedrich Miescher
• discovery of ‘nuclein’ (DNA), Hoppe-Seyler repeated all experiments
• 1881 Eduard Zacharias
• chromosomes are composed of nuclein
• 1889 Richard Altmann
• renaming nuclein to nucleic acid
history.nih.gov / wikipedia.org
DNA is the “transforming material”
• 1928 Frederick Griffith
• “transforming principle” - Str. pneumoniae experiment
• 1944 Avery, MacLeod & McCarty
• Griffith’s “transforming principle” is DNA
bacteriophagetherapy.info / www.lifesciencesfoundation.org
DNA is the genetic material
• 1950 Erwin Chargaff
• amounts of A and T, and of C and G, are equal; base composition is the same in different tissues
• 1952 Hershey & Chase
• showed that DNA is the genetic material with the 32P/35S phage/E. coli experiment
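Chargaff's observation follows directly from base pairing, and it can be checked on any double-stranded sequence. The sketch below is only an illustration: the example strand and the function names are made up, not part of the course material. It counts bases over both strands of a duplex and confirms A = T and G = C.

```python
from collections import Counter

# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return strand.translate(COMPLEMENT)[::-1]


def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of the double helix."""
    return Counter(strand + reverse_complement(strand))


# Hypothetical example strand, chosen only for illustration.
strand = "ATGGCGTACGCTTAA"
counts = duplex_base_counts(strand)
print(counts["A"] == counts["T"], counts["G"] == counts["C"])
```

On a single strand the counts are generally unequal; the equality appears only once the complementary strand is included.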
http://osulibrary.oregonstate.edu/specialcollections/coll/pauling/dna/notes/1952a.22-ms-01.html
Solving the DNA structure
• 1952/53 Linus Pauling
• beat Cavendish Lab in discovery of α-helix
• Cavendish Lab (Cambridge): Watson & Crick allowed to work full-time on DNA
• Pauling shared manuscript with Cavendish Lab before publication (via his son Peter Pauling)
Solving the DNA structure
• 1952 Franklin & Wilkins
• X-ray of B-DNA - Wilkins showed results to Watson & Crick
• periodicity, phosphates are outside
• 1953 Crick & Watson
• model of B-DNA
Nature, 1953
Solving the DNA structure
original papers
NATURE | VOL 421 | 23 JANUARY 2003 | www.nature.com/nature 397 © 2003 Nature Publishing Group
DNA structure
Getting the “code”
• 1953 George E. Palade
• “RNA organelles” (ribosomes)
• 1957 Crick et al.
• suggest non-overlapping triplets
• only 20 of the 64 triplets code for an amino acid
• “comma-free code”
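The numbers 64 and 20 can be reproduced by a short counting argument. In a comma-free code no codeword may appear in a shifted reading frame of two adjacent codewords, so AAA-type triplets are excluded and at most one triplet from each group of three cyclic rotations (for example ACG, CGA, GAC) can be used. The sketch below is a minimal illustration of that bound; the helper names are my own.

```python
from itertools import product

BASES = "ACGU"
triplets = ["".join(t) for t in product(BASES, repeat=3)]


def rotations(triplet: str) -> set:
    """All cyclic rotations of a triplet, e.g. ACG -> {ACG, CGA, GAC}."""
    return {triplet[i:] + triplet[:i] for i in range(3)}


# AAA, CCC, GGG and UUU have a single rotation and are excluded:
# AAAAAA reads as AAA in every frame.
# Every other triplet belongs to a class of exactly three rotations.
classes = {frozenset(rotations(t)) for t in triplets if len(rotations(t)) == 3}


def is_comma_free(code) -> bool:
    """True if no codeword occurs in a shifted frame of any codeword pair."""
    code = set(code)
    for first in code:
        for second in code:
            pair = first + second
            if pair[1:4] in code or pair[2:5] in code:
                return False
    return True


print(len(triplets), len(classes))  # 64 20
```

`is_comma_free` can be used to test a candidate set of triplets; the class count only gives the upper bound of 20.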
Getting the “code”
• 1961 Nirenberg & Matthaei
• polyU mRNA produces polyF protein
• complete genetic code
• 1961 Sydney Brenner
• no overlapping codes
• concept of mRNA
• triplet Code (Crick, Brenner, Barnett, Watts-Tobin)
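The poly(U) result can be reproduced with a few lines of code. The sketch below builds the standard genetic code from the usual compact 64-letter table (bases ordered U, C, A, G) and reads an mRNA as non-overlapping triplets from its first base. The function and variable names are illustrative assumptions, and the reading simply starts at position 0, as in a cell-free system without a defined start codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read non-overlapping triplets from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("U" * 15))  # FFFFF
```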
No. 4809 December 30, 1961 NATURE 1227
GENERAL NATURE OF THE GENETIC CODE FOR PROTEINS
By Dr. F. H. C. CRICK, F.R.S., LESLIE BARNETT, Dr. S. BRENNER and Dr. R. J. WATTS-TOBIN, Medical Research Council Unit for Molecular Biology, Cavendish Laboratory, Cambridge
THERE is now a mass of indirect evidence which suggests that the amino-acid sequence along the polypeptide chain of a protein is determined by the sequence of the bases along some particular part of the nucleic acid of the genetic material. Since there are twenty common amino-acids found throughout Nature, but only four common bases, it has often been surmised that the sequence of the four bases is in some way a code for the sequence of the amino-acids. In this article we report genetic experiments which, together with the work of others, suggest that the genetic code is of the following general type:
(a) A group of three bases (or, less likely, a multiple of three bases) codes one amino-acid.
(b) The code is not of the overlapping type (see Fig. 1).
(c) The sequence of the bases is read from a fixed starting point. This determines how the long sequences of bases are to be correctly read off as triplets. There are no special ‘commas’ to show how to select the right triplets. If the starting point is displaced by one base, then the reading into triplets is displaced, and thus becomes incorrect.
(d) The code is probably ‘degenerate’; that is, in general, one particular amino-acid can be coded by one of several triplets of bases.
The Reading of the Code
The evidence that the genetic code is not overlapping (see Fig. 1) does not come from our work, but from that of Wittmann¹ and of Tsugita and Fraenkel-Conrat² on the mutants of tobacco mosaic virus produced by nitrous acid. In an overlapping triplet code, an alteration to one base will in general change three adjacent amino-acids in the polypeptide chain. Their work on the alterations produced in the protein of the virus show that usually only one amino-acid at a time is changed as a result of treating the ribonucleic acid (RNA) of the virus with nitrous acid. In the rarer cases where two amino-acids are altered (owing presumably to two separate deaminations by the nitrous acid on one piece of RNA), the altered amino-acids are not in adjacent positions in the polypeptide chain.
Brenner³ had previously shown that, if the code were universal (that is, the same throughout Nature), then all overlapping triplet codes were impossible. Moreover, all the abnormal human haemoglobins studied in detail⁴ show only single amino-acid changes. The newer experimental results essentially rule out all simple codes of the overlapping type.
If the code is not overlapping, then there must be some arrangement to show how to select the correct triplets (or quadruplets, or whatever it may be) along the continuous sequence of bases. One obvious suggestion is that, say, every fourth base is a ‘comma’. Another idea is that certain triplets make ‘sense’, whereas others make ‘nonsense’, as in the comma-free codes of Crick, Griffith and Orgel⁵. Alternatively, the correct choice may be made by starting at a fixed point and working along the sequence of bases three (or four, or whatever) at a time. It is this possibility which we now favour.
Experimental Results
Our genetic experiments have been carried out on the B cistron of the rII region of the bacteriophage T4, which attacks strains of Escherichia coli. This is the system so brilliantly exploited by Benzer⁶,⁷. The rII region consists of two adjacent genes, or ‘cistrons’, called cistron A and cistron B. The wild-type phage will grow on both E. coli B (here called B) and on E. coli K12 (λ) (here called K), but a phage which has lost the function of either gene will not grow on K. Such a phage produces an r plaque on B. Many point mutations of the genes are known which behave in this way. Deletions of part of the region are also found. Other mutations, known as ‘leaky’, show partial function; that is, they will grow on K but their plaque-type on B is not truly wild. We report here our work on the mutant P 13 (now renamed FC 0) in the B1 segment of the B cistron. This mutant was originally produced by the action of proflavin.
We⁸ have previously argued that acridines such as proflavin act as mutagens because they add or delete a base or bases. The most striking evidence in favour of this is that mutants produced by acridines are seldom ‘leaky’; they are almost always completely lacking in the function of the gene. Since our note was published, experimental data from two sources have been added to our previous evidence: (1) we have examined a set of 126 rII mutants made with acridine yellow; of these only 6 are leaky (typically about half the mutants made with base analogues are leaky); (2) Streisinger¹⁰ has found that whereas mutants of the lysozyme of phage T4 produced by base-analogues are usually leaky, all lysozyme mutants produced by proflavin are negative, that is, the function is completely lacking.
If an acridine mutant is produced by, say, adding a base, it should revert to ‘wild-type’ by deleting a base. Our work on revertants of FC 0 shows that it usually
[Figure 1: a nucleic acid read from the starting point as an overlapping code and as a non-overlapping code]
Fig. 1. To show the difference between an overlapping code and a non-overlapping code. The short vertical lines represent the bases of the nucleic acid. The case illustrated is for a triplet code
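The arguments in the excerpt can be made concrete with a toy message. The sketch below uses an invented sequence and invented helper names, so it illustrates the reasoning and does not reconstruct the authors' experiments. It shows that a single base substitution changes one triplet in a non-overlapping code but three adjacent triplets in an overlapping code, and that an added base shifts the reading frame until a nearby deletion restores it.

```python
def non_overlapping(seq: str, start: int = 0) -> list:
    """Read consecutive triplets from a fixed starting point."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def overlapping(seq: str, start: int = 0) -> list:
    """Read a triplet at every base position (fully overlapping code)."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2)]


def changed_positions(a: list, b: list) -> list:
    """Indices of triplets that differ between two readings."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


seq = "CATCATCATCAT"                # toy message, illustration only
mutant = seq[:4] + "G" + seq[5:]    # substitution of the base at index 4

print(changed_positions(non_overlapping(seq), non_overlapping(mutant)))  # [1]
print(changed_positions(overlapping(seq), overlapping(mutant)))          # [2, 3, 4]

plus = seq[:4] + "G" + seq[4:]        # added base: frame shifted downstream
plus_minus = plus[:7] + plus[8:]      # nearby deletion restores the frame
print(non_overlapping(plus))          # ['CAT', 'CGA', 'TCA', 'TCA']
print(non_overlapping(plus_minus))    # ['CAT', 'CGA', 'TAT', 'CAT']
```

Only the triplets between the addition and the deletion remain altered, which is the logic behind reverting an acridine mutant by a second, compensating mutation.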
Getting the “code” – incl. start & stop codons
Codon usage in E. coli:
• Alternative start codons
• AUG (83%)
• GUG (14%)
• UUG (3%)
• Alternative stop codons
• UAA (63%, ‘ochre’)
• UGA (29%, ‘opal’), or Sec (selenocysteine)
• UAG (8%, ‘amber’)
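Start and stop codons are what a simple gene finder looks for. The following sketch scans the three forward reading frames of an mRNA for open reading frames that begin with any of the start codons listed above and end at the first in-frame stop codon. The function name, the `min_codons` threshold and the example sequence are assumptions for illustration; real gene finders also consider the reverse strand, ribosome binding sites and codon statistics, and this sketch treats UGA purely as a stop.

```python
START_CODONS = {"AUG", "GUG", "UUG"}
STOP_CODONS = {"UAA", "UGA", "UAG"}


def find_orfs(mrna: str, min_codons: int = 2) -> list:
    """Return (start, end) positions of ORFs in the three forward frames.

    `end` is exclusive and includes the stop codon; `min_codons` is the
    minimum number of codons before the stop (hypothetical threshold).
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i                      # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ORF
    return orfs


mrna = "GGAUGGCUAAAUAACC"   # made-up example sequence
for start, end in find_orfs(mrna):
    print(start, end, mrna[start:end])  # 2 14 AUGGCUAAAUAA
```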
wikipedia.org / yale.edu
Gene Structure
• 1977 Sharp & Roberts
• pre-mRNA is processed
• 1982 Cech
• ribo(nucleic en)zymes
• 1980 Joan A. Steitz
• role of snRNPs in splicing
Gene Structure – Eukaryotes / Prokaryotes
lac Operon
1: Regulatory gene
Promoter region
3: β-galactosidase
4: β-gal permease
8: β-gal transacetylase
Miller, O. L. et al. Visualization of bacterial genes in action. Science 169, 392–395
Gene structure – Polysomes in Prokaryotes
• EM picture of polysomes on a chromosome
Transcription initiation
DNA
mRNA with Ribosomes
Griswold, A. (2008) Nature Education 1(1)
Understanding Bioinformatics, Zvelebil & Baum, 2007
Gene Structure – Prokaryotic Operons
lac Operon
1: Regulatory gene
Promoter region
3: β-galactosidase
4: β-gal permease
8: β-gal transacetylase
u-tokyo.ac.jp
Gene Structure – Prokaryotes
zazzle.com
Gene Structure – Eukaryotes
Gene Structure – Comparison
Genes
Eukaryote:
• Often have introns
• Intraspecific gene order and number generally relatively stable
• Many non-coding (RNA) genes
• There is NOT generally a relationship between organism complexity and gene number
Prokaryote:
• No introns
• Gene order and number may vary between strains of a species
Gene regulation
Eukaryote:
• Promoters, often with distal long-range enhancers/silencers, MARs, transcriptional domains
• Generally monocistronic
Prokaryote:
• Promoters
• Enhancers/silencers rare
• Genes often regulated as polycistronic operons
Repetitive sequences
Eukaryote:
• Generally highly repetitive, with genome-wide families from transposable element propagation
Prokaryote:
• Generally few repeated sequences
• Relatively few transposons
Organelle (subgenomes)
Eukaryote:
• Mitochondrial (all)
• Chloroplasts (in plants)
Prokaryote:
• Absent
Genomic era
• 1977 Frederick Sanger
• dideoxy sequencing
• 1986 Human Genome Initiative
• Genomes
• 1995 H. influenzae 1.8 Mb 1.7k genes
• 1996 S. cerevisiae 12.5 Mb 5.7k genes
• 1997 E. coli 4.6 Mb 4.3k genes
• 1998 C. elegans 100 Mb 21.7k genes
• 2000 D. melanogaster 121 Mb 17k genes
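Dividing gene number by genome size shows how gene density drops from bacteria to animals. The sketch below uses only the rounded figures from the list above; the dictionary layout and function name are illustrative choices.

```python
# (genome size in Mb, approximate gene count), taken from the list above.
GENOMES = {
    "H. influenzae": (1.8, 1700),
    "E. coli": (4.6, 4300),
    "S. cerevisiae": (12.5, 5700),
    "C. elegans": (100, 21700),
    "D. melanogaster": (121, 17000),
}


def genes_per_mb(size_mb: float, genes: int) -> float:
    """Gene density in genes per megabase."""
    return genes / size_mb


for organism, (size_mb, genes) in GENOMES.items():
    print(f"{organism:16s} {genes_per_mb(size_mb, genes):6.0f} genes/Mb")
```

With these numbers the two bacteria come out at roughly 900 genes per Mb, yeast at about 450, and the two animals at roughly 140 to 220.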
Kavanoff, Nature Education : Supercoiled chromosome of E. coli.
Prokaryotic Genome
• E. coli
• 4.6 Mbp genome
• 1 by 2 µm cell size
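A quick calculation shows why the chromosome has to be supercoiled. The sketch below assumes the 4.6 Mb size listed for E. coli under "Genomic era" and the standard B-DNA rise of about 0.34 nm per base pair; the constant and function names are my own.

```python
RISE_PER_BP_NM = 0.34      # approximate axial rise per base pair in B-DNA
GENOME_SIZE_BP = 4.6e6     # E. coli genome size from the genome list
CELL_LENGTH_UM = 2.0       # long axis of the cell


def contour_length_um(base_pairs: float) -> float:
    """Length of fully stretched B-DNA in micrometres."""
    return base_pairs * RISE_PER_BP_NM / 1000.0


length_um = contour_length_um(GENOME_SIZE_BP)
ratio = length_um / CELL_LENGTH_UM
print(f"{length_um:.0f} µm of DNA, about {ratio:.0f} times the cell length")
```

The stretched chromosome is about 1.6 mm long, several hundred times the length of the cell that contains it.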
Science (2001), Nature (2001)
The human genome
• 2001 Draft H. sapiens 2.9 Gb 20-30k genes
The human genome
Gene content
Genome Structure – Comparison
Size
Eukaryote:
• Large (10 Mb – 100,000 Mb)
• There is not generally a relationship between organism complexity and its genome size (many plants have larger genomes than human!)
Prokaryote:
• Generally small (<10 Mb; most < 5 Mb)
• Complexity (as measured by # of genes and metabolism) generally proportional to genome size
Content
Eukaryote:
• Most DNA is non-coding
Prokaryote:
• DNA is “coding gene dense”
Telomeres / Centromeres
Eukaryote:
• Present (linear DNA)
Prokaryote:
• Circular DNA, doesn't need telomeres
• Don’t have mitosis, hence no centromeres
Number of chromosomes
Eukaryote:
• More than one, (often) including those discriminating sexual identity
Prokaryote:
• Often one, sometimes more, but plasmids are not true chromosomes
Chromatin
Eukaryote:
• Histone bound (which serves as a genome regulation point)
Prokaryote:
• No histones
• Uses supercoiling to pack genome
Gregory (2005), Nature
Human Genome Content
• LINEs: 20.4%
• SINEs: 13.1%
• LTR retrotransposons: 8.3%
• DNA transposons: 2.9%
• Simple sequence repeats: 3%
• Segmental duplications: 5%
• Miscellaneous heterochromatin: 8%
• Miscellaneous unique sequences: 11.6%
• Introns: 25.9%
• Protein-coding genes: 1.5%
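Adding up the slices makes the main message of the chart explicit. The sketch below assumes the pairing of labels and percentages commonly given for this figure (LINEs 20.4%, SINEs 13.1%, and so on); since the labels and values are separated in the slide text, that pairing is an assumption, as are the variable names.

```python
# Percent of the human genome per category (assumed label/value pairing).
CONTENT = {
    "LINEs": 20.4,
    "SINEs": 13.1,
    "LTR retrotransposons": 8.3,
    "DNA transposons": 2.9,
    "Simple sequence repeats": 3.0,
    "Segmental duplications": 5.0,
    "Miscellaneous heterochromatin": 8.0,
    "Miscellaneous unique sequences": 11.6,
    "Introns": 25.9,
    "Protein-coding genes": 1.5,
}

TRANSPOSABLE_ELEMENTS = ["LINEs", "SINEs", "LTR retrotransposons", "DNA transposons"]

te_fraction = sum(CONTENT[name] for name in TRANSPOSABLE_ELEMENTS)
total = sum(CONTENT.values())

print(f"Transposable elements: {te_fraction:.1f}%")             # 44.7%
print(f"Protein-coding genes: {CONTENT['Protein-coding genes']}%")
print(f"Sum of all categories: {total:.1f}%")                   # 99.7%
```

Under this pairing, transposable elements account for roughly 45% of the genome, about thirty times the protein-coding fraction.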
Gene Structure – Eukaryotic Gene
[Figure: UCSC Genome Browser view (hg19, 10 kb scale) of chr1:156,225,000-156,250,000 showing the SMG5 gene. Tracks: UCSC Genes (RefSeq, UniProt, CCDS, Rfam, tRNAs & Comparative Genomics); Placental Mammal Basewise Conservation by PhyloP; Simple Nucleotide Polymorphisms (dbSNP 135) Found in >= 1% of Samples; Repeating Elements by RepeatMasker]
wikipedia.org
Transposable Element - Mobile Elements / Jumping genes
• Barbara McClintock (1902 - 1992)
• studies in the 40’s & 50’s of spotted kernels in maize
• discovery of “controlling elements”
• initially thought to be unique to maize but later also found in other eukaryotes, bacteria, viruses, phages & plasmids
• Nobel prize in 1983
wikipedia.org
Transposable Element - Mobile Elements / Jumping genes
• DNA Transposons
• transposase cuts out the transposon & inserts it at the target site
• “cut-and-paste” mechanism
• prokaryotes & eukaryotes
• Retrotransposons
• transposon DNA is transcribed to RNA
• the RNA is reverse-transcribed and the DNA copy is inserted into the genome (“copy-and-paste”)
• LTR retrotransposons, LINEs, SINEs
• eukaryotes only
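The difference between the two mechanisms can be shown with a toy string model. The sketch below is a deliberate simplification with invented names and sequences: it ignores target-site duplications and the enzymes involved, and only tracks where the element ends up and how many copies exist afterwards.

```python
def cut_and_paste(genome: str, element: str, target: int) -> str:
    """DNA transposon: excise the element, reinsert it at `target`.

    `target` is an index in the genome after excision.
    """
    start = genome.index(element)
    excised = genome[:start] + genome[start + len(element):]
    return excised[:target] + element + excised[target:]


def copy_and_paste(genome: str, element: str, target: int) -> str:
    """Retrotransposon: the original stays; a copy made via RNA is inserted."""
    rna = element.replace("T", "U")   # transcription (toy model)
    cdna = rna.replace("U", "T")      # reverse transcription (toy model)
    return genome[:target] + cdna + genome[target:]


# Lower case marks host DNA, upper case marks the mobile element.
genome = "ccccGATTACAggggtttt"
element = "GATTACA"

moved = cut_and_paste(genome, element, 8)
copied = copy_and_paste(genome, element, 15)
print(moved, moved.count(element))     # ccccggggGATTACAtttt 1
print(copied, copied.count(element))   # ccccGATTACAggggGATTACAtttt 2
```

Only the copy-and-paste route increases the copy number with each event, which fits the large share of the human genome taken up by LINEs, SINEs and LTR retrotransposons.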
What can Bioinformatics do for you?
• sequence analysis
• comparison, annotation, phylogeny
• genomics
• assembly, gene finding / annotation, phylogeny
• data mining / analysis
• text mining, expression profiling (microarray, RNAseq), image analysis
• structural bioinformatics
• secondary structure prediction, protein design, docking