cb1e 20170504 intro2 structure - rostlab · 2017. 5. 4. · cb1_intro2_structure lecture: protein...
TRANSCRIPT
-
/108© Burkhard Rost
1
title: Computational Biology 1 - Protein structure: Intro into protein structure
short title: cb1_intro2_structure
lecture: Protein Prediction 1 - Protein structure Computational Biology 1 - TUM Summer 2016
-
/108© Burkhard Rost
Videos: YouTube / www.rostlab.org THANKS : . EXERCISES: Special lectures: • TBD - Thomas Hopf • TBD - Jonas Reeb No lecture:• 05/10 Student assembly (SVV) • 05/17 Ascension day • 05/26 Whitsun holiday • 06/04 Corpus Christi LAST lecture: June 28 (followed by 2 wrap-up sessions) Examen: June 30, 2016: lecture time room TBD • Makeup: TBC: Oct 18 & 20, 2016 - lecture time
2
CONTACT: Lothar Richter [email protected]
Dmitrij Nechaev
Lothar Richter
http://www.rostlab.orgmailto:[email protected]?subject=
-
/108© Burkhard Rost
3
Questions about last lecture?
??
???
© Wikipedia
-
© Burkhard Rost
Recap
4
-
/108© Burkhard Rost
common to life: DNA/cellsDNA->RNA->proteins = machinery of life
5
Recap: genes - proteins
-
/108© Burkhard Rost
6
Central dogma
slide: Andrea [email protected]
Structure
Function
-
/108© Burkhard Rost
7
Codon wheel
© yourgenome.org slide: Andrea Schafferhans
-
/108© Burkhard Rost
run: 6 hoursfull capacity: ~5-10 TB data / day
8
Illumina: MiSeq
Illumina - Early 2012
-
/108© Burkhard Rost
common to life: DNA/cellsDNA->RNA->proteins = machinery of lifeproteins made up of amino acids (20 different: like pearl chains with pearls of 20 different sizes & “colors”, i.e. biophysical features)20k proteins in human~85M proteins knownprotein length (number of amino acids): 35-30K
9
Recap: genes - proteins
-
/108© Burkhard Rost
10
1ej9 Topoisomerase
1i6h RNA Polymerase
1ais TATA-binding Protein/Transcription Factor IIB
1cgp Catabolite GeneActivator Protein
1tau DNA Polymerase
1lbh+1efa lacRepressor
1aoi Nucleosome
1aoi Nucleosome
1b7t Myosin 1atn Actin
1tub Microtubule1bkv Collagen
© David Goodsell
scale reduced
-
/108© Burkhard Rost
11
Cells: outside & inside
Illustration of Mycoplasma genitalium by David S. Goodsell, the Scripps Research Institute, UCSD, USA
-
/108© Burkhard Rost
Previous lecture• Organisms, genes, central dogma
TODAY: Protein introduction• Amino acids • Protein structure • Bonds & energies • domains • 3D comparisons
NEXT lectures• sequence comparisons/alignments
12
TOC today
-
© Burkhard Rost
Protein Prediction I: Beginners
1 Introduction
1.2 - Proteins/domains
1.3 - 3D comparisons
13
-
© Burkhard Rost
Reality and images
14
-
/108© Burkhard Rost
15
slide: Marco Punta
-
/108© Burkhard Rost
Georges Braque - Houses at L'Estaque
16
slide: Marco Punta
-
/108© Burkhard Rost
17
Where is that?
Illustration by David S. Goodsell, the Scripps Research Institute, UCSD, USA
-
/108© Burkhard Rost
18
Mycoplasma genitalium
Illustration by David S. Goodsell, the Scripps Research Institute, UCSD, USA
-
/108© Burkhard Rost
19
Eukaryotic cell
Illustration by David S. Goodsell, the Scripps Research Institute, UCSD, USA
-
/108© Burkhard Rost
20
slide: Marco Punta
-
/108© Burkhard Rost
Doyle et al. (1998) Science 280:69-77 - The structure of the potassium channel: molecular basis of K+ conduction and selectivity 21
slide: Marco Punta
-
/108© Burkhard Rost
22
http://www.proteopedia.org/wiki/images/7/7b/1htb2.png
Alcohol dehydrogenase (ADH)
PC Sanghani, H Robinson, WF Bosron, TD Hurley (2002) Biochemistry 41:10778-86
http://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Protein_ADH5_PDB_1m6h.png/800px-Protein_ADH5_PDB_1m6h.png
homodimer ADH5
http://www.proteopedia.org/wiki/images/7/7b/1htb2.pnghttp://www.proteopedia.org/wiki/images/7/7b/1htb2.pnghttp://www.proteopedia.org/wiki/images/7/7b/1htb2.pnghttp://www.proteopedia.org/wiki/images/7/7b/1htb2.pnghttp://www.proteopedia.org/wiki/images/7/7b/1htb2.pnghttp://www.proteopedia.org/wiki/images/7/7b/1htb2.pnghttp://www.proteopedia.org/wiki/images/7/7b/1htb2.pnghttp://www.proteopedia.org/wiki/images/7/7b/1htb2.png
-
/108© Burkhard Rost
23
slide: Marco Punta
-
/108© Burkhard Rost
Umberto Boccioni - Dynamism of a soccer player
24
slide: Marco Punta
-
/108© Burkhard Rost
Photograph: Filippo Monteforte/AFP/Getty Images
25
slide: Marco Punta
Umberto Boccioni - Dynamism of a soccer player
-
/108© Burkhard Rost
26
Different levels of abstraction
Wu et al. unpublished
Photograph: Filippo Monteforte/AFP/Getty Images slide: Marco Punta
Umberto Boccioni - Dynamism of a soccer player
-
© Burkhard Rost
Constituents of proteins: amino
acids
27
-
/108© Burkhard Rost
CH
R
H NH
C O
OH
backbone
side-chain
28
Amino acid
slide: Marco Punta
-
/108© Burkhard Rost
CH
R
H NH
C OH
backbone
side-chain
O
isolated amino acid
29
Joining amino acids into proteins
slide: Marco Punta
-
/108© Burkhard Rost
30
Joining amino acids into proteins
CH
R
H NH
C O
backbone
side-chain
HC
R
HN C
OH
Obackbone
side-chain
a dipeptide
From Wikipedia
slide: Marco Punta
-
/108© Burkhard Rost
31
Joining amino acids into proteins
CH
R
H NH
C O
backbone
side-chain
HC
R
HN C
OH
Obackbone
side-chain
a dipeptide
From Wikipediawww.webchem.net/notes/chemical_bonding/covalent_bonding.htm
slide: Marco Punta
http://www.webchem.net/notes/chemical_bonding/covalent_bonding.htmhttp://www.webchem.net/notes/chemical_bonding/covalent_bonding.htm
-
/108© Burkhard Rost
32
CH
R
H NH
C O
backbone
side-chain
HC
R
HN C
OH
Obackbone
side-chain
a dipeptide
From Wikipedia
www.webchem.net/notes/chemical_bonding/covalent_bonding.htm
Joining amino acids into proteins
slide: Marco Punta
http://www.webchem.net/notes/chemical_bonding/covalent_bonding.htmhttp://www.webchem.net/notes/chemical_bonding/covalent_bonding.htm
-
/108© Burkhard Rost
33
CH
R
H NH
C O
backbone
side-chain
HC
R
HN C
OH
Obackbone
side-chain
a dipeptide
From Wikipedia
www.webchem.net/notes/chemical_bonding/covalent_bonding.htm
Joining amino acids into proteins
http://www.webchem.net/notes/chemical_bonding/covalent_bonding.htmhttp://www.webchem.net/notes/chemical_bonding/covalent_bonding.htm
-
/108© Burkhard Rost
34
CH
R
H NH
C O
backbone
side-chain
HC
R
HN C
OH
Obackbone
side-chain
a dipeptide
From Wikipedia
www.webchem.net/notes/chemical_bonding/covalent_bonding.htm
Joining amino acids into proteins
http://www.webchem.net/notes/chemical_bonding/covalent_bonding.htmhttp://www.webchem.net/notes/chemical_bonding/covalent_bonding.htm
-
/108© Burkhard Rost
35
CH
R
H NH
C O
backbone
side-chain
HC
R
HN C
OH
Obackbone
side-chain
a dipeptide
From Wikipedia
www.webchem.net/notes/chemical_bonding/covalent_bonding.htm
Joining amino acids into proteins
http://www.webchem.net/notes/chemical_bonding/covalent_bonding.htmhttp://www.webchem.net/notes/chemical_bonding/covalent_bonding.htm
-
/108© Burkhard Rost
36
Joining amino acids into proteins
CH
R
H NH
C O
backbone
side-chain
HC
R
HN C
Obackbone
side-chain
CH
R
NH
C O
backbone
side-chain
HC
R
HN C
OH
Obackbone
side-chain
N’
C’
polypeptide chain
-
/108© Burkhard Rost
37
Joining amino acids into proteins
CH
R
H NH
C OH
backbone
side-chain
O
-
/108© Burkhard Rost
38
Joining amino acids into proteins
-
/108© Burkhard Rost
39
Joining amino acids into proteins
-
/108© Burkhard Rost
40
Joining amino acids into proteins
-
/108© Burkhard Rost
41
Peptides
Images: https://www2.chemistry.msu.edu/faculty/reusch/virttxtjml/protein2.htm
slide: Andrea Schafferhans
-
© Burkhard Rost
Rationalizing biophysical features
of constituents
42
-
/108© Burkhard Rost
43
Side chain properties
-
/108© Burkhard Rost
44
Side chain properties
+R
+K
-
/108© Burkhard Rost
45
Negatively charged amino acids-
-D
E
-
/108© Burkhard Rost
46
Polar amino acids
+----
++
+N
Q
S
YT
-
/108© Burkhard Rost
47
amino acids
-
© Burkhard Rost
“components” of protein structure:
domains
48
-
/108© Burkhard Rost
49
Domain from introns?
© Wikipedia
RNA splicing
Gene product = protein
-
/108© Burkhard Rost
50
Domain merger
prokaryote P, protein Aprokaryote P, protein B
prokaryote P2, protein C
-
/108© Burkhard Rost
51
3D modules
Multiple 3D alignment identifies consensus secondary structure
© C
hrist
ine O
reng
o
-
/108© Burkhard Rost
52
3D modules
Multiple 3D alignment identifies consensus secondary structure
© C
hrist
ine O
reng
o
-
/108© Burkhard Rost
53
Guessing domains from sequenceprotein Aprotein Bprotein Cprotein Dprotein Eprotein F
domain 1 domain 2
-
/108© Burkhard Rost
54
Most proteins multi-domain
Liu, Hegyi, Acton, Montelione & Rost 2003 Proteins 56:188-200Liu & Rost 2004 Proteins 55:678-686
Single-domain proteins: 61% in PDB 28% in 62 proteomes
-
© Burkhard Rost
3D comparisons: principle idea
how to?
55
-
/108© Burkhard Rost
56
Matching shapes
© http://www.whattoexpect.com/toddler/photo-gallery/best-toys-for-toddlers.aspx#/slide-3
http://www.whattoexpect.com/toddler/photo-gallery/best-toys-for-toddlers.aspx#http://www.whattoexpect.com/toddler/photo-gallery/best-toys-for-toddlers.aspx#http://www.whattoexpect.com/toddler/photo-gallery/best-toys-for-toddlers.aspx#
-
/108© Burkhard Rost
57
How to match?
?
-
/108© Burkhard Rost
58
How to match?
?
-
/108© Burkhard Rost
59
Differences for corresponding points
-
/108© Burkhard Rost
60
Differences for corresponding points
-
/108© Burkhard Rost
61
Differences for corresponding points
-
/108© Burkhard Rost
62
Differences for corresponding points
-
/108© Burkhard Rost
63
Differences for corresponding points
12
34
56
87
-
/108© Burkhard Rost
64
Differences for corresponding points
12
34
56
87
Difference
= d1+d2+d3...+d8
= |r1a-r1b|+...+|r8a-r8b|
RMSD (root mean square deviation)
=SQRT [ (r1a-r1b)2+...(r8a-r8b)2 ]
=
!"#$ !,! = !!! − !!!!
!!
-
/108© Burkhard Rost
65
Differences for corresponding points
12
34
56
87
12
34
56
87
!"#$ !,! = !!! − !!!!
!!
-
/108© Burkhard Rost
1st: find corresponding points2nd: superimpose
66
Actual algorithm inverted
!"#$ !,! = !!! − !!!!
!!
-
/108© Burkhard Rost
67
fit now?
-
/108© Burkhard Rost
68
Scaling easy for simple shapes
x^2+y^2=r^2
-
/108© Burkhard Rost
69
Proteins: points are defined->no scaling
12
34
56
87
Doyle et al (1998) Science 280:69-77 - The structure of the potassium channel
-
/108© Burkhard Rost
70
Global vs. local comparisons
-
/108© Burkhard Rost
71
Global vs. local comparisons
-
/108© Burkhard Rost
72
Global vs. local comparisons
global
solution 1:
global
solution 2:
-
/108© Burkhard Rost
73
cut into “units”
-
/108© Burkhard Rost
74
cut into “units”
-
/108© Burkhard Rost
75
trouble: where to stop?
valid “unit”
for comparison?
-
/108© Burkhard Rost
76
How to decide what is a valid unit?
??
???
-
/108© Burkhard Rost
77
valid “unit”
for comparison?
Decision upon validity
-
/108© Burkhard Rost
Scientifically significant: some expert says
78
Valid or not?
-
/108© Burkhard Rost
79
How can a machine decide what is a valid unit?
??
???
-
/108© Burkhard Rost
Scientifically significant: some expert says
Statistically significant: background
80
Valid or not?
background
signal
-
/108© Burkhard Rost
81
Cut, match, compare by RMSD
!"#$ !,! = !!! − !!!!
!!
-
/108© Burkhard Rost
82
Only Cartesian RMSD comparison?
12
34
56
87
12
34
56
87
!"#$ !,! = !!! − !!!!
!!
-
/108© Burkhard Rost
83
2D: difference matrix
12
34
56
87
1 2 3 4 5 6 7 81
2 3 4 5 6 7 8
-
/108© Burkhard Rost
84
Comparison 2D: differences of differences
12
34
56
87
Total of 8 x 8 differences
-
© Burkhard Rost
3D comparisons: biology
85
-
/108© Burkhard Rost
Slides taken from Patrice Koehl, UC Davis
86
Structure alignment
Patrice Koehl
-
/108© Burkhard Rost
1. Identify equivalent positions (residues that match in 3D)2. find superposition independent of domain movements
87
Structure alignment: two steps
Patrice Koehl UC Davies: http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf© Patrice Koehl, UC Davis
http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf
-
/108© Burkhard Rost
Step 1: find corresponding points in proteins A and Bd(i) are the distances between all corresponding points (typic: Calpha, all atoms)
88
rmsd(A,B) = N1
i=1
N
di2
Root mean square displacement (rmsd)
-
/108© Burkhard Rost
89
RMSD is not a metric
A similar B B similar C NOT implying: A similar C
cRMSD = 2.8 Ǻ = 0.28 nm
A
B C
cRMSD = 2.85 Ǻ = 0.285 nm
-
© Burkhard Rost
SSAP 3D alignment
Taylor & Orengo
90
-
/108© Burkhard Rost
91
SSAP concept
Sik=a
m=-n
m=+n
| di,i+m - dk,k+m | A B + b
WR Taylor & CA Orengo (1989) Protein structure alignment JMB 208:1-22
-
/108© Burkhard Rost
92
SSAP concept
Sik=a
m=-n
m=+n
| di,i+m - dk,k+m | A B + b
WR Taylor & CA Orengo (1989) Protein structure alignment JMB 208:1-22
Problem: loss of information about direction
-
© Burkhard Rost
DALI 3D alignment
Holm & Sander
93
-
/108© Burkhard Rost
94
PQITLWQRPLVTIKIGGQLKEALLDTGADDTVL
PP PQQQYFFQVISSIVRLLSTLWWQEDRKQAKRRRPQPPPPPVVTKFVVLIITTKEKAALIVHYKKFIILVIEENGGGGGTGQQKRRPPLWWVVFKVEESKKVVGLGLLILLLLLVVDDDDDTTTTTGGGGGAAAAADDDDDDDAKESSTTVIIVIVVVIVL
1281757077
120238169200247114740
904
466268
11831
1241
292449726217
102691
140
1109760691481976248590
690
730
415371597395000
5851300
79586900
EEEEE
EEEEEE
EEEEEEE
EE
EEEEE
EEEEEE
EE
kcal/mol0 -1 -2 -3 -4 -5
1 10 20 30 40 50 60 70 80 90
1
10
20
30
40
50
60
70
80
90
1D1D 2D2D 3D3D
Notation: protein structure 1D, 2D, 3D
-
/108© Burkhard Rost
L Holm & C Sander (1993) JMB 233:123-38
Distance matrix AlignmentAlgorithm: Monte Carlo on all-against-all for hexapeptides (5)
95
Structural alignment: DALI
L Holm & C Sander 91993) JMB 233:123-38: Fig. 1
-
© Burkhard Rost
Vorolign 3D alignment
Birzele & Zimmer
96
-
/108© Burkhard Rost
Fabian Birzele, Jan E Gewehr, Gergely Csaba & Ralf Zimmer (2006) Bioinformatics 23:e205-11Dynamic programming on Voronoi environments
97
Structural alignment: VOROLIGN
F Birzele, JE Gewehr, G Csaba & R Zimmer (2006) Bioinformatics 23:e205-11: Fig. 2
-
© Burkhard Rost
3D comparisons: local vs. global
98
-
/108© Burkhard Rost
99
2 forms of calcium-bound Calmodulin
Two forms of calcium-bound Calmodulin:
Ligand free
Complexed with trifluoperazine
Patrice Koehl UC Davies: http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf
© Patrice Koehl, UC Davis
-
/108© Burkhard Rost
100
Global alignment: RMSD =15 Å /143 residues
Local alignment: RMSD = 0.9 Å/ 62 residues
Patrice Koehl UC Davies: http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf
© Patrice Koehl, UC Davis
-
© Burkhard Rost
Many other 3D alignment methods
exist
101
-
© Burkhard Rost
Recap: proteins/cells
102
-
/108© Burkhard Rost
103
Cells: outside & inside
Illustration of Mycoplasma genitalium by David S. Goodsell, the Scripps Research Institute, UCSD, USA
-
/108© Burkhard Rost
104
A gallery of proteins
1ej9 Topoisomerase
1i6h RNA Polymerase
1ais TATA-binding Protein/Transcription Factor IIB
1cgp Catabolite GeneActivator Protein
1tau DNA Polymerase
1lbh+1efa lacRepressor
1aoi Nucleosome
1aoi Nucleosome
1b7t Myosin 1atn Actin
1tub Microtubule1bkv Collagen
© David Goodsell
scale reduced
-
/108© Burkhard Rost
105
HIV-1 and a Human T-cell
gp120
CD4
CCR5
HIV-1envelope
glycoprotein
T-cellsurfaceslide: Natasha Wood, Cape Town
-
/108© Burkhard Rost
106
HIV-1 and a Human T-cell
IMAGE:http://www.sciencemag.org/content/320/5877/760/F3.large.jpg
V3 loop
slide: Natasha Wood, Cape Town
-
/108© Burkhard Rost
107
3D modules
Multiple 3D alignment identifies consensus secondary structure
© C
hrist
ine O
reng
o
-
/108© Burkhard Rost
01: 04/25 Tue: no lecture02: 04/27 Thu: no lecture03: 05/02 Tue: Intro 1: organization of lecture: intro into cells & biology04: 05/04 Thu: Intro 2: amino acids, protein structure (comparison), domains05: 05/09 Tue: No lecture06: 05/11 Thu: Alignment 107: 05/16 Tue: Alignment 208: 05/18 Thu: Comparative modeling & experimental structure determination & secondary structure assignment09: 05/23 Tue: SKIP: student assembly (SVV)10: 05/25 Thu: SKIP: Ascension Day11: 05/30 Tue: SKIP: Whitsun holiday (05/15-17) 12: 06/01 Thu: 1D: Secondary structure prediction 113: 06/06 Tue: SKIP: Whitsun holiday (06/03-06)14: 06/08 Thu: 1D: Secondary structure prediction 215: 06/13 Tue: 1D: Secondary structure prediction 3 / Transmembrane structure prediction 116: 06/15 Thu: SKIP: Corpus Christi17: 06/20 Tue: 1D: Transmembrane structure prediction 2 / Solvent accessibility prediction18: 06/22 Thu: 1D: Disorder prediction19: 06/27 Tue: 2D prediction20: 06/29 Thu: 3D prediction / Nobel prize symposium21: 07/04 Tue: TBA22: 07/06 Thu: recap 123: 07/11 Tue: recap 224: 07/12 Thu: examen25: 07/13 Tue: TBA26: 07/18 Thu: TBA27: 07/20 Tue: TBA28: 07/22 Thu: TBA29: 07/25 Thu: TBA30: 07/27 Thu: TBA
108
Lecture plan (CB1 structure)
today