cb1e 20170504 intro2 structure - rostlab · 2017. 5. 4. · cb1_intro2_structure lecture: protein...

108
/108 © Burkhard Rost 1 title: Computational Biology 1 - Protein structure: Intro into protein structure short title: cb1_intro2_structure lecture: Protein Prediction 1 - Protein structure Computational Biology 1 - TUM Summer 2016

Upload: others

Post on 27-Jan-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

  • /108© Burkhard Rost

    1

    title: Computational Biology 1 - Protein structure: 
 Intro into protein structure

    short title: cb1_intro2_structure

    lecture: Protein Prediction 1 - Protein structure
 Computational Biology 1 - TUM Summer 2016

  • /108© Burkhard Rost

    Videos: YouTube / www.rostlab.org 
THANKS :

. EXERCISES: Special lectures: • TBD - Thomas Hopf • TBD - Jonas Reeb No lecture:• 05/10 Student assembly (SVV) • 05/17 Ascension day • 05/26 Whitsun holiday • 06/04 Corpus Christi LAST lecture: June 28 (followed by 2 wrap-up sessions) Examen: June 30, 2016: lecture time room TBD • Makeup: TBC: Oct 18 & 20, 2016 - lecture time

    2

    CONTACT: Lothar Richter [email protected]

    Dmitrij Nechaev

    Lothar Richter

    http://www.rostlab.orgmailto:[email protected]?subject=

  • /108© Burkhard Rost

    3

    Questions about last lecture?

    ??

    ???

    © Wikipedia

  • © Burkhard Rost

    Recap

    4

  • /108© Burkhard Rost

    common to life: DNA/cellsDNA->RNA->proteins = machinery of life

    5

    Recap: genes - proteins

  • /108© Burkhard Rost

    6

    Central dogma

    slide: Andrea [email protected]

    Structure

    Function

  • /108© Burkhard Rost

    7

    Codon wheel

    © yourgenome.org slide: Andrea Schafferhans

  • /108© Burkhard Rost

    run: 6 hoursfull capacity: ~5-10 TB data / day

    8

    Illumina: MiSeq

    Illumina - Early 2012

  • /108© Burkhard Rost

    common to life: DNA/cellsDNA->RNA->proteins = machinery of lifeproteins made up of amino acids (20 different: like pearl chains with pearls of 20 different sizes & “colors”, i.e. biophysical features)20k proteins in human~85M proteins knownprotein length (number of amino acids): 35-30K

    9

    Recap: genes - proteins

  • /108© Burkhard Rost

    10

    1ej9 Topoisomerase

    1i6h RNA Polymerase

    1ais TATA-binding Protein/Transcription Factor IIB

    1cgp Catabolite GeneActivator Protein

    1tau DNA Polymerase

    1lbh+1efa lacRepressor

    1aoi Nucleosome

    1aoi Nucleosome

    1b7t Myosin 1atn Actin

    1tub Microtubule1bkv Collagen

    © David Goodsell

    scale reduced

  • /108© Burkhard Rost

    11

    Cells: outside & inside

    Illustration of Mycoplasma genitalium by David S. Goodsell, the Scripps Research Institute, UCSD, USA

  • /108© Burkhard Rost

    Previous lecture• Organisms, genes, central dogma

    TODAY: Protein introduction• Amino acids • Protein structure • Bonds & energies • domains • 3D comparisons

    NEXT lectures• sequence comparisons/alignments

    12

    TOC today

  • © Burkhard Rost

    Protein Prediction I: Beginners

    1 Introduction

    1.2 - Proteins/domains

    1.3 - 3D comparisons

    13

  • © Burkhard Rost

    Reality and images

    14

  • /108© Burkhard Rost

    15

    slide: Marco Punta

  • /108© Burkhard Rost

    Georges Braque - Houses at L'Estaque

    16

    slide: Marco Punta

  • /108© Burkhard Rost

    17

    Where is that?

    Illustration by David S. Goodsell, the Scripps Research Institute, UCSD, USA

  • /108© Burkhard Rost

    18

    Mycoplasma genitalium

    Illustration by David S. Goodsell, the Scripps Research Institute, UCSD, USA

  • /108© Burkhard Rost

    19

    Eukaryotic cell

    Illustration by David S. Goodsell, the Scripps Research Institute, UCSD, USA

  • /108© Burkhard Rost

    20

    slide: Marco Punta

  • /108© Burkhard Rost

    Doyle et al. (1998) Science 280:69-77 - The structure of the potassium channel: molecular basis of K+ conduction and selectivity 21

    slide: Marco Punta

  • /108© Burkhard Rost

    22

    http://www.proteopedia.org/wiki/images/7/7b/1htb2.png

    Alcohol dehydrogenase (ADH)

    PC Sanghani, H Robinson, WF Bosron, TD Hurley (2002) Biochemistry 41:10778-86

    http://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Protein_ADH5_PDB_1m6h.png/800px-Protein_ADH5_PDB_1m6h.png

    homodimer ADH5

    http://www.proteopedia.org/wiki/images/7/7b/1htb2.pnghttp://www.proteopedia.org/wiki/images/7/7b/1htb2.pnghttp://www.proteopedia.org/wiki/images/7/7b/1htb2.pnghttp://www.proteopedia.org/wiki/images/7/7b/1htb2.pnghttp://www.proteopedia.org/wiki/images/7/7b/1htb2.pnghttp://www.proteopedia.org/wiki/images/7/7b/1htb2.pnghttp://www.proteopedia.org/wiki/images/7/7b/1htb2.pnghttp://www.proteopedia.org/wiki/images/7/7b/1htb2.png

  • /108© Burkhard Rost

    23

    slide: Marco Punta

  • /108© Burkhard Rost

    Umberto Boccioni - Dynamism of a soccer player

    24

    slide: Marco Punta

  • /108© Burkhard Rost

    Photograph: Filippo Monteforte/AFP/Getty Images

    25

    slide: Marco Punta

    Umberto Boccioni - Dynamism of a soccer player

  • /108© Burkhard Rost

    26

    Different levels of abstraction

    Wu et al. unpublished

    Photograph: Filippo Monteforte/AFP/Getty Images slide: Marco Punta

    Umberto Boccioni - Dynamism of a soccer player

  • © Burkhard Rost

    Constituents of proteins: amino

    acids

    27

  • /108© Burkhard Rost

    CH

    R

    H NH

    C O

    OH

    backbone

    side-chain

    28

    Amino acid

    slide: Marco Punta

  • /108© Burkhard Rost

    CH

    R

    H NH

    C OH

    backbone

    side-chain

    O

    isolated amino acid

    29

    Joining amino acids into proteins

    slide: Marco Punta

  • /108© Burkhard Rost

    30

    Joining amino acids into proteins

    CH

    R

    H NH

    C O

    backbone

    side-chain

    HC

    R

    HN C

    OH

    Obackbone

    side-chain

    a dipeptide

    From Wikipedia

    slide: Marco Punta

  • /108© Burkhard Rost

    31

    Joining amino acids into proteins

    CH

    R

    H NH

    C O

    backbone

    side-chain

    HC

    R

    HN C

    OH

    Obackbone

    side-chain

    a dipeptide

    From Wikipediawww.webchem.net/notes/chemical_bonding/covalent_bonding.htm

    slide: Marco Punta

    http://www.webchem.net/notes/chemical_bonding/covalent_bonding.htmhttp://www.webchem.net/notes/chemical_bonding/covalent_bonding.htm

  • /108© Burkhard Rost

    32

    CH

    R

    H NH

    C O

    backbone

    side-chain

    HC

    R

    HN C

    OH

    Obackbone

    side-chain

    a dipeptide

    From Wikipedia

    www.webchem.net/notes/chemical_bonding/covalent_bonding.htm

    Joining amino acids into proteins

    slide: Marco Punta

    http://www.webchem.net/notes/chemical_bonding/covalent_bonding.htmhttp://www.webchem.net/notes/chemical_bonding/covalent_bonding.htm

  • /108© Burkhard Rost

    33

    CH

    R

    H NH

    C O

    backbone

    side-chain

    HC

    R

    HN C

    OH

    Obackbone

    side-chain

    a dipeptide

    From Wikipedia

    www.webchem.net/notes/chemical_bonding/covalent_bonding.htm

    Joining amino acids into proteins

    http://www.webchem.net/notes/chemical_bonding/covalent_bonding.htmhttp://www.webchem.net/notes/chemical_bonding/covalent_bonding.htm

  • /108© Burkhard Rost

    34

    CH

    R

    H NH

    C O

    backbone

    side-chain

    HC

    R

    HN C

    OH

    Obackbone

    side-chain

    a dipeptide

    From Wikipedia

    www.webchem.net/notes/chemical_bonding/covalent_bonding.htm

    Joining amino acids into proteins

    http://www.webchem.net/notes/chemical_bonding/covalent_bonding.htmhttp://www.webchem.net/notes/chemical_bonding/covalent_bonding.htm

  • /108© Burkhard Rost

    35

    CH

    R

    H NH

    C O

    backbone

    side-chain

    HC

    R

    HN C

    OH

    Obackbone

    side-chain

    a dipeptide

    From Wikipedia

    www.webchem.net/notes/chemical_bonding/covalent_bonding.htm

    Joining amino acids into proteins

    http://www.webchem.net/notes/chemical_bonding/covalent_bonding.htmhttp://www.webchem.net/notes/chemical_bonding/covalent_bonding.htm

  • /108© Burkhard Rost

    36

    Joining amino acids into proteins

    CH

    R

    H NH

    C O

    backbone

    side-chain

    HC

    R

    HN C

    Obackbone

    side-chain

    CH

    R

    NH

    C O

    backbone

    side-chain

    HC

    R

    HN C

    OH

    Obackbone

    side-chain

    N’

    C’

    polypeptide chain

  • /108© Burkhard Rost

    37

    Joining amino acids into proteins

    CH

    R

    H NH

    C OH

    backbone

    side-chain

    O

  • /108© Burkhard Rost

    38

    Joining amino acids into proteins

  • /108© Burkhard Rost

    39

    Joining amino acids into proteins

  • /108© Burkhard Rost

    40

    Joining amino acids into proteins

  • /108© Burkhard Rost

    41

    Peptides

    Images: https://www2.chemistry.msu.edu/faculty/reusch/virttxtjml/protein2.htm

    slide: Andrea Schafferhans

  • © Burkhard Rost

    Rationalizing biophysical features

    of constituents

    42

  • /108© Burkhard Rost

    43

    Side chain properties

  • /108© Burkhard Rost

    44

    Side chain properties

    +R

    +K

  • /108© Burkhard Rost

    45

    Negatively charged amino acids-

    -D

    E

  • /108© Burkhard Rost

    46

    Polar amino acids

    +----

    ++

    +N

    Q

    S

    YT

  • /108© Burkhard Rost

    47

    amino acids

  • © Burkhard Rost

    “components” of protein structure:

    domains

    48

  • /108© Burkhard Rost

    49

    Domain from introns?

    © Wikipedia

    RNA splicing

    Gene product = protein

  • /108© Burkhard Rost

    50

    Domain merger

    prokaryote P, protein Aprokaryote P, protein B

    prokaryote P2, protein C

  • /108© Burkhard Rost

    51

    3D modules

    Multiple 3D alignment identifies consensus secondary structure

    © C

    hrist

    ine O

    reng

    o

  • /108© Burkhard Rost

    52

    3D modules

    Multiple 3D alignment identifies consensus secondary structure

    © C

    hrist

    ine O

    reng

    o

  • /108© Burkhard Rost

    53

    Guessing domains from sequenceprotein Aprotein Bprotein Cprotein Dprotein Eprotein F

    domain 1 domain 2

  • /108© Burkhard Rost

    54

    Most proteins multi-domain

    Liu, Hegyi, Acton, Montelione & Rost 2003 Proteins 56:188-200Liu & Rost 2004 Proteins 55:678-686

    Single-domain proteins: 61% in PDB 28% in 62 proteomes

  • © Burkhard Rost

    3D comparisons: principle idea

    how to?

    55

  • /108© Burkhard Rost

    56

    Matching shapes

    © http://www.whattoexpect.com/toddler/photo-gallery/best-toys-for-toddlers.aspx#/slide-3

    http://www.whattoexpect.com/toddler/photo-gallery/best-toys-for-toddlers.aspx#http://www.whattoexpect.com/toddler/photo-gallery/best-toys-for-toddlers.aspx#http://www.whattoexpect.com/toddler/photo-gallery/best-toys-for-toddlers.aspx#

  • /108© Burkhard Rost

    57

    How to match?

    ?

  • /108© Burkhard Rost

    58

    How to match?

    ?

  • /108© Burkhard Rost

    59

    Differences for corresponding points

  • /108© Burkhard Rost

    60

    Differences for corresponding points

  • /108© Burkhard Rost

    61

    Differences for corresponding points

  • /108© Burkhard Rost

    62

    Differences for corresponding points

  • /108© Burkhard Rost

    63

    Differences for corresponding points

    12

    34

    56

    87

  • /108© Burkhard Rost

    64

    Differences for corresponding points

    12

    34

    56

    87

    Difference

    = d1+d2+d3...+d8

    = |r1a-r1b|+...+|r8a-r8b|

    RMSD (root mean square deviation)

    =SQRT [ (r1a-r1b)2+...(r8a-r8b)2 ]

    =

    !"#$ !,! = !!! − !!!!

    !!

  • /108© Burkhard Rost

    65

    Differences for corresponding points

    12

    34

    56

    87

    12

    34

    56

    87

    !"#$ !,! = !!! − !!!!

    !!

  • /108© Burkhard Rost

    1st: find corresponding points2nd: superimpose

    66

    Actual algorithm inverted

    !"#$ !,! = !!! − !!!!

    !!

  • /108© Burkhard Rost

    67

    fit now?

  • /108© Burkhard Rost

    68

    Scaling easy for simple shapes

    x^2+y^2=r^2

  • /108© Burkhard Rost

    69

    Proteins: points are defined->no scaling

    12

    34

    56

    87

    Doyle et al (1998) Science 280:69-77 - The structure of the potassium channel

  • /108© Burkhard Rost

    70

    Global vs. local comparisons

  • /108© Burkhard Rost

    71

    Global vs. local comparisons

  • /108© Burkhard Rost

    72

    Global vs. local comparisons

    global

    solution 1:

    global

    solution 2:

  • /108© Burkhard Rost

    73

    cut into “units”

  • /108© Burkhard Rost

    74

    cut into “units”

  • /108© Burkhard Rost

    75

    trouble: where to stop?

    valid “unit”

    for comparison?

  • /108© Burkhard Rost

    76

    How to decide what is a valid unit?

    ??

    ???

  • /108© Burkhard Rost

    77

    valid “unit”

    for comparison?

    Decision upon validity

  • /108© Burkhard Rost

    Scientifically significant:
some expert says

    78

    Valid or not?

  • /108© Burkhard Rost

    79

    How can a machine decide what is a valid unit?

    ??

    ???

  • /108© Burkhard Rost

    Scientifically significant:
some expert says 





    Statistically significant:
background

    80

    Valid or not?

    background

    signal

  • /108© Burkhard Rost

    81

    Cut, match, compare by RMSD

    !"#$ !,! = !!! − !!!!

    !!

  • /108© Burkhard Rost

    82

    Only Cartesian RMSD comparison?

    12

    34

    56

    87

    12

    34

    56

    87

    !"#$ !,! = !!! − !!!!

    !!

  • /108© Burkhard Rost

    83

    2D: difference matrix

    12

    34

    56

    87

    1 2 3 4 5 6 7 81

    2 3 4 5 6 7 8

  • /108© Burkhard Rost

    84

    Comparison 2D: differences of differences

    12

    34

    56

    87

    Total of 8 x 8 differences

  • © Burkhard Rost

    3D comparisons: biology

    85

  • /108© Burkhard Rost

    Slides taken from Patrice Koehl, UC Davis

    86

    Structure alignment

    Patrice Koehl

  • /108© Burkhard Rost

    1. Identify equivalent positions (residues that match in 3D)2. find superposition independent of domain movements

    87

    Structure alignment: two steps

    Patrice Koehl UC Davies: http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf© Patrice Koehl, UC Davis

    http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf

  • /108© Burkhard Rost

    Step 1: find corresponding points in proteins A and Bd(i) are the distances between all corresponding points (typic: Calpha, all atoms)

    88

    rmsd(A,B) = N1

    i=1

    N

    di2

    Root mean square displacement (rmsd)

  • /108© Burkhard Rost

    89

    RMSD is not a metric

    A similar B B similar C NOT implying: A similar C

    cRMSD = 2.8 Ǻ = 0.28 nm

    A

    B C

    cRMSD = 2.85 Ǻ = 0.285 nm

  • © Burkhard Rost

    SSAP 3D alignment

    Taylor & Orengo

    90

  • /108© Burkhard Rost

    91

    SSAP concept

    Sik=a

    m=-n

    m=+n

    | di,i+m - dk,k+m | A B + b

    WR Taylor & CA Orengo (1989) Protein structure alignment JMB 208:1-22

  • /108© Burkhard Rost

    92

    SSAP concept

    Sik=a

    m=-n

    m=+n

    | di,i+m - dk,k+m | A B + b

    WR Taylor & CA Orengo (1989) Protein structure alignment JMB 208:1-22

    Problem: loss of information about direction

  • © Burkhard Rost

    DALI 3D alignment

    Holm & Sander

    93

  • /108© Burkhard Rost

    94

    PQITLWQRPLVTIKIGGQLKEALLDTGADDTVL

    PP PQQQYFFQVISSIVRLLSTLWWQEDRKQAKRRRPQPPPPPVVTKFVVLIITTKEKAALIVHYKKFIILVIEENGGGGGTGQQKRRPPLWWVVFKVEESKKVVGLGLLILLLLLVVDDDDDTTTTTGGGGGAAAAADDDDDDDAKESSTTVIIVIVVVIVL

    1281757077

    120238169200247114740

    904

    466268

    11831

    1241

    292449726217

    102691

    140

    1109760691481976248590

    690

    730

    415371597395000

    5851300

    79586900

    EEEEE

    EEEEEE

    EEEEEEE

    EE

    EEEEE

    EEEEEE

    EE

    kcal/mol0 -1 -2 -3 -4 -5

    1 10 20 30 40 50 60 70 80 90

    1

    10

    20

    30

    40

    50

    60

    70

    80

    90

    1D1D 2D2D 3D3D

    Notation: protein structure 1D, 2D, 3D

  • /108© Burkhard Rost

    L Holm & C Sander (1993) JMB 233:123-38

    Distance matrix AlignmentAlgorithm: Monte Carlo on all-against-all for hexapeptides (5)

    95

    Structural alignment: DALI

    L Holm & C Sander 91993) JMB 233:123-38: Fig. 1

  • © Burkhard Rost

    Vorolign 3D alignment

    Birzele & Zimmer

    96

  • /108© Burkhard Rost

    Fabian Birzele, Jan E Gewehr, Gergely Csaba & Ralf Zimmer (2006) Bioinformatics 23:e205-11Dynamic programming on Voronoi environments

    97

    Structural alignment: VOROLIGN

    F Birzele, JE Gewehr, G Csaba & R Zimmer (2006) Bioinformatics 23:e205-11: Fig. 2

  • © Burkhard Rost

    3D comparisons: local vs. global

    98

  • /108© Burkhard Rost

    99

    2 forms of calcium-bound Calmodulin

    Two forms of calcium-bound Calmodulin:

    Ligand free

    Complexed with trifluoperazine

    Patrice Koehl UC Davies: http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf

    © Patrice Koehl, UC Davis

  • /108© Burkhard Rost

    100

    Global alignment: RMSD =15 Å /143 residues

    Local alignment: RMSD = 0.9 Å/ 62 residues

    Patrice Koehl UC Davies: http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf

    © Patrice Koehl, UC Davis

  • © Burkhard Rost

    Many other 3D alignment methods

    exist

    101

  • © Burkhard Rost

    Recap: proteins/cells

    102

  • /108© Burkhard Rost

    103

    Cells: outside & inside

    Illustration of Mycoplasma genitalium by David S. Goodsell, the Scripps Research Institute, UCSD, USA

  • /108© Burkhard Rost

    104

    A gallery of proteins

    1ej9 Topoisomerase

    1i6h RNA Polymerase

    1ais TATA-binding Protein/Transcription Factor IIB

    1cgp Catabolite GeneActivator Protein

    1tau DNA Polymerase

    1lbh+1efa lacRepressor

    1aoi Nucleosome

    1aoi Nucleosome

    1b7t Myosin 1atn Actin

    1tub Microtubule1bkv Collagen

    © David Goodsell

    scale reduced

  • /108© Burkhard Rost

    105

    HIV-1 and a Human T-cell

    gp120

    CD4

    CCR5

    HIV-1envelope

    glycoprotein

    T-cellsurfaceslide: Natasha Wood, Cape Town

  • /108© Burkhard Rost

    106

    HIV-1 and a Human T-cell

    IMAGE:http://www.sciencemag.org/content/320/5877/760/F3.large.jpg

    V3 loop

    slide: Natasha Wood, Cape Town

  • /108© Burkhard Rost

    107

    3D modules

    Multiple 3D alignment identifies consensus secondary structure

    © C

    hrist

    ine O

    reng

    o

  • /108© Burkhard Rost

    01: 04/25 Tue: no lecture02: 04/27 Thu: no lecture03: 05/02 Tue: Intro 1: organization of lecture: intro into cells & biology04: 05/04 Thu: Intro 2: amino acids, protein structure (comparison), domains05: 05/09 Tue: No lecture06: 05/11 Thu: Alignment 107: 05/16 Tue: Alignment 208: 05/18 Thu: Comparative modeling & experimental structure determination & secondary structure assignment09: 05/23 Tue: SKIP: student assembly (SVV)10: 05/25 Thu: SKIP: Ascension Day11: 05/30 Tue: SKIP: Whitsun holiday (05/15-17) 12: 06/01 Thu: 1D: Secondary structure prediction 113: 06/06 Tue: SKIP: Whitsun holiday (06/03-06)14: 06/08 Thu: 1D: Secondary structure prediction 215: 06/13 Tue: 1D: Secondary structure prediction 3 / Transmembrane structure prediction 116: 06/15 Thu: SKIP: Corpus Christi17: 06/20 Tue: 1D: Transmembrane structure prediction 2 / Solvent accessibility prediction18: 06/22 Thu: 1D: Disorder prediction19: 06/27 Tue: 2D prediction20: 06/29 Thu: 3D prediction / Nobel prize symposium21: 07/04 Tue: TBA22: 07/06 Thu: recap 123: 07/11 Tue: recap 224: 07/12 Thu: examen25: 07/13 Tue: TBA26: 07/18 Thu: TBA27: 07/20 Tue: TBA28: 07/22 Thu: TBA29: 07/25 Thu: TBA30: 07/27 Thu: TBA

    108

    Lecture plan (CB1 structure)

    today