11/04/05 D Dobbs ISU - BCB 444/544X: Protein Structure & Function 1
11/4/05
Protein Structure & Function
Announcements
Exam 2 - Has been graded - Will be returned at end of class today
Grade statistics:
444 Average = 81/100
544 Average = 100/118
Questions?
Announcements
BCB 544 Projects - Important Dates:
Nov 2 Wed noon - Project proposals due to David/Drena
Nov 4 Fri PM - Approvals/responses & tentative presentation schedule to students
Dec 2 Fri noon - Written project reports due
Dec 5,7,8,9 class/lab - Oral Presentations (20')
(Dec 15 Thurs = Final Exam)
Bioinformatics Seminars
Nov 4 Fri 12:10 PM BCB Faculty Seminar in E164 Lago
How to do sequence alignments on parallel computers
Srinivas Aluru, ECprE & Chair, BCB Program
http://www.bcb.iastate.edu/courses/BCB691F2005.html
Next week:
Nov 10 Thurs 3:40 PM ComS Seminar in 223 Atanasoff
Computational Epidemiology Armin R. Mikler, Univ. North Texas
http://www.cs.iastate.edu/~colloq/#t3
Bioinformatics Seminars
CORRECTION:
Week after next - Baker Center/BCB Seminars: (seminar abstracts available at above link)
Nov 14 Mon 1:10 PM Doug Brutlag, Stanford
Discovering transcription factor binding sites
Nov 15 Tues 1:10 PM Ilya Vakser, Univ Kansas
Modeling protein-protein interactions
Both seminars will be in Howe Hall Auditorium
RNA Structure & Function/Prediction
Protein Structure & Function
Mon - Review: promoter prediction
      RNA structure & function
Wed - RNA structure prediction
      2° & 3° structure prediction
      miRNA & target prediction - Lab 10
Fri - A few more words re: Algorithms
      Protein structure & function
Reading Assignment (for Fri/Mon)
Mount Bioinformatics
• Chp 10 Protein classification & structure prediction
  http://www.bioinformaticsonline.org/ch/ch10/index.html
• pp. 409-491
• Check Errata: http://www.bioinformaticsonline.org/help/errata2.html
Other? That should be plenty…
Review last lecture:
RNA Structure Prediction
miRNA and RNAi pathways
[Figure: two parallel pathways.
microRNA pathway: microRNA primary transcript, Drosha, precursor, Dicer, miRNA, RISC, target mRNA; outcome: "translational repression" and/or mRNA degradation.
RNAi pathway: exogenous dsRNA, transposon, etc., Dicer, siRNAs, RISC, target mRNA; outcome: mRNA cleavage, degradation.]
C Burge 2005
miRNA Challenges for Computational Biology
• Find the genes encoding microRNAs
• Predict their regulatory targets
• Integrate miRNAs into gene regulatory pathways & networks
Computational Prediction of MicroRNA Genes & Targets
C Burge 2005
Need to modify traditional paradigm of "transcriptional control" primarily by protein-DNA interactions to include miRNA regulatory mechanisms!
RNA structure prediction strategies
Secondary structure prediction
1) Energy minimization (thermodynamics)
2) Comparative sequence analysis (co-variation)
3) Combined experimental & computational
Secondary structure prediction strategies
1) Energy minimization (thermodynamics)
• Algorithm: Dynamic programming to find high probability pairs (also, some genetic algorithms)
• Software:
  Mfold - Zuker
  Vienna RNA Package - Hofacker
  RNAstructure - Mathews
  Sfold - Ding & Lawrence
R Knight 2005
Secondary structure prediction strategies
2) Comparative sequence analysis (co-variation)
• Algorithms:
  Mutual information
  Stochastic context-free grammars
• Software:
  ConStruct
  Alifold
  Pfold
  FOLDALIGN
  Dynalign
R Knight 2005
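The mutual information idea listed above can be sketched in a few lines. This is only an illustration, assuming a gap-free alignment; the function name `mutual_information` and the toy sequences are hypothetical and are not taken from any of the packages listed.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        # Compare the observed pair frequency with what independence predicts.
        mi += p_ab * log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


# Hypothetical toy alignment: columns 0 and 4 co-vary (G-C, A-U, C-G, U-A),
# while column 2 never changes.
toy_alignment = ["GAAAC", "AAAAU", "CAAAG", "UAAAA"]
mi_covarying = mutual_information(toy_alignment, 0, 4)
mi_invariant = mutual_information(toy_alignment, 0, 2)
```

Column pairs with high mutual information are candidates for base pairs; an invariant column carries no co-variation signal, so its mutual information with any other column is zero.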
Secondary structure prediction strategies
3) Combined experimental & computational
• Experiment: Map single-stranded vs double-stranded regions in folded RNA
• How?
  Enzymes: S1 nuclease, T1 RNase
  Chemicals: kethoxal, DMS
R Knight 2005
Experimental RNA structure determination?
• X-ray crystallography
• NMR spectroscopy
• Enzymatic/chemical mapping
• Molecular genetic analyses
1) Energy minimization method
What are the assumptions?
Native tertiary structure or "fold" of an RNA molecule is (one of) its "lowest" free energy configuration(s)
Gibbs free energy = ΔG in kcal/mol at 37°C
= equilibrium stability of structure
lower values (negative) are more favorable
Is this assumption valid?
in vivo? - this may not hold, but we don't really know
Free energy minimization
What are the rules?
Basepair A=U stacked on A=U: ΔG = -1.2 kcal/mole
Basepair A=U stacked on U=A: ΔG = -1.6 kcal/mole
What gives here?
C Staben 2005
Energy minimization calculations: Base-stacking is critical
Stacked pair (strand/complement)      ΔG (kcal/mol)
AA/UU                                 -1.2
AU/UA or UA/AU                        -1.6
AG, AC, CA, GA / UC, UG, GU, CU       -2.1
CC/GG                                 -4.8
CG/GC                                 -3.0
GC/CG                                 -4.3
GU/UG                                 -0.3
XG, GX / YU, UY                        0
- Tinoco et al.
C Staben 2005
Nearest-neighbor parameters
Most methods for free energy minimization use nearest-neighbor parameters (derived from experiment) for predicting stability of an RNA secondary structure (in terms of ΔG at 37°C)
& most available software packages use the same set of parameters: Mathews, Sabina, Zuker & Turner, 1999
Energy minimization - calculations:
Total free energy of a specific conformation for a specific RNA molecule = sum of incremental energy terms for:
• helical stacking (sequence dependent)
• loop initiation
• unpaired stacking
(favorable "increments" are < 0)
Fig 6.3, Baxevanis & Ouellette 2005
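The "sum of incremental energy terms" can be illustrated with a short sketch. It uses the stacking values quoted on the base-stacking slide and assumes each key is the 5'->3' dinucleotide on one strand of a Watson-Crick helix, paired with its complement. The function name `conformation_energy` and the +4.0 loop increment are made up for illustration; real packages use the full nearest-neighbor parameter set.

```python
# Stacking increments in kcal/mol, as quoted on the base-stacking slide.
# Assumption: each key is the 5'->3' dinucleotide on one strand of a
# Watson-Crick helix; the same stack read from the other strand gets the
# same value (e.g. AA/UU is listed under both "AA" and "UU").
STACK = {
    "AA": -1.2, "UU": -1.2,
    "AU": -1.6, "UA": -1.6,
    "CG": -3.0, "GC": -4.3,
    "CC": -4.8, "GG": -4.8,
    "AG": -2.1, "AC": -2.1, "CA": -2.1, "GA": -2.1,
    "CU": -2.1, "GU": -2.1, "UG": -2.1, "UC": -2.1,
}


def conformation_energy(helix_strands, loop_increments=()):
    """Sum stacking increments over each helix, then add loop increments."""
    total = 0.0
    for strand in helix_strands:
        for k in range(len(strand) - 1):
            total += STACK[strand[k:k + 2]]
    return total + sum(loop_increments)


# Hypothetical hairpin: one 4-bp helix whose 5' strand is GCAU, plus an
# illustrative (made-up) +4.0 kcal/mol loop-initiation increment.
example_energy = conformation_energy(["GCAU"], loop_increments=[4.0])
```

Stacking terms are negative (favorable) and loop terms are positive (unfavorable), so a helix has to be long or GC-rich enough to pay for the loop that closes it.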
But how many possible conformations for a single RNA molecule?
Huge number: Zuker estimates (1.8)^N possible secondary structures for a sequence of N nucleotides
for 100 nts (small RNA…) = 3 × 10^25 structures!
Solution? Not exhaustive enumeration… Dynamic programming
O(N^3) in time, O(N^2) in space/storage iff pseudoknots excluded,
otherwise: O(N^6) time, O(N^4) space
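A quick check of the arithmetic above, as a sketch. The function name `estimated_structures` is hypothetical; the 1.8 base is the estimate quoted on the slide.

```python
import math


def estimated_structures(n, base=1.8):
    """Estimated number of secondary structures for n nucleotides."""
    return base ** n


count_100 = estimated_structures(100)
exponent = math.floor(math.log10(count_100))   # power of ten
mantissa = count_100 / 10 ** exponent          # leading digits

# Number of elementary steps for an O(N^3) dynamic program at N = 100.
dp_steps = 100 ** 3
```

The estimate for N = 100 is on the order of 10^25 structures, while an O(N^3) algorithm at the same length needs on the order of 10^6 steps. That gap is why exhaustive enumeration is ruled out.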
Algorithms based on energy minimization
For outline of algorithm used in Mfold, including description of dynamic programming recursion, please visit Michael Zuker's lecture: http://www.bioinfo.rpi.edu/~zukerm/lectures/RNAfold-html
From this site, you may also download his lecture as either PDF or PS file.
Hmmm, something based on this might make an interesting "Final Exam" question: how could one apply dynamic programming approaches learned in first half of course to RNA structure prediction problem?
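One way to see how dynamic programming applies here is a base-pair maximization recursion (Nussinov-style). This is a simplified stand-in, not the energy-based recursion used in Mfold: it maximizes the number of pairs instead of minimizing free energy, and it excludes pseudoknots. The names `can_pair` and `max_base_pairs` and the `min_loop` default are assumptions for illustration.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Watson-Crick or G-U wobble pair."""
    return (a, b) in PAIRS


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, with >= min_loop unpaired
    bases inside every hairpin loop. O(N^3) time, O(N^2) space."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = max pairs in seq[i..j]; entries with i >= j stay 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]              # case 1: j is unpaired
            for k in range(i, j - min_loop):    # case 2: j pairs with k
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


hairpin_pairs = max_base_pairs("GGGAAACCC")
```

As in sequence alignment, the table is filled from short subproblems to long ones, and each cell reuses already-computed optimal scores for smaller intervals.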
2) Comparative sequence analysis (co-variation)
Two basic approaches:
• Algorithms constrained by initial alignment
Much faster, but not as robust as unconstrained
Base-pairing probabilities determined by a partition function
• Algorithms not constrained by initial alignment
Genetic algorithms often used for finding an alignment & set of structures
RNA Secondary structure prediction: Performance?
How to evaluate?
• Not many experimentally determined structures; currently, ~50% are rRNA structures
• So "Gold Standard" (in absence of tertiary structure): compare predicted RNA secondary structure with that determined by comparative sequence analysis (!!??) using Benchmark Datasets
NOTE: Base-pairs predicted by comparative sequence analysis for large & small subunit rRNAs are 97% accurate when compared with high resolution crystal structures!
- Gutell, Pace
RNA Secondary structure prediction: Performance?
1) Energy minimization (via dynamic programming)
   73% avg. prediction accuracy - single sequence
2) Comparative sequence analysis
   97% avg. prediction accuracy - multiple sequences (e.g., highly conserved rRNAs)
   much lower if sequence conservation is lower &/or fewer sequences are available for alignment
3) Combined - recent developments: combine thermodynamics & co-variation & experimental constraints? IMPROVED RESULTS
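The slides do not define "prediction accuracy". One common choice, assumed here, is the fraction of reference base pairs that the prediction recovers. The sketch below also assumes structures are written in dot-bracket notation; the function names and the example structures are hypothetical.

```python
def base_pairs(dot_bracket):
    """Set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def fraction_recovered(predicted, reference):
    """Fraction of reference base pairs also present in the prediction."""
    ref = base_pairs(reference)
    if not ref:
        return 0.0
    return len(base_pairs(predicted) & ref) / len(ref)


# Hypothetical example: the prediction misses the innermost pair.
reference_structure = "(((...)))"
predicted_structure = "((.....))"
accuracy = fraction_recovered(predicted_structure, reference_structure)
```

Averaging this fraction over a benchmark set gives a single summary number comparable in spirit to the percentages quoted above, although the exact figures depend on the benchmark and on the definition used.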
RNA structure prediction strategies
Tertiary structure prediction
Requires "craft" & significant user input & insight
1) Extensive comparative sequence analysis to predict tertiary contacts (co-variation)
   e.g., MANIP - Westhof
2) Use experimental data to constrain model building
   e.g., MC-SYM - Major
3) Homology modeling using sequence alignment & reference tertiary structure (not many of these!)
4) Low resolution molecular mechanics
   e.g., yammp - Harvey
New Today:
Protein Structure & Function
Protein Structure & Function
Protein structure - primarily determined by sequence
Protein function - primarily determined by structure
• Globular proteins: compact hydrophobic core & hydrophilic surface
• Membrane proteins: special hydrophobic surfaces
• Folded proteins are only marginally stable
• Some proteins do not assume a stable "fold" until they bind to something = Intrinsically disordered
Predicting protein structure and function can be very hard -- & fun!
4 Basic Levels of Protein Structure
Primary & Secondary Structure
Primary
• Linear sequence of amino acids
• Description of covalent bonds linking aa's
Secondary
• Local spatial arrangement of amino acids
• Description of short-range non-covalent interactions
• Periodic structural patterns: α-helix, β-sheet
Tertiary & Quaternary Structure
Tertiary
• Overall 3-D "fold" of a single polypeptide chain
• Spatial arrangement of 2° structural elements; packing of these into compact "domains"
• Description of long-range non-covalent interactions (plus disulfide bonds)
Quaternary
• In proteins with > 1 polypeptide chain, spatial arrangement of subunits
"Additional" Structural Levels
• Super-secondary elements
• Motifs
• Domains
• Foldons