BIOC3010: Bioinformatics - Revision lecture
Dr. Andrew C.R. Martin
http://www.bioinf.org.uk/
[Figure: overview diagram. Labels: Data Creation, Analysis, Prediction, Presentation, Searching, Organizing; Sequences, DNA, Protein, Computers, Structures]
Introductory Lecture
Introduction
Bioinformatics…
• helps you create data
• example of fragment assembly
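To make the fragment assembly example concrete, here is a minimal sketch of a greedy overlap-merge assembler. It assumes error-free fragments from a single strand with no repeats, which real data never satisfies; the function names `overlap` and `greedy_assemble` are hypothetical and this is not the method of any particular assembly program.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the two fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments wholly contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags


contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
```

Each pass joins the best-overlapping pair, so three fragments that overlap by four bases each collapse into a single contig.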
Introduction
Bioinformatics…
• provides tools to store and search data
• databases and databanks
• primary/secondary/composite/gateways
Introduction
Bioinformatics…
• allows you to make predictions
• prediction techniques
– moving windows
– computer learning
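The moving-window idea can be sketched as follows: a per-residue property is averaged over a window that slides along the sequence, and the average is assigned to the window's central residue. The choice of the Kyte-Doolittle hydropathy scale and a window of 7 is an assumption made for illustration; the slides do not say which property or window size was used, and `window_average` is a hypothetical name.

```python
# Kyte-Doolittle hydropathy values (used here only as an example property).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_average(sequence, window=7):
    """Return (centre_index, mean_score) for every full window."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    half = window // 2
    return [
        (start + half, sum(scores[start:start + window]) / window)
        for start in range(len(scores) - window + 1)
    ]


profile = window_average("IIIDDD", window=3)
```

Peaks in such a profile would suggest hydrophobic stretches; residues within half a window of either end get no value because no full window is centred on them.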
Introduction
Bioinformatics…
• allows you to create 3D models
• separate lecture
Introduction
Bioinformatics…
• allows transfer of annotations
• homologous proteins likely to perform similar functions
Introduction
Annotations…
• Pre-genome world
• Post-genome world
• Annotations will change
Genomes and Gene Prediction Lectures
Genome structure
• C-value paradox
• Compare prokaryotes and eukaryotes
• Complexity of eukaryotes:
– Introns/exons
– Repeated sequences
– Transposable elements
– Pseudogenes
• Problems introduced by these...
ORF Scanning in Eukaryotes
[Figure: 5’ exon, intron, exon 3’, with the intron/exon splice sites marked]
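As a rough illustration of ORF scanning, the sketch below reports ATG-to-stop open reading frames in the three forward frames of a DNA string. It is a deliberately naive scan (forward strand only, standard stop codons, hypothetical function name `find_orfs`); on eukaryotic genomic DNA the introns interrupt such frames, which is exactly why this approach alone is not sufficient there.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for each ATG...stop ORF, forward strand.

    start/end are 0-based, end-exclusive, and include the stop codon.
    min_codons counts the codons before the stop, ATG included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return orfs


example = "CCATGAAATTTTAGGG"
orfs = find_orfs(example)
```

Here the only ORF lies in frame 2 and spans `example[2:14]`, i.e. `ATGAAATTTTAG`.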
Finding Genes in Genomic DNA
• Ab initio methods
• Similarity-based methods
• Integrated approaches
[Figure: labelled TRY4]
Prediction accuracy
• Nucleotide level
• Exon level
• Measures for assessment
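The slides do not list the measures, so the following is a sketch under the assumption that the usual nucleotide-level measures are meant: sensitivity, TP / (TP + FN), and specificity in the gene-prediction sense, TP / (TP + FP). The encoding of annotations as strings of `C` (coding) and `N` (non-coding) and the name `nucleotide_accuracy` are hypothetical.

```python
def nucleotide_accuracy(actual, predicted):
    """Compare two equal-length annotations: 'C' = coding, 'N' = non-coding."""
    if len(actual) != len(predicted):
        raise ValueError("annotations must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(a == "C" and p == "C" for a, p in pairs)  # coding, predicted coding
    fp = sum(a == "N" and p == "C" for a, p in pairs)  # non-coding, predicted coding
    fn = sum(a == "C" and p == "N" for a, p in pairs)  # coding, missed
    tn = sum(a == "N" and p == "N" for a, p in pairs)  # non-coding, correctly rejected
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tp / (tp + fp) if tp + fp else 0.0,
    }


result = nucleotide_accuracy("NNCCCCNN", "NCCCCNNN")
```

The same counting could be applied at the exon level by treating each exon, rather than each nucleotide, as the unit that is predicted correctly or not.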
Computing Lecture
Computers
Operating systems
• What is an operating system?
• Examples of operating systems
• Choice of operating systems for different areas of research
Computers and computer science
Must handle:
• Data structures and information retrieval
– Relational databases
– Design of databases to reduce errors in data
• Simple examples of SQL and structuring data into tables
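A small sketch of structuring data into tables with SQL, using Python's built-in `sqlite3` module. The schema, table names and the `EX…` accessions are invented for illustration. Storing each species name once and referring to it by key is one way a design can reduce errors: a protein cannot point at a species that does not exist, and a name cannot be misspelt differently in each row.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce REFERENCES constraints
    conn.executescript("""
        CREATE TABLE species (
            species_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL UNIQUE
        );
        CREATE TABLE protein (
            accession   TEXT PRIMARY KEY,
            description TEXT NOT NULL,
            species_id  INTEGER NOT NULL REFERENCES species(species_id)
        );
    """)
    conn.executemany(
        "INSERT INTO species (species_id, name) VALUES (?, ?)",
        [(1, "Homo sapiens"), (2, "Mus musculus")],
    )
    conn.executemany(
        "INSERT INTO protein (accession, description, species_id) VALUES (?, ?, ?)",
        [("EX0001", "example kinase", 1), ("EX0002", "example kinase", 2)],
    )
    conn.commit()
    return conn


def proteins_for_species(conn, species_name):
    """Join the two tables to list proteins from one species."""
    query = """
        SELECT p.accession, p.description
        FROM protein AS p
        JOIN species AS s ON s.species_id = p.species_id
        WHERE s.name = ?
        ORDER BY p.accession
    """
    return conn.execute(query, (species_name,)).fetchall()
```

Calling `proteins_for_species(build_demo_db(), "Homo sapiens")` returns the rows for the human entries only; an insert that refers to an unknown `species_id` is rejected by the foreign-key constraint.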
Computers and computer science
Must handle:
• Algorithms: how to solve a problem
– Defined an algorithm
– Looked at an example
Computers and computer science
Must handle:
• Data mining and machine learning
– Extract patterns, etc. from data
– Computer software which learns from examples and is then able to make predictions
Comparative Modelling Lecture
What is comparative modelling?
• Build a three-dimensional (3D) model of a protein...
• …based on the known structure of a (generally) homologous protein
• "Homology Modelling" is misleading:
– fold recognition and threading allow recognition of non-homologous sequences which adopt the same fold
Stages in CM
1. Identify templates (or ‘parents’)
2. Align the target sequence with the parent(s)
3. Find the structurally conserved regions (SCRs) and structurally variable regions (SVRs)
4. Inherit the SCRs from the parent(s)
5. Build the SVRs
6. Build the sidechains
7. Refine the model
8. Evaluate errors in the model
Align target with parent(s)
• The correct alignment is the structural alignment
• Optimal alignment is based on structural equivalents between the structure of the target and the structure of the parent
• We don’t have the structure of the target!
• Guess the structural alignment from the sequence alignment
An example MLSA
Sequence alignment quality
Assessing the model
• Ideal is to compare the model with the true target structure - 4-6Å; 2Å; 0.5Å
$$\mathrm{RMS} = \sqrt{\frac{\sum_{i=1}^{N} d_i^{2}}{N}}$$
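The RMS deviation between a model and the true structure can be sketched as below, taking d_i as the distance between equivalent atom i in the two structures and N as the number of atom pairs. The sketch assumes the two coordinate sets are already superposed and listed in the same atom order; it performs no fitting, and `rms_deviation` is a hypothetical name.

```python
import math


def rms_deviation(model, target):
    """RMS deviation over N equivalent atoms given as (x, y, z) tuples."""
    if len(model) != len(target) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, target):
        # d_i squared for this pair of equivalent atoms
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(model))


rms = rms_deviation([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])
```

In this toy case every atom is displaced by exactly 1 Å, so the RMS deviation is also 1 Å.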
Model quality
The main factors are:
• The sequence identity with the primary parent
• The number and size of indels
• The quality of the alignment
• The amount of change which has been necessary to the parent(s) to create the model
Summary of CASP2 results
CASP8 ran summer 2008
http://predictioncenter.gc.ucdavis.edu
Medical Applications Lecture
Mutations, Alleles & Polymorphisms
• Mutation:
– any change in DNA sequence
• Allele:
– alternative form of a genetic locus; one inherited from each parent
– e.g. eye colour locus - brown and blue alleles
• Polymorphism:
– genetic variation present in >= 1% of a normal population
How are SNPs useful?
• Understanding evolution
– Some alleles may be advantageous in one environment, but disadvantageous in another
• DNA fingerprinting
• Markers to map traits
– diseases, characteristics
• Pharmacogenomics
– genotype-specific medications
Drug responses
Drug efficacy may be affected by:
• transporters
• metabolism
• receptors
• signalling pathways, etc.
Potentially lethal SNPs
First described ~2000 years ago
“What is food to some men may be fierce poison to others”
Lucretius Carus
[Figure: flow diagram. DNA Sequence → Protein Sequence → Protein Structure → Protein Function; Mutation → Altered Sequence → Altered Structure → Altered Function; Understand Structure & Function → Design Drugs → Restore Structure → Restored Function]
Types of mutations
• Looked at p53...
• Local level - effects of mutations
• General classes
– Functional
– Fold Preventing
– Destabilizing
Antibody Humanization
How human?
Mouse: 0% human
Chimeric: 67% human
Humanized: 90% human
Summary
– Diagnosis of disease
– Prediction of disease risk
– Prognosis
– Customized response to disease
– Identifying drug targets - treatments
– Engineering of proteins for therapy
Docking and Drug Design Lecture
Surface complementarity
• Van der Waals forces
• Electrostatic (salt bridge) interaction
• Hydrogen bonds
• Hydrophobic bonding
[Figure: complementary patterns of + and - charges on the two interacting surfaces]
Docking methods - rigid body
• Six degrees of freedom
– protein and ligand both treated as rigid
– 3 rotations / 3 translations
• Just like docking the space shuttle with a satellite
[Figure: image from NASA]
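The six degrees of freedom of rigid-body docking can be sketched as three rotation angles plus a three-component translation applied to every ligand atom, leaving all internal distances unchanged. The rotation convention (about x, then y, then z) and the names `rotation_matrix` and `rigid_transform` are choices made for this illustration, not something stated on the slides.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (angles in radians): R = Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def rigid_transform(coords, rotation=(0.0, 0.0, 0.0), translation=(0.0, 0.0, 0.0)):
    """Rotate, then translate, a list of (x, y, z) atom coordinates."""
    r = rotation_matrix(*rotation)
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            r[row][0] * x + r[row][1] * y + r[row][2] * z + translation[row]
            for row in range(3)
        ))
    return moved


ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pose = rigid_transform(ligand, rotation=(0.0, 0.0, math.pi / 2),
                       translation=(1.0, 2.0, 3.0))
```

A rigid-body search would then try many such six-parameter poses; flexibility would add further degrees of freedom on top of these.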
Docking methods - flexible ligand
• Treat receptor as static / ligand as flexible
• Dock ligand into binding pocket
– generate large number of possible orientations
• Evaluate and select by energy function
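The "evaluate and select by energy function" step might look like the sketch below, which scores each candidate ligand pose against a static receptor and keeps the lowest-energy one. The energy function is an assumption: a single Lennard-Jones 12-6 term with arbitrary parameters `epsilon` and `r_min`, standing in for a real function that would also cover electrostatics, hydrogen bonds and so on. The names `lj_energy` and `best_pose` are hypothetical.

```python
import math


def lj_energy(ligand, receptor, epsilon=1.0, r_min=2.0):
    """Sum a 12-6 Lennard-Jones term over all ligand-receptor atom pairs.

    Each pair contributes epsilon * ((r_min/r)**12 - 2 * (r_min/r)**6),
    which has its minimum, -epsilon, at r == r_min.
    """
    energy = 0.0
    for ax, ay, az in ligand:
        for bx, by, bz in receptor:
            r = math.sqrt((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2)
            if r == 0.0:
                return math.inf  # overlapping atoms: reject this pose
            ratio = r_min / r
            energy += epsilon * (ratio ** 12 - 2.0 * ratio ** 6)
    return energy


def best_pose(poses, receptor):
    """Return the candidate pose with the lowest energy."""
    return min(poses, key=lambda pose: lj_energy(pose, receptor))


receptor = [(0.0, 0.0, 0.0)]
poses = [[(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)]]
chosen = best_pose(poses, receptor)
```

Of the three poses, the one at 1 Å clashes with the receptor atom and the one at 5 Å barely interacts, so the pose at the optimal separation is selected.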
DOCK
Ligand Matching
• Match sphere centres against ligand atoms
• Find possible ligand orientations
• Often >10,000 orientations possible
• Find the transformation (rotation + translation) to maximize sphere matching
Virtual Screening
• Docking can be used for virtual screening
• Scan a library of potential drug molecules
• Identify leads
De Novo Drug Design
• LUDI (InsightII) - find fragments that can bind
• GRID - uses molecular mechanics potential to find interaction sites for probe groups
• X-site - uses an empirical potential to find interaction sites for probe groups
Stupid mistakes...
• Don't confuse secondary databases with secondary structure!
• Ensure you know the difference between SCOP/PFam functional domains and CATH structural domains
Summary
• Find pockets
• Principles for docking - complementarity
• Docking
– rigid body / ligand flexibility
• Virtual screening
• Identifying probe interaction sites
– build ligands de novo