BIOC3010: Bioinformatics - Revision lecture Dr. Andrew C.R. Martin martin@biochem.ucl.ac.uk http://www.bioinf.org.uk/



Page 1: BIOC3010: Bioinformatics - Revision lecture Dr. Andrew C.R. Martin martin@biochem.ucl.ac.uk

BIOC3010: Bioinformatics - Revision lecture

Dr. Andrew C.R. Martin

martin@biochem.ucl.ac.uk

http://www.bioinf.org.uk/


Overview diagram: Data Creation, Analysis, Prediction, Presentation, Searching, Organizing; Sequences, DNA, Protein, Structures, Computers


Introductory Lecture


Introduction

Bioinformatics…

• helps you create data

• example of fragment assembly
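Fragment assembly joins overlapping sequencing reads into a longer contig. The slide does not say which assembly method the lecture used. The sketch below uses a simple greedy strategy: it repeatedly merges the pair of reads with the longest suffix/prefix overlap. It assumes error-free reads taken from one strand. The function names and the `min_len` threshold are illustrative and are not taken from any real assembler.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Overlaps shorter than `min_len` are ignored (returns 0).
    """
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept it if everything in a from here on is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge reads by their longest overlaps; returns the contig(s)."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads wholly contained in another read: they add no new sequence.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: several separate contigs remain
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))  # ['ATGCGTACGTTAGC']
```

Real assemblers must also cope with sequencing errors, repeats and reads from the reverse strand. For that reason, overlap-graph and de Bruijn graph methods are used in practice rather than a plain greedy merge.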


Introduction

Bioinformatics…

• provides tools to store and search data

• databases and databanks

• primary/secondary/composite/gateways


Introduction

Bioinformatics…

• allows you to make predictions

• prediction techniques
  – moving windows
  – computer learning
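A moving-window prediction slides a fixed-length window along a sequence and averages a per-residue property inside it. As one example, this minimal sketch computes a hydropathy profile using the Kyte-Doolittle scale. The choice of scale, the default window of 9 and the function name are assumptions for illustration. The lecture does not say which property or window size it used.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list[tuple[int, float]]:
    """Average hydropathy in a window slid one residue at a time.

    Returns (centre_position, mean_score) pairs using 0-based positions;
    for an even window the centre is taken as start + window // 2.
    """
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    half = window // 2
    profile = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        profile.append((start + half, mean))
    return profile
```

Windows with high average scores point to hydrophobic stretches. Choosing a longer window smooths the profile but blurs the boundaries between segments.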


Introduction

Bioinformatics…

• allows you to create 3D models

• separate lecture


Introduction

Bioinformatics…

• allows transfer of annotations

• homologous proteins are likely to perform similar functions


Introduction

Annotations…

• Pre-genome world

• Post-genome world

• Annotations will change


Genomes and Gene Prediction Lectures


Genome structure

• C-value paradox

• Compare prokaryotes and eukaryotes

• Complexity of eukaryotes:
  – Introns/exons
  – Repeated sequences
  – Transposable elements
  – Pseudogenes

• Problems introduced by these...


ORF Scanning in Eukaryotes

Diagram: 5’ exon, intron, exon 3’, with the intron/exon splice sites marked
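Here is a minimal ORF scanner. It assumes a DNA string read on the forward strand only, ATG as the sole start codon, and the standard stop codons TAA, TAG and TGA. The sketch illustrates the problem this slide raises. Introns interrupt the coding sequence, so a plain scan like this may report only fragments of a eukaryotic gene, or may miss the gene entirely. `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Find ATG...stop ORFs in the three forward reading frames.

    Returns (start, end, frame) with a 0-based start and exclusive end,
    so dna[start:end] is the ORF including its stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close it and look for the next ATG
    return orfs
```

A full scan would also search the three reading frames of the reverse complement.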


Finding Genes in Genomic DNA

• Ab initio methods

• Similarity-based methods

• Integrated approaches


Prediction accuracy

• Nucleotide level

• Exon level

• Measures for assessment
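The slide mentions assessment at the nucleotide and exon levels but does not name the measures. This sketch shows measures commonly used in gene-prediction benchmarks:

- sensitivity, Sn = TP/(TP+FN)
- specificity in the gene-finding sense, Sp = TP/(TP+FP)
- a correlation coefficient at the nucleotide level

At the exon level, a predicted exon counts only if both of its ends match exactly. The input encoding is an assumption: annotation strings of 'C' (coding) and 'N' (non-coding), with exons given as (start, end) tuples.

```python
import math


def nucleotide_accuracy(real: str, predicted: str) -> tuple[float, float, float]:
    """Nucleotide-level Sn, Sp and correlation coefficient (CC).

    `real` and `predicted` are equal-length strings where 'C' marks a
    coding base and 'N' a non-coding base.
    """
    if len(real) != len(predicted):
        raise ValueError("annotations must be the same length")
    tp = fp = tn = fn = 0
    for r, p in zip(real, predicted):
        if r not in "CN" or p not in "CN":
            raise ValueError("use only 'C' and 'N'")
        if r == "C" and p == "C":
            tp += 1
        elif r == "N" and p == "C":
            fp += 1
        elif r == "N" and p == "N":
            tn += 1
        else:
            fn += 1  # real coding base predicted as non-coding
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0  # gene-finding convention
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, cc


def exon_accuracy(real_exons, predicted_exons) -> tuple[float, float]:
    """Exon-level Sn and Sp: an exon counts only if both ends are exact."""
    real, pred = set(real_exons), set(predicted_exons)
    exact = len(real & pred)
    sn = exact / len(real) if real else 0.0
    sp = exact / len(pred) if pred else 0.0
    return sn, sp
```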


Computing Lecture


Computers


Operating systems

• What is an operating system?

• Examples of operating systems

• Choice of operating systems for different areas of research


Computers and computer science

Must handle:

• Data structures and information retrieval
  – Relational databases
  – Design of databases to reduce errors in data

• Simple examples of SQL and structuring data into tables
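This small example structures data into related tables and queries them with SQL, using Python's built-in `sqlite3` module. The table layout, the column names and the DEMO accessions are all hypothetical. Each species name is stored once and referred to by its key. This is one way a design can reduce errors: a misspelt species name cannot creep into the protein table, and the foreign-key constraint rejects references to species that do not exist.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two related tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if asked
    conn.executescript("""
        CREATE TABLE species (
            species_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL UNIQUE
        );
        CREATE TABLE protein (
            accession  TEXT PRIMARY KEY,
            name       TEXT NOT NULL,
            species_id INTEGER NOT NULL REFERENCES species(species_id)
        );
    """)
    conn.executemany("INSERT INTO species (species_id, name) VALUES (?, ?)",
                     [(1, "Homo sapiens"), (2, "Mus musculus")])
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?)",
                     [("DEMO1", "protein A", 1),
                      ("DEMO2", "protein B", 1),
                      ("DEMO3", "protein C", 2)])
    conn.commit()
    return conn


def proteins_from(conn: sqlite3.Connection, species_name: str) -> list[str]:
    """Return accessions of proteins from one species, using a JOIN."""
    rows = conn.execute("""
        SELECT p.accession
        FROM protein AS p
        JOIN species AS s ON p.species_id = s.species_id
        WHERE s.name = ?
        ORDER BY p.accession
    """, (species_name,)).fetchall()
    return [row[0] for row in rows]
```

With this design, trying to insert a protein whose `species_id` does not exist raises `sqlite3.IntegrityError`.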


Computers and computer science

Must handle:

• Algorithms: how to solve a problem
  – Defined an algorithm
  – Looked at an example


Computers and computer science

Must handle:

• Data mining and machine learning
  – Extract patterns, etc. from data
  – Computer software which learns from examples and is then able to make predictions
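k-nearest neighbours is one of the simplest examples of software that "learns from examples and is then able to make predictions". It stores labelled training examples and predicts the label of a new case by majority vote among the k closest examples. The lecture does not say which learning methods it covered. The feature vectors and labels below are toy values chosen only for illustration.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest examples.

    `training` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of examples")
    by_distance = sorted(training, key=lambda ex: math.dist(ex[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up training data: two clusters of 2-D feature vectors.
examples = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
            ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(knn_predict(examples, (0.5, 0.5)))  # A
```

An odd k avoids tied votes when there are two classes.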


Comparative Modelling Lecture


What is comparative modelling?

• Build a three-dimensional (3D) model of a protein...

• …based on known structure of a (generally) homologous protein sequence

• "Homology Modelling" is misleading:
  – fold recognition and threading allow recognition of non-homologous sequences which adopt the same fold


Stages in CM

1. Identify templates (or ‘parents’)

2. Align the target sequence with the parent(s)

3. Find:
   – structurally conserved regions (SCRs)
   – structurally variable regions (SVRs)

4. Inherit the SCRs from the parent(s)

5. Build the SVRs

6. Build the sidechains

7. Refine the model

8. Evaluate errors in the model


Align target with parent(s)

The correct alignment is the structural alignment: the optimal alignment based on structurally equivalent positions in the structure of the target and the structure of the parent.

We don’t have the structure of the target!

So we must guess the structural alignment from the sequence alignment.
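The target's structure is unknown, so the structural alignment has to be guessed from a sequence alignment. To show how such an alignment can be computed, here is a minimal Needleman-Wunsch global alignment. The scores used (match +1, mismatch -1, gap -1) are assumptions chosen for readability. Protein alignment programs normally use a substitution matrix such as BLOSUM62 together with affine gap penalties. The lecture does not say which alignment method it used.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[str, str, int]:
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

Errors in this step carry straight through into the model. That is why the quality of the alignment appears among the factors affecting model quality.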


An example MLSA


Sequence alignment quality


Assessing the model

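• Ideal is to compare the model with the true target structure - 4-6Å; 2Å; 0.5Å

$$\mathrm{RMS} = \sqrt{\frac{\sum_{i=1}^{N} d_i^2}{N}}$$

where $d_i$ is the distance between the $i$-th pair of equivalent atoms and $N$ is the number of atom pairs.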
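The RMS deviation between a model and the true structure is the square root of the mean squared distance between equivalent atoms: RMS = sqrt(Σ d_i² / N). This minimal sketch assumes that the atoms have already been paired up and that the two structures have already been superposed. Finding the optimal superposition, for example by a least-squares fit, is not shown. `rmsd` is a hypothetical helper name.

```python
import math


def rmsd(coords_a: list[tuple[float, float, float]],
         coords_b: list[tuple[float, float, float]]) -> float:
    """RMS deviation between equivalent, already-superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    # Sum of squared distances d_i^2 over all N equivalent atom pairs.
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```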


Model quality

The main factors are:

• The sequence identity with the primary parent

• The number and size of indels

• The quality of the alignment

• The amount of change which has been necessary to the parent(s) to create the model.
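Two of these factors can be read directly off a pairwise alignment: sequence identity and the number and size of indels. In this minimal sketch, percent identity is computed over columns where neither sequence has a gap. Other denominators are also in use, such as the length of the shorter sequence or of the whole alignment, so quoted identities can differ between programs. The function names and the '-' gap character are assumptions.

```python
import re


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percentage of identical residues over aligned (gap-free) columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def indel_lengths(aln_a: str, aln_b: str) -> list[int]:
    """Lengths of each run of gap characters in either aligned sequence."""
    return [len(m.group()) for s in (aln_a, aln_b) for m in re.finditer(r"-+", s)]
```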


Summary of CASP2 results

CASP8 ran in summer 2008

http://predictioncenter.gc.ucdavis.edu


Medical Applications Lecture


Mutations, Alleles & Polymorphisms

• Mutation:
  – any change in DNA sequence

• Allele:
  – alternative form of a genetic locus; one inherited from each parent
  – e.g. eye colour locus - brown and blue alleles

• Polymorphism:
  – genetic variation present in >= 1% of a normal population
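The ≥1% definition of a polymorphism can be applied directly to genotype counts. This minimal sketch assumes a single biallelic locus with hypothetical alleles "A" and "a". It counts two allele copies per individual and calls the locus polymorphic when the rarer allele reaches the threshold. The function names are illustrative.

```python
def allele_frequencies(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Frequencies of alleles A and a from genotype counts at one locus."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)  # each individual carries two copies
    if total_alleles == 0:
        raise ValueError("no genotypes given")
    p = (2 * n_AA + n_Aa) / total_alleles  # frequency of A
    q = (2 * n_aa + n_Aa) / total_alleles  # frequency of a
    return p, q


def is_polymorphism(n_AA: int, n_Aa: int, n_aa: int,
                    threshold: float = 0.01) -> bool:
    """True if the rarer allele is present at >= threshold (1% by default)."""
    return min(allele_frequencies(n_AA, n_Aa, n_aa)) >= threshold
```

For example, 980 AA, 20 Aa and 0 aa individuals give 20 copies of "a" among 2000 alleles, which is exactly 1%.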


How are SNPs useful?

• Understanding evolution
  – Some alleles may be advantageous in one environment, but disadvantageous in another

• DNA fingerprinting

• Markers to map traits
  – diseases, characteristics

• Pharmacogenomics
  – genotype-specific medications


Drug responses

Drug efficacy may be affected by:

• transporters

• metabolism

• receptors

• signalling pathways, etc.


Potentially lethal SNPs

First described ~2000 years ago

“What is food to some men may be fierce poison to others”

Lucretius Carus


Diagram: DNA sequence → protein sequence → protein structure → protein function. A mutation gives an altered sequence → altered structure → altered function. Understanding structure & function lets us design drugs that restore structure and so restore function.


Types of mutations

• Looked at p53...

• Local level - effects of mutations

• General classes
  – Functional
  – Fold preventing
  – Destabilizing


How human?

Chimeric: 67% human

Humanized: 90% human

Mouse: 0% human


Antibody Humanization


Summary

– Diagnosis of disease
– Prediction of disease risk
– Prognosis
– Customized response to disease
– Identifying drug targets - treatments
– Engineering of proteins for therapy


Docking and Drug Design Lecture


Surface complementarity

• Van der Waals forces

• Electrostatic (salt bridge) interactions

• Hydrogen bonds

• Hydrophobic bonding
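Of the interactions listed here, van der Waals and electrostatic terms are the easiest to write down. This minimal sketch uses common textbook forms: a 12-6 Lennard-Jones term and a Coulomb term with a constant dielectric, with a conversion factor of roughly 332 kcal·Å/(mol·e²). Hydrogen-bond and hydrophobic terms are not modelled. The single epsilon/sigma pair, the dielectric of 4 and the function names are hypothetical choices for illustration, not parameters from any particular force field or docking program.

```python
import math

COULOMB_K = 332.0  # approx. kcal·Å/(mol·e²) conversion factor


def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """12-6 van der Waals term: strongly repulsive at short range, weakly attractive beyond."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(q1: float, q2: float, r: float, dielectric: float = 4.0) -> float:
    """Electrostatic term: negative (favourable) for opposite charges."""
    return COULOMB_K * q1 * q2 / (dielectric * r)


def interaction_energy(ligand, protein, epsilon=0.2, sigma=3.5, dielectric=4.0):
    """Sum vdW + electrostatic terms over all ligand-protein atom pairs.

    Atoms are ((x, y, z), charge) tuples; one epsilon/sigma pair is used for
    every atom, which is a simplification.
    """
    total = 0.0
    for xyz_l, q_l in ligand:
        for xyz_p, q_p in protein:
            r = math.dist(xyz_l, xyz_p)
            total += lennard_jones(r, epsilon, sigma) + coulomb(q_l, q_p, r, dielectric)
    return total
```

The Lennard-Jones term reaches its minimum of -epsilon at r = 2^(1/6)·sigma. This rewards surfaces that fit closely without overlapping, which is one way of expressing complementarity numerically.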


Docking methods - rigid body

Six degrees of freedom
- protein and ligand both treated as rigid
- 3 rotations / 3 translations

Just like docking the space shuttle with a satellite

Image from NASA
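In rigid-body docking, each candidate pose is fully described by six numbers: three rotation angles and three translations. This minimal sketch applies one such pose to ligand coordinates. It rotates the ligand about its own centroid and then translates it, so all internal distances stay the same. The rotation order (x, then y, then z) and the function names are assumptions. Other conventions, such as quaternions, are equally valid.

```python
import math

Vec = tuple[float, float, float]


def _matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(rx: float, ry: float, rz: float):
    """Rotation by rx about x, then ry about y, then rz about z (radians)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    mx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    my = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    mz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    return _matmul(mz, _matmul(my, mx))


def rigid_transform(coords: list[Vec], rx, ry, rz, tx, ty, tz) -> list[Vec]:
    """Apply the six rigid-body degrees of freedom to a ligand."""
    n = len(coords)
    centre = [sum(p[d] for p in coords) / n for d in range(3)]
    rot = rotation_matrix(rx, ry, rz)
    shift = (tx, ty, tz)
    moved = []
    for p in coords:
        v = [p[d] - centre[d] for d in range(3)]  # position relative to centroid
        moved.append(tuple(
            sum(rot[i][k] * v[k] for k in range(3)) + centre[i] + shift[i]
            for i in range(3)))
    return moved
```

A rigid-body search would generate many such poses and score each one against the protein.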


Docking methods - flexible ligand

Treat receptor as static / ligand as flexible

Dock ligand into binding pocket
- generate large number of possible orientations

Evaluate and select by energy function


Ligand Matching (DOCK)

• Match sphere centres against ligand atoms

• Find possible ligand orientations

• Often >10,000 orientations possible

Find the transformation (rotation + translation) to maximize sphere matching
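The matching step pairs ligand atoms with sphere centres so that the distances between the chosen atoms agree with the distances between their matched spheres. This is a deliberately simple brute-force sketch of that distance-compatibility idea, and it assumes a small tolerance `tol` in Å. It is exponential in k and only suitable for tiny inputs. As far as we understand, DOCK itself uses more efficient graph-based matching. A least-squares fit of each accepted match would then give the rotation and translation, but that step is not shown. All names here are hypothetical.

```python
import math
from itertools import combinations, permutations


def distance_compatible_matches(spheres, atoms, k=3, tol=0.5):
    """Return matches as tuples of (atom_index, sphere_index) pairs.

    A match is accepted when every atom-atom distance agrees with the
    corresponding sphere-sphere distance to within `tol`.
    """
    matches = []
    for atom_idx in combinations(range(len(atoms)), k):
        for sphere_idx in permutations(range(len(spheres)), k):
            pairs = list(zip(atom_idx, sphere_idx))
            ok = all(
                abs(math.dist(atoms[a1], atoms[a2])
                    - math.dist(spheres[s1], spheres[s2])) <= tol
                for (a1, s1), (a2, s2) in combinations(pairs, 2)
            )
            if ok:
                matches.append(tuple(pairs))
    return matches
```

Each accepted match implies one candidate ligand orientation. This is why the number of possible orientations grows so quickly with the number of spheres and atoms.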


Virtual Screening

• Docking can be used for virtual screening

• Scan a library of potential drug molecules

• Identify leads


De Novo Drug Design

LUDI (InsightII) - find fragments that can bind

GRID - uses a molecular mechanics potential to find interaction sites for probe groups

X-site - uses an empirical potential to find interaction sites for probe groups
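Methods such as GRID place a probe group at each point of a 3D grid around the protein, compute its interaction energy, and report the most favourable points as candidate interaction sites. This minimal sketch of that idea uses only a Lennard-Jones term plus a Coulomb term. GRID's real energy function has its own probe-specific parameters, including hydrogen-bond terms, and is not reproduced here. The atom format ((x, y, z), charge), the parameters, the overlap and distance cutoffs, and all function names are hypothetical.

```python
import math
from itertools import product

# Each protein atom: ((x, y, z), partial_charge). Parameters are illustrative.
Atom = tuple[tuple[float, float, float], float]


def probe_energy(point, protein_atoms: list[Atom], probe_charge: float,
                 epsilon: float = 0.2, sigma: float = 3.5,
                 dielectric: float = 4.0, cutoff: float = 8.0) -> float:
    """Approximate LJ + Coulomb energy of a probe placed at `point`."""
    energy = 0.0
    for xyz, charge in protein_atoms:
        r = math.dist(point, xyz)
        if r < 1.0:
            return math.inf  # probe overlaps an atom: not a usable site
        if r > cutoff:
            continue  # ignore distant atoms
        sr6 = (sigma / r) ** 6
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
        energy += 332.0 * probe_charge * charge / (dielectric * r)
    return energy


def scan_grid(protein_atoms, probe_charge, lower, upper, spacing=1.0, n_best=5):
    """Score the probe at every grid point in a box; return the n_best lowest."""
    axes = []
    for lo, hi in zip(lower, upper):
        steps = int(round((hi - lo) / spacing))
        axes.append([lo + k * spacing for k in range(steps + 1)])
    scored = [(probe_energy(p, protein_atoms, probe_charge), p)
              for p in product(*axes)]
    scored.sort()  # lowest (most favourable) energies first
    return scored[:n_best]
```

Favourable sites found for different probes (for example charged, hydrophobic or hydrogen-bonding groups) can then guide where fragments are placed when building a ligand de novo.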


Stupid mistakes...

• Don't confuse secondary databases with secondary structure!

• Ensure you know the difference between SCOP/Pfam functional domains and CATH structural domains


Summary

• Find pockets

• Principles for docking - complementarity

• Docking
  – rigid body / ligand flexibility

• Virtual screening

• Identifying probe interaction sites
  – build ligands de novo