Chapter 9 Structure Prediction

Post on 19-Dec-2015

Motivation

Given a protein sequence, can you predict its molecular structure?

We want to avoid repeated X-ray crystallography, but we still want accuracy

You could use nucleotide alignment, but what do you do with the gapped regions?

More complex methods are only justified if they can be shown to perform better than simpler methods

Simpler methods are only justified if they can perform better than basic sequence alignment

First Step

Some structure comparison methods use secondary structures of the new sequence

Predict location of secondary structure elements along the protein’s backbone and the degree of residue burial

Supervised learning has been shown to perform well in this task

Artificial Neural Network

[Figure: artificial neural network; label: "Predicts structure at this point"]

Danger

You may train the network on your training set, but it may not generalize to other data

Perhaps we should train several ANNs and then let them vote on the structure

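One way to picture the voting idea is a per-residue majority vote over the outputs of several independently trained networks. The sketch below assumes each network has already produced a string of state labels; the labels H, E and C and the function name `jury_vote` are illustrative choices, not part of the slides, and a real jury might average network outputs rather than count votes.

```python
from collections import Counter

def jury_vote(predictions):
    """Combine per-residue predictions from several networks by majority vote.

    predictions: list of equal-length strings, one per network, using
    H (helix), E (strand) and C (coil) as hypothetical state labels.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        # most_common(1) returns the label with the highest count;
        # ties go to the label seen first in this column.
        consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)

votes = ["HHHECC", "HHEECC", "HHHEEC"]  # toy outputs from three networks
print(jury_vote(votes))
```

A single network that disagrees at one position (the third and fifth residues here) is outvoted by the other two.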

Profile network from HeiDelberg

Uses the protein family (a multiple alignment is used as input) instead of just the new sequence

On the first level, a window of length 13 around the residue is used

The window slides down the sequence, making a prediction for each residue

The input includes the frequency of amino acids occurring in each position in the multiple alignment (in the example, there are 5 sequences in the multiple alignment)

The second level takes these predictions from neural networks that are centered on neighboring residues

The third level does a jury selection
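The first-level input can be sketched as follows: compute amino acid frequencies for each alignment column, then slide a window of 13 columns along the sequence and flatten it into one input vector per residue. This is only an illustration of the input encoding, not the PHD implementation; the zero padding at the sequence ends, the toy alignment and the function names are assumptions, and the network itself is not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_frequencies(alignment):
    """Return, per alignment column, the frequency of each amino acid."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c in AMINO_ACIDS)  # skip gaps
        total = sum(counts.values())
        profile.append([counts[a] / total if total else 0.0 for a in AMINO_ACIDS])
    return profile

def window_inputs(profile, width=13):
    """Yield one flattened input vector per residue.

    Positions that fall off either end of the sequence are padded
    with all-zero columns (an assumption of this sketch).
    """
    half = width // 2
    blank = [0.0] * len(AMINO_ACIDS)
    for centre in range(len(profile)):
        vector = []
        for pos in range(centre - half, centre + half + 1):
            vector.extend(profile[pos] if 0 <= pos < len(profile) else blank)
        yield vector

alignment = ["MKVLA", "MKILA", "MRVLG", "MKVLA", "MKVIA"]  # 5 toy sequences
profile = column_frequencies(alignment)
inputs = list(window_inputs(profile))
print(len(inputs), len(inputs[0]))  # one vector of 13 * 20 values per residue
```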

PHD

[Figure: PHD example; labels: "Predicts 4", "Predicts 6", "Predicts 5"]

Threading

Threading matches structure to sequence

True threading considers 3D spatial interactions

3D-1D Matching (Bowie et al.)

Convert the 3D structure into a string

Include α-helix, β-sheet or neither

Include buried or solvent accessible (6 levels)

Total of 3 × 6 = 18 distinct states

With $P_{a:j}$ = probability of finding amino acid (a) in environment (j) and $P_a$ = probability of finding (a) anywhere:

$$s_{aj} = \log\left(\frac{P_{a:j}}{P_a}\right)$$
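As a worked illustration of the score, the sketch below estimates both probabilities from raw counts of (amino acid, environment) observations and takes the log ratio. The environment labels and the toy observations are made up for the example, and pairs that were never observed are simply left out (a real table would need pseudocounts or a similar correction).

```python
import math
from collections import Counter

def scores_3d_1d(observations):
    """Compute s_aj = log(P_a:j / P_a) from (amino_acid, environment) pairs."""
    pair_counts = Counter(observations)
    env_counts = Counter(env for _, env in observations)
    aa_counts = Counter(aa for aa, _ in observations)
    total = len(observations)
    scores = {}
    for (aa, env), n in pair_counts.items():
        p_a_in_j = n / env_counts[env]   # P_a:j, frequency of a within environment j
        p_a = aa_counts[aa] / total      # P_a, frequency of a anywhere
        scores[(aa, env)] = math.log(p_a_in_j / p_a)
    return scores

# Toy data with hypothetical environment labels.
observations = (
    [("V", "buried_helix")] * 3 + [("D", "buried_helix")]
    + [("V", "exposed_coil")] + [("D", "exposed_coil")] * 3
)
scores = scores_3d_1d(observations)
for key in sorted(scores):
    print(key, round(scores[key], 3))
```

In this toy set valine is enriched in the buried environment, so its score there is positive, while aspartate's is negative.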

3D-1D

Calculate the information-value scores on a training set of multiple alignments; the scores are then used as a profile for each column

When applied to the globin family, it clearly distinguished myoglobins from nonglobins but not from other globins

Methods using 3D interactions

Residues that have large separation in the sequence may end up next to each other when the protein is folded.

Define a measure of contact between residues (two atoms within 5Å) and count frequency of contact between all pairs in PDB

Use this measure during alignment to evaluate cost, or to select the best alignment

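The contact definition above can be sketched for a single structure: two residues count as in contact if any pair of their atoms lies within 5 Å, and contacts are tallied by residue-pair type. The input layout (a list of residues with atom coordinates), the `min_separation` parameter and the toy coordinates are assumptions for illustration; summing the resulting counts over many PDB entries would give the frequency table.

```python
import math
from collections import Counter
from itertools import combinations

def count_contacts(residues, cutoff=5.0, min_separation=1):
    """Count contacts between residue types in one structure.

    residues: list of (amino_acid, [(x, y, z), ...]) in sequence order,
    where the inner list holds that residue's atom coordinates in angstroms.
    Two residues are in contact if any pair of their atoms lies
    within `cutoff` angstroms of each other.
    """
    counts = Counter()
    for (i, (aa_i, atoms_i)), (j, (aa_j, atoms_j)) in combinations(enumerate(residues), 2):
        if j - i < min_separation:
            continue  # optionally ignore residues that are close in sequence
        if any(math.dist(a, b) <= cutoff for a in atoms_i for b in atoms_j):
            counts[tuple(sorted((aa_i, aa_j)))] += 1
    return counts

# Toy structure: one atom per residue, coordinates in angstroms.
residues = [
    ("A", [(0.0, 0.0, 0.0)]),
    ("V", [(3.0, 0.0, 0.0)]),
    ("D", [(20.0, 0.0, 0.0)]),
    ("V", [(0.0, 4.5, 0.0)]),
]
counts = count_contacts(residues)
print(dict(counts))
```

Note that the first alanine and the last valine are three positions apart in sequence yet still in contact, which is the situation the slide describes.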

3D interactions

[Figure: 3D interactions]

Potentials of mean force (POMF)

Since the notion of contact is somewhat arbitrary, a more general formulation can be tried

Derive an empirical function for the propensity of each of the 400 pairs of residues to be any given distance apart.

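A minimal sketch of such an empirical function: bin the observed distances for each residue pair and compare each bin with the distribution over all pairs. The log ratio used here, the 1 Å bins, the 15 Å range and the toy distances are all assumptions; an energy-like potential would usually be the negative of this quantity scaled by a temperature factor.

```python
import math
from collections import defaultdict

def distance_propensities(pair_distances, bin_width=1.0, max_distance=15.0):
    """Return log(f_pair(bin) / f_all(bin)) for each residue pair and distance bin.

    pair_distances: iterable of ((aa1, aa2), distance_in_angstroms).
    Positive values mean the pair is seen at that distance more often
    than residue pairs in general; negative values mean less often.
    Bins with no observations for a pair are reported as None.
    """
    n_bins = int(max_distance / bin_width)
    per_pair = defaultdict(lambda: [0] * n_bins)
    overall = [0] * n_bins
    for (aa1, aa2), dist in pair_distances:
        b = int(dist / bin_width)
        if b >= n_bins:
            continue  # beyond the range we tabulate
        per_pair[tuple(sorted((aa1, aa2)))][b] += 1
        overall[b] += 1
    total = sum(overall)
    result = {}
    for pair, hist in per_pair.items():
        pair_total = sum(hist)
        result[pair] = [
            math.log((h / pair_total) / (overall[b] / total)) if h else None
            for b, h in enumerate(hist)
        ]
    return result

# Toy observations: A-V pairs cluster near 5 angstroms, D-V pairs do not.
data = (
    [(("A", "V"), 5.2)] * 3 + [(("A", "V"), 9.5)]
    + [(("D", "V"), 5.4)] + [(("D", "V"), 9.1)] * 3
)
propensities = distance_propensities(data)
print(propensities[("A", "V")][5], propensities[("D", "V")][5])
```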

Multiple Sequence Threading

Multiple sequence alignment: align the most similar sequences to create a consensus sequence, then align the consensus sequences to create the overall alignment

Use the same strategy with structures

Assume that conserved hydrophobic positions should pack in the core

This appears to be work in progress (1997)

Example

Alanine (A) and valine (V) are two small hydrophobic residues, both of which favor packing in the core of the protein. The POMF would have a peak around 5 Å.

Aspartate (D) and valine (V) do not often pack together. The POMF will have a dip around 5 Å.

[Figure: two plots, POMF(A,V) and POMF(D,V); x-axis: Distance, y-axis: Probability; 5 Å marked on each]

Sequence-Structure Alignment

For all known structures:

Align the unknown sequence to that structure

Find the best alignment

Return the structure with the best global alignment

Unfortunately, we can't use dynamic programming (the problem is NP-complete)

Heuristics must be used to explore the space.
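The loop over known structures can be sketched with a deliberately crude heuristic: try only ungapped placements of the sequence on each template and score each position independently. Everything here is hypothetical (the template representation as a list of environment labels, the score table, the names); a real threading method would allow gaps and score pairwise 3D interactions, which is where the search becomes hard.

```python
def best_template(sequence, templates, score):
    """Thread `sequence` onto every template and return the best match.

    templates: dict mapping a structure name to a list of environment
    labels, one per structural position.
    score: dict mapping (amino_acid, environment) to a number;
    missing pairs score 0.
    Only ungapped placements are tried, which is a crude heuristic.
    Returns (best_score, template_name, offset), or None if the
    sequence fits on no template.
    """
    best = None
    for name, environments in templates.items():
        for offset in range(len(environments) - len(sequence) + 1):
            total = sum(
                score.get((aa, environments[offset + k]), 0.0)
                for k, aa in enumerate(sequence)
            )
            if best is None or total > best[0]:
                best = (total, name, offset)
    return best

# Toy score table and templates with made-up environment labels.
score = {
    ("V", "buried"): 1.0, ("V", "exposed"): -0.5,
    ("D", "buried"): -1.0, ("D", "exposed"): 1.0,
}
templates = {
    "fold1": ["buried", "buried", "exposed", "exposed"],
    "fold2": ["exposed", "buried", "exposed", "buried"],
}
print(best_template("VVD", templates, score))
```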

Evaluating Methods

Is the complexity worth it?

This is difficult to judge without a benchmark

Few comparative studies have been performed

When they have been performed, authors of competing methods have complained that the wrong parameters were used …

Critical Assessment of Structure Prediction (CASP, 1994) releases the sequences of proteins whose structures have not yet been published.

All methods submit their predictions

Predictions are analyzed based on fold recognition, modeling accuracy and alignment accuracy.

No one method or approach is obviously superior