Prediction of Protein Binding Sites in Protein Structures Using Hidden Markov Support Vector Machine

Post on 19-Jan-2016



Slate: the target protein. Blue: the binding partner. Magenta: interface residues.

Input: SSSEIKIVRDEYGMPHIYANDTWHLFYGYG

Output: IIINIINNIINNNIIIIIIINIINIIINNN
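The input/output pair above can be read as a per-residue labelling task. The sketch below pairs each residue with its label and lists the positions marked as interface. It assumes that `I` means "interface" and `N` means "non-interface"; the slide does not spell this out, and the function name is illustrative.

```python
# Assumption: 'I' marks an interface residue and 'N' a non-interface residue.
sequence = "SSSEIKIVRDEYGMPHIYANDTWHLFYGYG"
labels = "IIINIINNIINNNIIIIIIINIINIIINNN"


def interface_positions(seq, labs, positive="I"):
    """Return (1-based position, residue) pairs whose label equals `positive`."""
    if len(seq) != len(labs):
        raise ValueError("sequence and labels must have the same length")
    return [
        (i + 1, aa)
        for i, (aa, lab) in enumerate(zip(seq, labs))
        if lab == positive
    ]


positions = interface_positions(sequence, labels)
print(len(sequence), "residues,", len(positions), "labelled as interface")
```

Each residue receives exactly one label, which is what lets both plain classifiers and sequential labelling methods be applied to the same data.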

Machine Learning Methods Applied

• Classification methods: ANN, SVM
• Sequential labelling methods: CRF

FEATURES

• Neighboring residue profile feature
• Hydrophobicity
• Sequence conservation
• Secondary structure
• Solvent accessible surface area

Hidden Markov Support Vector Machine

Discriminant function

Emission feature function

Transition feature function

Corresponding weight
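The equation that these labels annotate did not survive in the transcript. A generic form of an HM-SVM discriminant function is sketched below as an assumption, not as the authors' exact formula: the weight vectors $\mathbf{w}_e$ and $\mathbf{w}_t$ and the neighbor set $N(i)$ (residue $i$ together with its spatial neighbors) are notation introduced here for illustration.

```latex
F(x, y) \;=\; \sum_{i=1}^{n} \Big[ \sum_{k \in N(i)} \mathbf{w}_e \cdot \mathbf{e}(x_k, y_i)
        \;+\; \mathbf{w}_t \cdot \mathbf{t}(x, y_{i-1}, y_i) \Big],
\qquad
\hat{y} \;=\; \arg\max_{y} F(x, y)
```

Here $\mathbf{e}$ collects the emission feature functions, $\mathbf{t}$ collects the transition feature functions, and each is multiplied by its corresponding weight.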

Spatially neighboring residue profile feature

$$e^{\mathrm{profile}}_{y,aa}(x_k, y_i) = \begin{cases} L(\mathrm{PSSM}(x_k, aa)), & \text{if } y_i = y \\ 0, & \text{otherwise} \end{cases}$$

Spatially neighboring residue accessible surface (ASA) feature

$$e^{\mathrm{ASA}}_{y}(x_k, y_i) = \begin{cases} \mathrm{ASA}(x_k), & \text{if } y_i = y \\ 0, & \text{otherwise} \end{cases}$$

Emission feature function

$$L(x) = \begin{cases} 0, & \text{if } x \le -5 \\ \frac{1}{2} + \frac{x}{10}, & \text{if } -5 < x < 5 \\ 1, & \text{otherwise} \end{cases}$$
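The emission feature functions can be sketched in a few lines of Python. This is a minimal illustration, assuming the PSSM row of a residue is available as a dictionary from amino-acid letter to score; the names `scale`, `profile_emission`, `asa_emission`, and `pssm_row` are hypothetical.

```python
def scale(x):
    """Piecewise-linear L(x): 0 for x <= -5, 1/2 + x/10 for -5 < x < 5, 1 otherwise."""
    if x <= -5:
        return 0.0
    if x < 5:
        return 0.5 + x / 10.0
    return 1.0


def profile_emission(pssm_row, aa, y_i, y):
    """Profile emission feature: scaled PSSM score if the label y_i equals y, else 0.

    pssm_row is an assumed dict mapping amino-acid letters to PSSM scores.
    """
    return scale(pssm_row[aa]) if y_i == y else 0.0


def asa_emission(asa, y_i, y):
    """ASA emission feature: the accessible surface area if y_i equals y, else 0."""
    return asa if y_i == y else 0.0


example_row = {"A": 2, "G": -7, "W": 9}  # made-up scores for illustration
print(profile_emission(example_row, "A", "I", "I"))
print(profile_emission(example_row, "A", "N", "I"))
```

The scaling maps raw PSSM scores into the range [0, 1], so profile features stay on a bounded scale regardless of how extreme the raw scores are.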

Transition feature function

$$t_{y,y'}(x, y_{i-1}, y_i) = \begin{cases} 1, & \text{if } y_{i-1} = y \wedge y_i = y' \\ 0, & \text{otherwise} \end{cases}$$
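The transition feature is an indicator on adjacent label pairs, which is what lets the model use the labels of neighboring residues. The sketch below counts these indicators over a label sequence and then decodes the highest-scoring labelling with a standard Viterbi recursion. The two-label alphabet, the emission scores, and the transition weights are all made up for illustration; this is not the SVM-HMM implementation.

```python
LABELS = ("I", "N")  # assumed label alphabet: interface / non-interface


def transition_feature(y, y_next, prev_label, cur_label):
    """t_{y,y'}: 1 if the previous label is y and the current label is y', else 0."""
    return 1 if (prev_label == y and cur_label == y_next) else 0


def transition_counts(labels):
    """Sum every t_{y,y'} over all adjacent label pairs of one sequence."""
    counts = {(y, y2): 0 for y in LABELS for y2 in LABELS}
    for prev_label, cur_label in zip(labels, labels[1:]):
        for (y, y2) in counts:
            counts[(y, y2)] += transition_feature(y, y2, prev_label, cur_label)
    return counts


def viterbi(emission_scores, transition_weights):
    """Return the label string maximizing summed emission and transition scores.

    emission_scores: list of dicts label -> score, one dict per position.
    transition_weights: dict (y, y') -> weight.
    """
    best = [dict(emission_scores[0])]
    back = []
    for scores in emission_scores[1:]:
        cur, ptr = {}, {}
        for y2 in LABELS:
            prev = max(LABELS, key=lambda y: best[-1][y] + transition_weights[(y, y2)])
            cur[y2] = best[-1][prev] + transition_weights[(prev, y2)] + scores[y2]
            ptr[y2] = prev
        best.append(cur)
        back.append(ptr)
    last = max(LABELS, key=lambda y: best[-1][y])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))


emissions = [{"I": 1.0, "N": 0.0}, {"I": 0.0, "N": 0.2}, {"I": 1.0, "N": 0.0}]
no_transitions = {(y, y2): 0.0 for y in LABELS for y2 in LABELS}
sticky = {("I", "I"): 0.5, ("N", "N"): 0.5, ("I", "N"): -0.5, ("N", "I"): -0.5}
print(viterbi(emissions, no_transitions), viterbi(emissions, sticky))
```

With zero transition weights each position is labelled independently; with weights that reward staying in the same state, the weak middle position is pulled toward the label of its neighbors.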

Optimization problem

s.t.
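The objective and constraints are not legible in the transcript. A common n-slack, margin-rescaling form of the structural SVM problem, of the kind solved by SVM-HMM, is given here as an assumption rather than as the slide's exact content. The joint feature map $\Psi$, the loss $\Delta$, and the trade-off constant $C$ are notation introduced for this sketch.

```latex
\min_{\mathbf{w},\, \xi \ge 0} \quad \frac{1}{2}\,\mathbf{w}\cdot\mathbf{w} \;+\; \frac{C}{n}\sum_{i=1}^{n}\xi_i
\qquad \text{s.t.} \quad
\forall i,\ \forall y:\quad
\mathbf{w}\cdot\Psi(x_i, y_i) \;\ge\; \mathbf{w}\cdot\Psi(x_i, y) \;+\; \Delta(y_i, y) \;-\; \xi_i
```

Each constraint requires the correct labelling $y_i$ to outscore every alternative labelling $y$ by a margin that grows with the number of mislabelled positions $\Delta(y_i, y)$.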

Source Code: http://www.cs.cornell.edu/People/tj/svm_light/svm_hmm.html

☆ The cutting-plane algorithm makes training time linear in the number of training examples
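To try the linked SVM-HMM package, the per-residue features have to be written to its text input format. The sketch below assumes that format is one line per residue, `TAG qid:SEQ FEATNUM:FEATVAL ...`, with positive integer tags and feature numbers in ascending order; check this against the package documentation before relying on it. The function name, tag mapping, and feature values are hypothetical.

```python
def to_svm_hmm_lines(sequences, tag_ids):
    """Format labelled sequences as SVM-HMM-style input lines.

    sequences: list of sequences, each a list of (tag, {feature_number: value}).
    tag_ids: dict mapping tag names to positive integers.
    Assumed line layout: TAG qid:SEQ FEATNUM:FEATVAL ... (features ascending).
    """
    lines = []
    for qid, residues in enumerate(sequences, start=1):
        for tag, features in residues:
            feats = " ".join(f"{idx}:{val:g}" for idx, val in sorted(features.items()))
            lines.append(f"{tag_ids[tag]} qid:{qid} {feats}")
    return lines


example = [[("I", {2: 0.7, 1: 1.0}), ("N", {1: 0.3})]]  # made-up feature values
lines = to_svm_hmm_lines(example, {"I": 1, "N": 2})
print("\n".join(lines))
```

All residues of one protein chain share the same `qid`, which is how the tool knows which lines form a single sequence.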

DATA SET

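$$\mathrm{F1} = \frac{2 \times \mathrm{Specificity}^{+} \times \mathrm{Sensitivity}^{+}}{\mathrm{Specificity}^{+} + \mathrm{Sensitivity}^{+}}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FN)(TP + FP)(TN + FP)(TN + FN)}}$$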
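The three measures can be computed directly from the confusion counts. The sketch below assumes the convention common in interface-prediction work that Specificity+ is TP / (TP + FP) and Sensitivity+ is TP / (TP + FN); the slides do not define these two terms, so treat those definitions as an assumption. The counts in the example are invented.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(tp, tn, fp, fn):
    """Compute F1, accuracy, and MCC from confusion-matrix counts."""
    # Assumed definitions (not stated on the slides):
    specificity_pos = _ratio(tp, tp + fp)  # fraction of predicted interface residues that are correct
    sensitivity_pos = _ratio(tp, tp + fn)  # fraction of true interface residues that are found
    f1 = _ratio(2 * specificity_pos * sensitivity_pos, specificity_pos + sensitivity_pos)
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn)),
    )
    return {"f1": f1, "accuracy": accuracy, "mcc": mcc}


scores = evaluate(tp=40, tn=30, fp=10, fn=20)  # made-up counts
print(scores)
```

Accuracy can look high when most residues are non-interface, which is why F1 and MCC are reported alongside it.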

Influence of the number of training samples on the prediction performance and running time

The inter-relation information between neighboring residues is relevant for discrimination

The window size has no significant influence on the performance

Comparison with related methods (figure panels: actual interface residues, ANN, SVM, CRF, HM-SVM)

SUMMARY

• Prediction of protein binding sites
• Hidden Markov Support Vector Machine
• Result Analysis
    • Comparison with other methods
    • Influence of the number of training samples
    • The information between neighboring residues
    • Window size
• Discussion
