An algorithm to guide selection of specific biomolecules to be studied by wet-lab experiments
Jessica Wehner and Madhavi Ganapathiraju
Department of Biomedical Informatics, University of Pittsburgh School of Medicine
Pittsburgh, PA, USA
Presented by Thahir P. Mohamed
Advancing Practice, Instruction & Innovation through Informatics, October 19-23, 2008
Protein Structure
Primary Structure: Chain of amino acids
Secondary Structure: Sub-structures such as helices and strands
Tertiary Structure: Three-dimensional structure of the protein at atomic resolution
Knowing protein structure is essential for successful drug design
Challenges in Protein Structure Prediction
• X-ray crystallography and NMR spectroscopy are wet-lab methods to determine structure, but they are:
  • Very expensive
  • Very time-consuming
• Computational techniques are therefore applied to predict protein structure
Computational Protein Structure Prediction
• Machine learning techniques are applied to predict structure
• Experimentally determined structures serve as training data for predicting new structures
• When there is not enough data to learn from:
  • Active learning is applied to select the next protein to be studied experimentally
Active Learning
[Figure: unlabeled proteins are clustered, a selection algorithm picks proteins from the clusters to be labeled (possible labels shown), and the labeled proteins are used for prediction]
Active learning guides the selection of the data points for which labels are requested.
Membrane Protein Structure Prediction
Membrane proteins (MPs): importance and challenges
• Membrane proteins:
  • are encoded by 30% of genes
  • function in cell regulation and signaling pathways
  • account for 60% of drug targets
• Yet they are difficult to study experimentally:
  • only 1% of known protein structures are membrane proteins
Active learning can be used as a tool to address the gap between the small number of known MP structures and the large number of known MP sequences.
‘Features’ Representation
Each window of residues is described by the properties listed below. Data reduction by singular value decomposition (SVD) yields a final 4 features per window.
Position:  1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
Residue:   A L H W R A A G A A T V L L V I V E R G A P G A Q L I
Topology:  - - - - - M M M M M M M M M M M M - - - - - - - - - -
Charge:    - - p - p - - - - - - - - - - - - n p - - - - - - - -
E-Prop:    D d . . A D D . D D a d d d d d d D A . D D . D a d d
(Topology: M = membrane segment; Charge: p = positive, n = negative)
Properties: charge, size, polarity, aromaticity, electronic properties
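The windowing and SVD step can be sketched as below, in Python with NumPy. This is an illustration under stated assumptions, not the authors' code: the per-residue numbers are hypothetical (charge follows the table's p/n marks, the E-Prop codes D, d, ., a, A are mapped to 2, 1, 0, -1, -2 purely for illustration, and size and polarity are made-up placeholders), the window length of 5 is assumed, and the features are mean-centred before the SVD. Only the final count of 4 features per window comes from the slide.

```python
import numpy as np

# HYPOTHETICAL per-residue encodings: [charge, size, polarity, aromaticity, electronic].
# Charge and electronic values mirror the table above; the rest are placeholders.
PROPS = {
    "A": [0, 1, 0, 0, 2], "G": [0, 0, 0, 0, 0], "L": [0, 3, 0, 0, 1],
    "I": [0, 3, 0, 0, 1], "V": [0, 2, 0, 0, 1], "P": [0, 2, 0, 0, 2],
    "W": [0, 4, 0, 1, 0], "T": [0, 2, 1, 0, -1], "Q": [0, 3, 1, 0, -1],
    "H": [1, 3, 1, 1, 0], "R": [1, 4, 1, 0, -2], "E": [-1, 3, 1, 0, 2],
}

def window_features(seq, w=5):
    """Concatenate the residue encodings inside each sliding window of length w (assumed)."""
    enc = np.array([PROPS[r] for r in seq], dtype=float)
    return np.array([enc[i:i + w].ravel() for i in range(len(seq) - w + 1)])

def svd_reduce(X, k=4):
    """Project mean-centred windows onto the top-k right singular vectors."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T

seq = "ALHWRAAGAATVLLVIVERGAPGAQLI"   # example sequence from the table above
X = window_features(seq)               # 23 windows x 25 raw features
Z = svd_reduce(X)                      # 23 windows x 4 reduced features
```

Because the columns of `Z` are ordered by singular value, the first feature captures the most variance across windows.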
Clustering the Data
[Figure: data points plotted along three dimensions: Dim 1, Dim 2, Dim 3]
Neural network: Self-Organizing Map (SOM)
• Finds the centroids of clusters in the data
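A minimal self-organizing map is sketched below to show how SOM nodes end up as cluster centroids. It is a generic textbook SOM, not the configuration used in the study: the grid size, epoch count, linearly decaying learning rate, Gaussian neighbourhood and initialisation from random data points are all assumptions.

```python
import numpy as np

def train_som(data, rows=2, cols=2, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols SOM on data (n_points x n_features); return node weights."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of each node, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from distinct, randomly chosen data points.
    weights = data[rng.choice(len(data), size=rows * cols, replace=False)].astype(float)
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.01  # shrinking neighbourhood radius
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the grid
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def assign_nodes(data, weights):
    """Index of the nearest node (cluster centroid) for every data point."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

For example, on two well-separated groups of 4-feature windows, each group maps to its own set of nodes:

```python
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.1, (50, 4)), rng.normal(5.0, 0.1, (50, 4))])
weights = train_som(data)
nodes = assign_nodes(data, weights)
```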
Design 1: Density-based Selection
• Find the densest cluster
  – Choose the N points closest to its centroid
  – Find labels for these points (transmembrane, TM, or non-transmembrane, NTM)
  – Find the majority label, say L
  – Assign L to all points in the cluster
• Repeat for the next densest cluster
Clusters with no known structures are marked for study by experiments
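The density-based loop can be sketched as follows. Assumptions, not taken from the slides: density is measured as the number of points assigned to a node, the centroids are given (for example, SOM node weights), and `oracle` is a hypothetical callable standing in for a known or experimentally determined TM/NTM label.

```python
from collections import Counter
import numpy as np

def density_based_selection(data, centroids, node_of, oracle, n_per_node=10):
    """Label every point by the majority label of the N points nearest its centroid.

    data:      (n_points, n_features) array of window features
    centroids: (n_nodes, n_features) array, e.g. SOM node weights
    node_of:   (n_points,) integer array giving each point's node index
    oracle:    HYPOTHETICAL callable(point_index) -> "TM" or "NTM"
    """
    labels = np.full(len(data), None, dtype=object)
    queried = []
    sizes = np.bincount(node_of, minlength=len(centroids))
    # Visit clusters from most to least dense (density = number of members, assumed).
    for node in np.argsort(-sizes, kind="stable"):
        members = np.flatnonzero(node_of == node)
        if members.size == 0:
            continue
        dist = np.linalg.norm(data[members] - centroids[node], axis=1)
        chosen = members[np.argsort(dist, kind="stable")[:n_per_node]]
        votes = Counter(oracle(int(i)) for i in chosen)
        majority = votes.most_common(1)[0][0]  # majority label L
        labels[members] = majority             # assign L to the whole cluster
        queried.extend(int(i) for i in chosen)
    return labels, queried
```

Only the `queried` points need labels; every other point inherits its cluster's majority label.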
Design 1 Results
• Increase the number of data points for which we request the structure
• Compare how accuracy varies between guided selection (via active learning) and random selection
[Figure: percent (0 to 90) versus number of labels per node (1 to 40); curves for density-based precision, density-based F-score, random-based precision and random-based F-score]
With only 10 labels per node, the labeled points total about 1% of the data
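The plotted metrics can be computed as below. This sketch assumes TM is treated as the positive class and that scores are reported in percent; the slides do not say how precision and F-score were aggregated over the two labels.

```python
def precision_fscore(true, pred, positive="TM"):
    """Precision and F-score (both in percent) for the assumed positive label."""
    pairs = list(zip(true, pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return 100 * precision, 100 * f

# Guided and random selection would each be scored this way on the same ground truth.
print(precision_fscore(["TM", "TM", "NTM", "NTM"], ["TM", "NTM", "TM", "NTM"]))  # (50.0, 50.0)
```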
Design 2: Protein-based Selection
• Pick a random protein
• Find labels for all windows in this protein
• For each node containing labels, find the mode L of all labels it contains
• Assign L to the remaining data in the node
• Repeat with a new protein, updating the node labels, until half of the proteins have been selected
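The protein-based loop can be sketched as a generator that yields the current labeling after each selected protein. The data layout (`windows_by_protein`, `node_of`) and the `oracle` callable are hypothetical stand-ins: `oracle` represents the experimentally determined label of a window, and `seed` picks one permutation of the protein selection order.

```python
import random
from collections import Counter, defaultdict

def protein_based_selection(windows_by_protein, node_of, oracle, seed=0):
    """Yield (protein, prediction) after each selected protein.

    windows_by_protein: dict protein_id -> list of window ids (hypothetical layout)
    node_of:            dict window id -> SOM node index
    oracle:             HYPOTHETICAL callable(window_id) -> "TM" or "NTM"
    """
    rng = random.Random(seed)
    order = list(windows_by_protein)
    rng.shuffle(order)                         # one permutation of the selection order
    counts = defaultdict(Counter)              # node -> label counts seen so far
    for protein in order[: len(order) // 2]:   # stop once half are selected
        for w in windows_by_protein[protein]:
            counts[node_of[w]][oracle(w)] += 1
        # Each labeled node's mode L goes to all its windows; others stay None.
        prediction = {
            w: (counts[n].most_common(1)[0][0] if n in counts else None)
            for w, n in node_of.items()
        }
        yield protein, prediction
```

Running it with several `seed` values corresponds to repeating the experiment over different permutations of the protein selection order.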
Protein-based Results
The selection was repeated for different permutations of the protein selection order, and several metrics were observed.
[Figure: metrics for protein-based selection, in percent]
Conclusions
• We developed a framework that allows us to select a few proteins or protein fragments which, when annotated with experimental methods, may be used to label the remaining protein sequences.
• We have shown that guided selection of data can achieve higher accuracy than random selection of data.
Acknowledgements
Madhavi Ganapathiraju, Jessica Wehner
JW was funded through the NIH-NSF Bioengineering & Bioinformatics Summer Institute.
Visit us at the Department of Biomedical Informatics, University of Pittsburgh
www.dbmi.pitt.edu/madhavi
Cathedral of Learning, University of Pittsburgh
Thank you!