Automatic assignment of NMR spectral data from protein sequences using NeuroBayes

Slavomira Stefkova, Michal Kreps and Rudolf A Roemer
Department of Physics, University of Warwick, Coventry, UK
[email protected]

Abstract

Automating the assignment of nuclear magnetic resonance (NMR) spectral data from proteins allows for nearly automatic structure determination. Manual assignment is very time consuming. Heteronuclear single quantum correlation (HSQC) experiments are usually the cheapest and quickest. They provide unique scatter-plot fingerprints of proteins that can be correlated with well-known protein structures. Artificial neural networks have been used in other correlation searches for several decades.

Introduction

Fig.1: Amino acid. Amino acids are biologically important compounds made from amine (-NH2) and carboxylic acid (-COOH) functional groups together with a side chain.

Fig.2: Structure of a protein. A protein is an organic compound consisting of one or more chains of amino acids. The structure of a protein is very important because it usually determines the protein's biological function.

Fig.3: HSQC plot. NMR spectroscopy is used to determine the structure and dynamics of proteins. One of the possible measurements is the chemical shift, the precise resonant frequency of an atomic nucleus. The easiest and cheapest experiment is the heteronuclear single quantum correlation (HSQC) experiment, which measures hydrogen and nitrogen chemical shifts. The resulting plots are unique fingerprints of each protein.

Fig.4: Biological Magnetic Resonance Bank. The Biological Magnetic Resonance Bank (BMRB) is a database that contains hydrogen and nitrogen shifts for several thousand proteins.

Fig.5: Model of an artificial neural network. Artificial neural networks are non-linear statistical data modelling or decision making tools. They can be used to model complex relationships between inputs and outputs or to find patterns (sequence recognition) in data.

Fig.6: Workflow of NeuroBayes. NeuroBayes is the neural network used in this project. It consists of two components, NeuroBayes Teacher and NeuroBayes Expert, both of which are necessary for the assignment of an unknown protein.

Results

To automate the assignment of NMR spectral data, NeuroBayes Teacher was trained on a sample of 5717 proteins collected from the Biological Magnetic Resonance Bank database. NeuroBayes Teacher provided a probability output that was interpreted using four different approaches. Several proteins were excluded from training so that the success rate of the predictive algorithms could be measured on unseen data. However, the prediction was observed to give a similar success rate for known and unknown proteins.

Protein BMRB id   Analysis 0   Analysis 1   Analysis 3   Analysis 4
5000              16.97%       9.82%        23.21%       17.86%
5003              19.05%       14.29%       26.66%       14.29%
5005              15.89%       16.35%       22.43%       15.89%

Table 1: Sample success rates for known (5003, 5005) and unknown (5000) proteins.

Analysis 3 has so far been the most successful interpretation of the probability output. It uses the following formula: [formula not preserved in this transcript]

Conclusion

Nowadays, the most successful assignment algorithms reach around 70% agreement with experimentally assigned proteins. These, however, combine data from several experiments, which has not been done in this project. Using this neural network alone, only around 25% of amino acids were correctly assigned to the peaks in the HSQC experiment.

References

Fig.1 - http://en.wikipedia.org/wiki/File:AminoAcidball.svg
Fig.2 - http://en.wikipedia.org/wiki/File:Main_protein_structure_levels_en.svg
Fig.4 - http://deposit.bmrb.wisc.edu/bmrb-adit/docs/tutorial.html
Fig.5 - http://en.wikipedia.org/wiki/File:Neural_network_example.svg
Fig.6 - https://twiki.cern.ch/twiki/pub/Main/NeuroBayes/np_workflow.png
Brain fig. - http://scientopia.org/blogs/scicurious/files/2011/05/neurons51.jpg

Acknowledgments

I would like to thank Professor Rudolf Roemer and Dr. Michal Kreps for guidance throughout this project. Gratitude also goes to the Department of Physics and the Centre for Scientific Computing at the University of Warwick for providing me with computational resources. Finally, I would like to thank URSS for allowing me to undertake this project.

This project uses NeuroBayes, a neural network implementation from particle physics, and assigns HSQC peaks by training NeuroBayes on the Biological Magnetic Resonance Bank HSQC plot database. About 25% of amino acids within a protein are correctly assigned by NeuroBayes using only chemical shifts in HSQC spectra. This falls short of the 70% agreement reached by other codes; however, training with extra data should improve results.
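
The success rates quoted above are the fraction of HSQC peaks assigned to the correct amino acid. The poster does not describe how its four analyses turn the network's probability output into an assignment, so the following Python sketch is only an illustration under stated assumptions: the matrix `prob`, the two assignment rules, and all function names are hypothetical and are not the authors' Analysis 0 to 4 or any part of the NeuroBayes API.

```python
from typing import List, Sequence

# Assumption: prob[i][j] is a probability-like score that HSQC peak i
# belongs to residue j of the protein sequence (hypothetical layout).
Matrix = Sequence[Sequence[float]]


def argmax_assignment(prob: Matrix) -> List[int]:
    """Assign each peak independently to its highest-scoring residue.

    Several peaks may end up on the same residue.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in prob]


def greedy_assignment(prob: Matrix) -> List[int]:
    """Greedy one-to-one assignment.

    Visit (peak, residue) pairs from highest to lowest score and accept a
    pair only if both the peak and the residue are still free. Peaks that
    cannot be placed are marked with -1.
    """
    candidates = sorted(
        ((p, i, j) for i, row in enumerate(prob) for j, p in enumerate(row)),
        key=lambda c: (-c[0], c[1], c[2]),
    )
    assigned = [-1] * len(prob)
    used_residues = set()
    for _, peak, residue in candidates:
        if assigned[peak] == -1 and residue not in used_residues:
            assigned[peak] = residue
            used_residues.add(residue)
    return assigned


def success_rate(assigned: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of peaks whose assigned residue matches the reference."""
    if not truth:
        return 0.0
    correct = sum(a == t for a, t in zip(assigned, truth))
    return correct / len(truth)


# Toy data: three peaks, three residues, reference assignment 0, 1, 2.
prob = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
]
truth = [0, 1, 2]

independent = argmax_assignment(prob)  # peaks 0 and 1 both pick residue 0
one_to_one = greedy_assignment(prob)   # residue 0 is taken, so peak 1 gets 1
```

On this toy input the independent rule scores two of three peaks, while the one-to-one rule recovers all three, which shows why the choice of interpretation can change the reported success rate even when the network output is identical.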

