Mike Arnoult
9/30/2010
The role of Artificial Neural Networks in Phage Research
What is an Artificial Neural Network?
Mathematical and computational model
Motivated by biological neurons
Trained by using features to learn patterns and commonalities
Uses values of its neuron connections to classify an example
The neural network can be trained to recognize features of phage proteins, and distinguish between them.
I have trained ANNs to recognize and classify phage major capsid proteins
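As a rough illustration of how connection values turn a feature vector into a class, here is a minimal sketch of a forward pass through one hidden layer. The layer sizes, the tanh activation, the weights, and the function names are all assumptions made for this example; the slides do not describe the actual network architecture.

```python
import math

def forward(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """Propagate one feature vector through a single hidden layer (illustrative only)."""
    # Each hidden neuron sums its weighted inputs and applies tanh.
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, features)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output neuron combines the hidden activations the same way.
    return math.tanh(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)

def classify(output):
    """Map the real-valued output to a class label (assumed threshold at zero)."""
    return 1 if output >= 0 else -1

# Made-up weights, not trained values.
out = forward([0.5, 0.25], [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], [1.0, 1.0], 0.0)
print(classify(out))
```

Training adjusts the weights and biases so that the output sign matches the known label for as many training examples as possible.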
Why Apply Artificial Neural Networks to Phage Research?
What is a Bacteriophage?
A virus that infects bacteria
The most common biological entity on Earth
Has a major impact on any environment that contains bacteria
A virus with a distinctive structure that injects its genome into a host through its tail
A possible alternative to Antibiotics in medicine
How the ANN works:
Why Apply Artificial Neural Networks to Bioinformatics?
The Neural Network can be trained to recognize features of proteins, and distinguish between them.
In my research, I will train Neural Networks to recognize phage major capsid or tail proteins.
What I’ve done so far:
I’ve collected Positive and Negative Data sets from NCBI
Positive data sets included Phage Major Capsid Proteins and synonyms:
Major Shell Protein
Major Head Protein
Major Coat Protein
Major Procapsid Protein
Major Prohead Protein
…
Negative data sets included phage proteins unrelated to Major Capsid Proteins:
Packaging proteins
Spike proteins
DNA and RNA Polymerase
Assembly proteins
Contractile Sheath proteins
What I’ve done so far:
I have written and used Perl scripts to filter the Training Data
Any sequences with conspicuously incorrect GenPept annotations were removed from the positive data-set.
All sequences with Major Capsid Protein related annotations were removed from the negative data-set.
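The negative-set filter described above can be sketched as follows. The original scripts were written in Perl and are not shown in the slides, so this Python version is only an assumption about the approach: the record layout (annotation, sequence), the function names, and the substring matching are hypothetical, and the synonym list covers only the terms named on the slides.

```python
# Synonyms taken from the slides; the original list was longer.
CAPSID_TERMS = (
    "major capsid protein",
    "major shell protein",
    "major head protein",
    "major coat protein",
    "major procapsid protein",
    "major prohead protein",
)

def is_capsid_annotation(annotation):
    """Return True if the annotation mentions any listed major capsid synonym."""
    text = annotation.lower()
    return any(term in text for term in CAPSID_TERMS)

def filter_negative_set(records):
    """Drop records annotated as major capsid proteins.

    `records` is a list of (annotation, sequence) tuples (hypothetical layout).
    """
    return [(ann, seq) for ann, seq in records if not is_capsid_annotation(ann)]

# Toy records for illustration, not real GenPept entries.
negatives = [
    ("DNA polymerase", "MKT"),
    ("putative major capsid protein", "MAS"),
    ("Major Head Protein gp23", "MLL"),
    ("portal protein", "MQR"),
]
kept = filter_negative_set(negatives)
print([ann for ann, _ in kept])
```

Removing conspicuously incorrect annotations from the positive set involves judgment about each entry, so it is not sketched here.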
What I’ve done so far:
I’ve turned the sequences into percent compositions of Amino Acids and side-chain groups, to Train Neural Networks
The positive entries are labeled with a 1 and the negative entries are labeled with a –1.
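A minimal sketch of this feature extraction, assuming the 20 standard amino acids and one common assignment of residues to the five side-chain groups. The slides do not say which residues were placed in which group, so the grouping below and the function names are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed grouping; textbooks differ on residues such as C, G, H and Y.
SIDE_CHAIN_GROUPS = {
    "polar": "STCNQ",
    "nonpolar": "GAVLIMP",
    "aromatic": "FWY",
    "positive": "KRH",
    "negative": "DE",
}

def percent_composition(sequence):
    """Return 20 amino acid percentages followed by 5 side-chain group percentages."""
    # Ignore ambiguous or non-standard residue codes such as X or B.
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not seq:
        raise ValueError("sequence contains no standard amino acids")
    total = len(seq)
    aa_features = [100.0 * seq.count(aa) / total for aa in AMINO_ACIDS]
    group_features = [
        100.0 * sum(seq.count(aa) for aa in members) / total
        for members in SIDE_CHAIN_GROUPS.values()
    ]
    return aa_features + group_features

def labeled_example(sequence, is_capsid):
    """Pair a feature vector with its label: 1 for positive, -1 for negative."""
    return percent_composition(sequence), (1 if is_capsid else -1)

features, label = labeled_example("MKDEFW", True)
print(len(features), label)
```

Percent composition gives every sequence a fixed-length feature vector regardless of protein length, which is what lets sequences of different sizes be fed to the same network.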
Using a Matlab Script, a random 20% of the positive data-set is set aside and used as a test set against the other 80%.
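The hold-out step could look like the sketch below. The original was a Matlab script that is not shown, so the function name, the `seed` parameter, and the rounding of the test-set size are assumptions.

```python
import random

def split_train_test(examples, test_fraction=0.2, seed=None):
    """Randomly set aside a fraction of the examples as a test set."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled items become the test set; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_train_test(range(10), seed=0)
print(len(train), len(test))
```

Drawing a fresh random split for each trained network means the reported accuracy does not depend on one lucky or unlucky choice of test sequences.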
What I’m doing now:
To find which criteria are best suited to Training the Neural Network to recognize Phage Major Capsid Proteins…
I am training neural networks using different characteristics of Amino Acid side-chains (Polar, Nonpolar, Aromatic, Positive and Negative)
I am also adjusting the parameters that the Matlab script uses to train the Neural Networks.
Classification of Known Sequences:
The values are the percentages of correctly classified sequences, averaged over 1000 separately trained Neural Networks.
Amino Acid and Side-chain Percent Compositions used as features
Amino Acid Percent Compositions used as features (no side chains)
92.9233%
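The averaging behind a figure like this can be sketched as below, assuming each trained network produces one real-valued output per test sequence and that the sign of the output is taken as the predicted label. Both the thresholding at zero and the function names are assumptions; the numbers are toy values, not results from the study.

```python
def percent_correct(outputs, labels):
    """Percentage of test sequences whose predicted sign matches the label."""
    correct = sum(
        1 for out, lab in zip(outputs, labels)
        if (1 if out >= 0 else -1) == lab
    )
    return 100.0 * correct / len(labels)

def mean_percent_correct(runs, labels):
    """Average the per-network accuracy over all separately trained networks."""
    return sum(percent_correct(outputs, labels) for outputs in runs) / len(runs)

labels = [1, 1, -1, -1]
runs = [
    [0.9, 0.2, -0.7, -0.1],   # outputs of one trained network
    [0.8, -0.3, -0.6, -0.2],  # outputs of a second trained network
]
average = mean_percent_correct(runs, labels)
print(average)
```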
What I’m going to do Soon:
Test the Neural Networks using other Phage Major Capsid Proteins
Ramy’s curated Phage Major Capsid Proteins
Eventually verify the Neural Network predictions in the lab.
THE END