cap 5510: introduction to bioinformatics cgs 5166 ...giri/teach/bioinf/s15/lec2-biol-seqaln.pdf ·...
CAP 5510; CGS 5166
Giri Narasimhan
Molecular Biology Preliminaries
Databases
Sequence Alignment
CAP 5510: Introduction to Bioinformatics
CGS 5166: Bioinformatics Tools
Giri Narasimhan
ECS 254A / EC 2474; Phone x3748; Email: [email protected]
My Homepage: http://www.cs.fiu.edu/~giri
http://www.cs.fiu.edu/~giri/teach/BioinfS15.html
Office ECS 254 (and EC 2474); Phone: x-3748
Office Hours: By Appointment Only
Jan 19, 2015
Presentation Outline
1 Molecular Biology Preliminaries
2 Databases
3 Sequence Alignment
The drama of Molecular Biology . . . the actors
http://exploringorigins.org/images/centralDogma.jpg
The Polymeric Players
| Molecule | Unit Name | Unit Composition |
| --- | --- | --- |
| DNA | Nucleotide or Base | A, C, G, T |
| RNA | Nucleotide or Base | A, C, G, U |
| Protein | Amino acid residue | amino acids represented by 20-letter alphabet, missing {B, J, O, U, X, Z} |
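The three alphabets in the table can be written down directly as sets. The following is a minimal sketch, not part of the original slides; the function name `guess_molecule` is a hypothetical helper, and the order of the checks (DNA, then RNA, then protein) is an assumption made for illustration.

```python
# Alphabets from the table above.
DNA_ALPHABET = set("ACGT")
RNA_ALPHABET = set("ACGU")
# The 26 letters minus the six that the 20-letter protein alphabet omits.
PROTEIN_ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ") - set("BJOUXZ")


def guess_molecule(seq: str) -> str:
    """Guess the molecule type from the letters that occur in seq.

    Assumption: the most restrictive alphabet wins, so a string such as
    "ACG" is reported as DNA even though it is also valid RNA or protein.
    """
    letters = set(seq.upper())
    if not letters:
        raise ValueError("empty sequence")
    if letters <= DNA_ALPHABET:
        return "DNA"
    if letters <= RNA_ALPHABET:
        return "RNA"
    if letters <= PROTEIN_ALPHABET:
        return "protein"
    return "unknown"


print(len(PROTEIN_ALPHABET))          # 20
print(guess_molecule("gggagaacac"))   # DNA
print(guess_molecule("MEEPQSDPSV"))   # protein
```

Because the alphabets overlap, this kind of check can only rule molecule types out; it cannot prove that a short string is DNA rather than protein.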
Typical DNA Sequence
1 gggagaacac ccggagaagg aggaggaggc gaagaaaagc aacagaagcc cagttgctgc
61 tccaggtccc tcggacagag ctttttccat gtggagactc tctcaatgga cgtgccccct
121 agtgcttctt agacggactg cggtctccta aaggtcgacc atggtggccg ggacccgctg
181 tcttctagtg ttgctgcttc cccaggtcct cctgggcggc gcggccggcc tcattccaga
241 gctgggccgc aagaagttcg ccgcggcatc cagccgaccc ttgtcccggc cttcggaaga
301 cgtcctcagc gaatttgagt tgaggctgct cagcatgttt ggcctgaagc agagacccac
361 ccccagcaag gacgtcgtgg tgccccccta tatgctagat ctgtaccgca ggcactcagg
421 ccagccagga gcgcccgccc cagaccaccg gctggagagg gcagccagcc gcgccaacac
481 cgtgcgcagc ttccatcacg aagaagccgt ggaggaactt ccagagatga gtgggaaaac
541 ggcccggcgc ttcttcttca atttaagttc tgtccccagt gacgagtttc tcacatctgc
601 agaactccag atcttccggg aacagataca ggaagctttg ggaaacagta gtttccagca
661 ccgaattaat atttatgaaa ttataaagcc tgcagcagcc aacttgaaat ttcctgtgac
721 cagactattg gacaccaggt tagtgaatca gaacacaagt cagtgggaga gcttcgacgt
781 caccccagct gtgatgcggt ggaccacaca gggacacacc aaccatgggt ttgtggtgga
841 agtggcccat ttagaggaga acccaggtgt ctccaagaga catgtgagga ttagcaggtc
901 tttgcaccaa gatgaacaca gctggtcaca gataaggcca ttgctagtga cttttggaca
961 tgatggaaaa ggacatccgc tccacaaacg agaaaagcgt caagccaaac acaaacagcg
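The listing above uses a numbered layout: a position index followed by blocks of ten bases. A small sketch of how such text could be turned into a plain string and summarized follows; the helper names `clean_numbered_sequence` and `gc_content` are hypothetical, and only the first two lines of the slide's sequence are used as sample input.

```python
def clean_numbered_sequence(text: str) -> str:
    """Drop position numbers and whitespace; keep the letters, upper-cased."""
    return "".join(ch for ch in text if ch.isalpha()).upper()


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


# First two lines of the sequence shown on the slide.
example = """
  1 gggagaacac ccggagaagg aggaggaggc gaagaaaagc aacagaagcc cagttgctgc
 61 tccaggtccc tcggacagag ctttttccat gtggagactc tctcaatgga cgtgccccct
"""

seq = clean_numbered_sequence(example)
print(len(seq))                    # 120 bases: two lines of 60
print(round(gc_content(seq), 3))   # GC fraction of this fragment
```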
Building Blocks of DNA & RNA
Fig 1.1, Zvelebil & Baum
DNA – Double Helix Structure
Fig 1.3, Zvelebil & Baum
DNA Molecule
From http://www.cellsalive.com/cells/cell_model.htm
RNA Molecule
Fig 1.5, Zvelebil & Baum
Protein – The 20 Amino Acids
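| Letter Code | 3-Letter Code | Amino Acid | Letter Code | 3-Letter Code | Amino Acid |
| --- | --- | --- | --- | --- | --- |
| A | Ala | Alanine | M | Met | Methionine |
| C | Cys | Cysteine | N | Asn | Asparagine |
| D | Asp | Aspartic Acid | P | Pro | Proline |
| E | Glu | Glutamic Acid | Q | Gln | Glutamine |
| F | Phe | Phenylalanine | R | Arg | Arginine |
| G | Gly | Glycine | S | Ser | Serine |
| H | His | Histidine | T | Thr | Threonine |
| I | Ile | Isoleucine | V | Val | Valine |
| K | Lys | Lysine | W | Trp | Tryptophan |
| L | Leu | Leucine | Y | Tyr | Tyrosine |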
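The code table maps naturally onto a dictionary. The sketch below is illustrative only; it uses the standard abbreviation Gln for glutamine, and the names `ONE_TO_THREE`, `THREE_TO_ONE`, and `to_three_letter` are hypothetical.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}

# Reverse lookup, built from the table so the two cannot drift apart.
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(seq: str) -> str:
    """Spell a one-letter protein sequence with three-letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in seq.upper())


print(to_three_letter("MEEP"))  # Met-Glu-Glu-Pro
```

A letter outside the 20-letter alphabet raises `KeyError`, which is a deliberate choice here so that unexpected input is noticed rather than silently skipped.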
Protein – Typical Sequence
>gi|23491729|dbj|BAC16799.1| P53 [Homo sapiens]
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAA
PRVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKT
CPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRN
TFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVHVCACPGR
DRRTEEENLRKKGEPHHELPPGSTKRALSNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALEL
KDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD
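The record above is in FASTA format: a header line starting with `>` followed by lines of sequence. A minimal reader might look like the sketch below; `read_fasta` is a hypothetical helper rather than a library function, and the sample record is truncated to the header and the first two sequence lines of the slide for brevity.

```python
from typing import Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Header and first two sequence lines from the slide (truncated sample).
record = """>gi|23491729|dbj|BAC16799.1| P53 [Homo sapiens]
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAA
PRVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKT
"""

header, sequence = next(read_fasta(record))
fields = header.split("|")
print(fields[3])       # BAC16799.1
print(sequence[:10])   # MEEPQSDPSV
```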
Protein molecules have a 3D structure
From Branden & Tooze
Central Dogma
Fig 1.6, Zvelebil & Baum
DNA Replication
Fig 1.4, Zvelebil & Baum
The Cell
From http://www.cellsalive.com/cells/cell_model.htm
Bacterial Chromosomes
From http://www.cellsalive.com/cells/cell_model.htm
Human Chromosomes
From http://www.cellsalive.com/cells/cell_model.htm
Genes
From http://www2.le.ac.uk/departments/genetics/vgec/diagrams/36chromosomeunravel.jpg
Human Chromosomes
From http://www.cellsalive.com/cells/cell_model.htm
Central Dogma
RNA and Genetic Code
Central Dogma
Transcription
Fig 1.7, Zvelebil & Baum
Transcription
Courtesy: Dr. Kalai Mathee
Transcription
Fig 1.6, Zvelebil & Baum
Transcription
Translation
Translation
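The transcription and translation steps of the central dogma can be imitated on strings. The sketch below is an illustration under several assumptions that are not stated on the slides: the input is the coding (sense) strand, the standard genetic code applies, translation starts at the first base, and splicing is ignored. The names `transcribe`, `translate`, and `CODON_TABLE` are hypothetical.

```python
# Standard genetic code, written compactly: codons are enumerated with the
# bases in the order T, C, A, G at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA that matches the coding strand (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str, to_stop: bool = True) -> str:
    """Translate mRNA codon by codon, starting at the first base.

    A stop codon is written as '*'; with to_stop=True translation ends
    at the first stop codon and the '*' is not emitted.
    """
    dna = mrna.upper().replace("U", "T")
    protein = []
    for start in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[start:start + 3]]
        if residue == "*" and to_stop:
            break
        protein.append(residue)
    return "".join(protein)


mrna = transcribe("ATGGTGGCCTAA")
print(mrna)             # AUGGUGGCCUAA
print(translate(mrna))  # MVA
```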
Presentation Outline
1 Molecular Biology Preliminaries
2 Databases
3 Sequence Alignment
3 Major Public Databases
GenBank
National Center for Biotechnology Information (NCBI)
EMBL: European Molecular Biology Laboratory
European Bioinformatics Institute (EBI)
DDBJ: DNA Data Bank of Japan
National Institute of Genetics (NIG)
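All 3 have been completely integrated! (They synchronize their records through the International Nucleotide Sequence Database Collaboration, INSDC.)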
Entrez Portal @ NCBI
PubMed; Bookshelf
DNA and Protein Sequence database
Protein Structure database
Genome Assemblies
BLAST
dbSNP
Taxonomy Browser
Population study data sets
PubChem
GEO (Gene Expression Omnibus)
OMIM (Online Mendelian Inheritance in Man)
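The Entrez databases listed above can also be reached programmatically through NCBI's E-utilities. The sketch below only builds an `efetch` request URL and makes no network call; the helper name `efetch_url` is hypothetical, the protein accession is the one from the P53 slide, and NCBI's current usage guidelines (rate limits, API keys) should be checked before sending real requests.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(db: str, record_id: str,
               rettype: str = "fasta", retmode: str = "text") -> str:
    """Build an efetch URL for one record; nothing is downloaded here."""
    query = urlencode({
        "db": db,            # Entrez database name, e.g. "protein"
        "id": record_id,     # accession or UID of the record
        "rettype": rettype,  # requested record format
        "retmode": retmode,  # requested encoding of the response
    })
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


print(efetch_url("protein", "BAC16799.1"))
```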
Other Important Databases
PDB http://www.wwpdb.org/
KEGG http://www.genome.jp/kegg/
MetaCyc http://metacyc.org
ENCODE http://encodeproject.org/ENCODE/
(functional elements in human genome)
1000 Genomes Project
International HapMap Project
Human Microbiome Project
Human Epigenome Project
Gene Ontology (GO)
Human Connectome Project
Presentation Outline
1 Molecular Biology Preliminaries
2 Databases
3 Sequence Alignment
1. Can show sequences are close
2. Can show sequences have similar parts
3. Can identify similar sequences from DB
4. Can pinpoint mutations
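Once two sequences are aligned, the columns where they differ point at candidate mutations. The sketch below assumes the alignment is already given as two equal-length strings with `-` marking gaps; the function name `list_differences` and the labels it returns are hypothetical, and "insertion" or "deletion" is stated relative to the first sequence.

```python
from typing import List, Tuple


def list_differences(aligned_a: str, aligned_b: str) -> List[Tuple[int, str, str, str]]:
    """Report each alignment column where the two rows disagree.

    Returns (column, symbol_in_a, symbol_in_b, kind), columns counted from 1.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    differences = []
    for column, (x, y) in enumerate(zip(aligned_a, aligned_b), start=1):
        if x == y:
            continue
        if x == "-":
            kind = "insertion"     # b has a symbol that a lacks
        elif y == "-":
            kind = "deletion"      # a has a symbol that b lacks
        else:
            kind = "substitution"
        differences.append((column, x, y, kind))
    return differences


print(list_differences("ACGT-A", "ACCTGA"))
# [(3, 'G', 'C', 'substitution'), (5, '-', 'G', 'insertion')]
```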
5. Can help in sequence assembly
6. Can be basis for discovery
Early 1970s: SSV (simian sarcoma virus) causes cancer in some species of monkeys.
1970s: infection by certain viruses causes some cells in culture (in vitro) to grow without bounds.
Hypothesis: Oncogenes in viruses encode cellular growth factors (proteins that stimulate growth); uncontrolled quantities of growth factors produced by infected cells cause cancer-like behavior.
1983:
Oncogene from SSV called v-sis isolated & sequenced.
Partial sequence for platelet-derived growth factor (PDGF) sequenced & published; PDGF stimulates proliferation of cells.
R.F. Doolittle was maintaining one of the earliest home-grown databases of published aa sequences.
Sequence alignment of v-sis and PDGF had surprises.
PDGF and v-SIS
Alignment was good.
Two regions of alignment
region of 31 aa with 26 matches
region of 39 aa with 35 matches
Conclusion:
Previously harmless virus incorporates growth-related gene (proto-oncogene) of its host into its genome.
Gene gets mutated in the virus, or moves closer to a strong enhancer, or moves away from a repressor.
When virus infects a cell, it causes production of an uncontrolled amount of growth factor.
Several other oncogenes known to be similar to growth-regulating proteins in normal cells.
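The match counts reported for the two aligned regions translate into percent identity, a common way to say how "good" an alignment is. The short sketch below just does that arithmetic; `percent_identity` is a hypothetical helper name.

```python
def percent_identity(matches: int, aligned_length: int) -> float:
    """Percentage of aligned positions that hold identical residues."""
    if aligned_length <= 0 or not 0 <= matches <= aligned_length:
        raise ValueError("need 0 <= matches <= aligned_length and length > 0")
    return 100.0 * matches / aligned_length


# The two regions of the v-sis / PDGF alignment: (length, matches).
for length, matches in [(31, 26), (39, 35)]:
    print(f"{matches}/{length} = {percent_identity(matches, length):.1f}% identical")
# 26/31 = 83.9% identical
# 35/39 = 89.7% identical
```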
7. Can help describe motifs, domains, and families of sequences
Implications of Sequence Alignment
Mutation in DNA is a natural evolutionary process. Thus sequence similarity may indicate common ancestry.
In biomolecular sequences (DNA, RNA, protein), high sequence similarity implies significant structural and/or functional similarity.
Similarity vs. Homology
Homologous sequences share common ancestry.
Similar sequences are near to each other by some appropriately defined measurable criteria.
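One example of an "appropriately defined measurable criterion" is edit distance: the smallest number of single-letter substitutions, insertions, and deletions that turns one sequence into the other. The sketch below uses unit costs, which is an assumption made for illustration and not a measure named on the slide; `edit_distance` is a hypothetical helper name.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit (Levenshtein) distance between a and b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j letters of b.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # match or substitute
            ))
        previous = current
    return previous[-1]


print(edit_distance("ACGT", "AGT"))  # 1
```

A small distance says the sequences are similar under this measure; whether they are homologous is a separate, evolutionary claim that the number alone does not establish.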
Types of Sequence Alignment ... 1
Types of Sequence Alignment ... 2
Types of Sequence Alignment ... 3
Types of Sequence Alignment ... 4
Sequence Alignment Algorithms
Global alignments: Needleman-Wunsch-Sellers 1970
Local alignments: Smith-Waterman 1981
Both use Dynamic Programming
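To make the shared dynamic-programming idea concrete, here is a minimal sketch that fills the same score table for both cases. It assumes a simple match/mismatch score and a linear gap penalty (the values 1, -1, -2 are illustrative, not taken from the slides), and it returns only the optimal score, not the alignment itself. The only differences between the global and local variants are the boundary initialization, the floor at zero, and where the answer is read from.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Optimal alignment score by dynamic programming.

    local=False: global alignment in the style of Needleman-Wunsch.
    local=True:  local alignment in the style of Smith-Waterman.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps are charged.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)         # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both full sequences. Local: best cell anywhere.
    return best if local else score[-1][-1]


print(alignment_score("GATTACA", "GATTACA"))                 # global
print(alignment_score("TTTACGTTT", "GGACGGG", local=True))   # local
```

Both variants take time and space proportional to the product of the two sequence lengths, which is why searching an entire database this way is slow.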
How to Score Mismatches
Revolution in Sequence Alignment Algorithms
FASTA: Lipman, Pearson ’85, ’88
Basic Local Alignment Search Tool (BLAST): Altschul, Gish, Miller, Myers, Lipman ’90
Both programs:
search entire databases
offer tremendous speed and sensitivity
report statistical significance
BLAST
BLAST Strategy & Improvements
Lipman et al.: sped up finding runs of hot spots.
Eugene Myers ’94: Sublinear algorithm for approximate keyword matching.
Karlin, Altschul, Dembo ’90, ’91: Statistical significance of matches.
BLAST Variants
Nucleotide BLAST: blastn; MEGABLAST; short sequences (higher E-value threshold, smaller word size, no low-complexity filtering)
Protein BLAST: blastp; PSI-BLAST; PHI-BLAST; short sequences (higher E-value threshold, smaller word size, no low-complexity filtering, PAM-30)
Translating BLAST:
blastx: search a nucleotide sequence (translated in 6 reading frames) against a protein database
tblastn: search a protein sequence against a nucleotide database (translated in 6 reading frames)
tblastx: search a nucleotide sequence (6 frames) against a nucleotide database (6 frames)
Pairwise BLAST
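The "6 reading frames" used by the translating variants are three codon offsets on the given strand plus three on its reverse complement. The following is a rough sketch of that translation step only, assuming the standard genetic code and plain A/C/G/T input; the frame labels `+1` to `-3` and the helper names are illustrative choices, not BLAST's internals.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame_translations(dna):
    """Map frame label ('+1'..'+3', '-1'..'-3') to the translated protein string."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


for label, protein in sorted(six_frame_translations("ATGGCCATTGTAATGGGCCGCTGA").items()):
    print(label, protein)
```

Each of the six protein strings can then be compared against protein sequences, which is what lets a nucleotide query find protein-level similarity.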
BLAST Parameters
Type of query: nucleotide / protein
Word size, w
Gap penalties, p1, p2
Threshold scores, S, T
E-value cutoff, E
E-value, E, is the number of sequences expected by chance to have an alignment score greater than the current score, S
Number of hits to display, H
Database to search, D
Scoring Matrix, M
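As an illustration of how these parameters surface in practice, the sketch below assembles an argument list for the NCBI BLAST+ `blastp` program without running it. The option names are, to my knowledge, those used by BLAST+ (verify with `blastp -help` on your installation); the file name `query.fasta`, the numeric values, and the function name `build_blastp_command` are hypothetical. The type of query is expressed by the choice of program, and the slide's score threshold S has no counterpart in this sketch.

```python
def build_blastp_command(query, database, word_size=3, gap_open=11, gap_extend=1,
                         threshold=11, evalue=10.0, max_hits=100, matrix="BLOSUM62"):
    """Return an argument list for a protein-protein search (nothing is executed)."""
    return [
        "blastp",                            # type of query: protein vs. protein
        "-query", query,                     # file holding the query sequence
        "-db", database,                     # database to search, D
        "-word_size", str(word_size),        # word size, w
        "-gapopen", str(gap_open),           # gap penalty p1 (opening a gap)
        "-gapextend", str(gap_extend),       # gap penalty p2 (extending a gap)
        "-threshold", str(threshold),        # neighborhood word threshold, T
        "-evalue", str(evalue),              # E-value cutoff, E
        "-max_target_seqs", str(max_hits),   # number of hits to keep, H
        "-matrix", matrix,                   # scoring matrix, M
    ]


cmd = build_blastp_command("query.fasta", "nr", evalue=0.001)
print(" ".join(cmd))
```

The list could be handed to `subprocess.run` on a machine where BLAST+ and the database are installed; note that not every combination of matrix and gap penalties is accepted by the program.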
BLAST Database and Scoring Matrix
Databases:
Protein: NR (non-redundant), Swiss-Prot/UniProt, pdb, custom
Nucleotide: NR, dbest, dbsts, htgs, gss, pdb, vector, ...
Scoring Matrices:
PAM matrices: PAM40, PAM160, PAM250, going from short alignments with high similarity (70-90%), to members of a protein family (50-60%), to longer alignments with divergent homologous sequences (less than 30%).
BLOSUM matrices: BLOSUM90, BLOSUM80, BLOSUM62, BLOSUM30, going from short alignments with high similarity (70-90%), to members of a protein family (50-60%), to weak homologs (30-40%), to longer alignments with divergent homologous sequences (less than 30%).
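The BLOSUM guidance above can be written as a small lookup. This is only a sketch: the slide gives ranges with gaps between them (for example, nothing is said about 40-50% or 60-70%), so treating each range's lower bound as a cut-off is my assumption, as is the function name `suggest_blosum`.

```python
def suggest_blosum(expected_identity: float) -> str:
    """Pick a BLOSUM matrix from the expected percent identity of the hits.

    Cut-offs at 70, 50, and 30 are an assumption that extends the slide's
    ranges (70-90, 50-60, 30-40, below 30) to cover every value.
    """
    if not 0 <= expected_identity <= 100:
        raise ValueError("percent identity must lie between 0 and 100")
    if expected_identity >= 70:
        return "BLOSUM90"   # short alignments, high similarity
    if expected_identity >= 50:
        return "BLOSUM80"   # members of a protein family
    if expected_identity >= 30:
        return "BLOSUM62"   # weak homologs
    return "BLOSUM30"       # long alignments, divergent homologs


for identity in (85, 55, 35, 20):
    print(identity, suggest_blosum(identity))
```

Note the opposite numbering of the two families: a higher BLOSUM number suits more similar sequences, while a higher PAM number suits more divergent ones.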
Rules of Thumb
Homology is often characterized by significant similarity over the entire sequence or strong similarity in key places.
Matches that are > 50% identical in a 20-40 aa region occur frequently by chance.
Distantly related homologs may lack significant similarity. Homologous sequences may have few absolutely conserved residues.
Homology is transitive. A homologous to B & B homologous to C ⇒ A homologous to C.
Low-complexity regions, transmembrane regions, and coiled-coil regions frequently display significant similarity without homology.
Greater evolutionary distance implies that the length of a local alignment required to achieve a statistically significant score also increases.
Rules of Thumb ... 2
Results of searches using different scoring systems may be compared directly using normalized scores.
If S is the (raw) score for a local alignment, the normalized score S′ (in bits) is given by
$$S' = \frac{\lambda S - \ln K}{\ln 2}$$
The parameters λ and K depend on the scoring system.
A normalized score is statistically significant if
$$S' > \log_2 (N/E)$$
where E is the E-value and N is the size of the search space.
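A short numerical sketch of these relations, assuming the usual Karlin-Altschul form in which the bit score is S′ = (λS − ln K) / ln 2 and the expected number of chance hits is E = N · 2^(−S′), so that the significance condition uses a base-2 logarithm. The values of λ, K, and N below are made up for illustration and do not correspond to any particular scoring matrix or database.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalized score S' in bits from raw score S and parameters lambda, K."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expected_hits(bits, search_space):
    """E-value: chance hits expected at or above this bit score in a space of size N."""
    return search_space * 2.0 ** (-bits)


def min_significant_bits(search_space, evalue):
    """Smallest bit score whose E-value does not exceed the requested cutoff."""
    return math.log2(search_space / evalue)


lam, k = 0.3, 0.1      # hypothetical scoring-system parameters
n = 1e9                # hypothetical search-space size
s_prime = bit_score(100, lam, k)
print(round(s_prime, 1), expected_hits(s_prime, n))
print(round(min_significant_bits(n, 0.001), 1))
```

Because bit scores already absorb λ and K, two searches run with different matrices can be compared on S′ even though their raw scores S are not comparable.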