![Page 1: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/1.jpg)
Bioinformatics Tutorial I BLAST and Sequence Alignment
![Page 2: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/2.jpg)
What is BLAST?
• Online tool from the National Center for Biotechnology Information (NCBI)
• “Google” for protein and nucleotide sequences
![Page 3: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/3.jpg)
What can you use BLAST for?
• Identify an unknown sequence
• Characterize the gene/protein of interest
– Function/activity (gene and protein)
– Structure or shape (new protein)
– Location or preferred location (protein)
– Stability (gene/transcript or protein)
• Origin of a gene or protein
![Page 4: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/4.jpg)
Sequence alignment approaches
1. Global alignment
– Needleman and Wunsch, 1970
2. Local alignment (used in BLAST)
– Smith and Waterman, 1981
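To make the difference between the two approaches concrete, here is a minimal sketch of both scoring recurrences. It is not taken from the slides: the scoring values (+1 match, -2 mismatch) and the simple linear gap penalty of -2 per gap position are assumptions chosen for illustration, and only the score is computed, not the alignment itself.

```python
def global_score(a, b, match=1, mismatch=-2, gap=-2):
    """Needleman-Wunsch style score: both sequences aligned end to end."""
    # Row 0: aligning a prefix of b against nothing costs one gap per base.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-2, gap=-2):
    """Smith-Waterman style score: best-scoring pair of subsequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets an alignment restart anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

With a shared core flanked by unrelated bases, for example `"TTTACGTAAA"` and `"GGGACGTCCC"`, the local score rewards only the common `ACGT`, while the global score is dragged down by the flanks. Both functions fill a table of size len(a) × len(b), which is the cost that makes exhaustive database searches slow.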
![Page 5: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/5.jpg)
Global alignment
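• One way to search with a query sequence is to align it end to end against every sequence in a database
• This approach is very slow and hence impractical for large databases, because each full alignment takes time proportional to the product of the two sequence lengths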
![Page 6: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/6.jpg)
BLAST
• A much faster approach
• Divides your search query into short sequences (“words”) and initially looks for exact matches. Once found, these word matches are then extended
• BLAST stands for Basic Local Alignment Search Tool
• Altschul, S.F. et al. Basic local alignment search tool. J Mol Biol. 215(3):403-10 (1990).
![Page 7: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/7.jpg)
BLAST algorithm
• Query sequences are usually split into words
• Each word is then searched for in the database
• Word hits are extended in either direction to generate an alignment with a score greater than the threshold score
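The three steps above can be sketched as follows. This is a deliberately simplified illustration, not the actual BLAST implementation: it uses exact word matches only, extends hits without gaps, and the function names and the parameters `w`, `threshold` and `drop` are hypothetical values picked for the example.

```python
from collections import defaultdict


def index_words(seq, w):
    """Map every length-w word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index


def extend(query, subject, q, s, w, match=1, mismatch=-2, drop=5):
    """Extend an exact word hit in both directions without gaps.

    Extension in one direction stops once the running score falls more
    than `drop` below the best score seen so far in that direction.
    Returns (score, query_start, subject_start, length).
    """
    def grow(qi, si, step):
        best = run = gained = n = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            run += match if query[qi] == subject[si] else mismatch
            n += 1
            if run > best:
                best, gained = run, n
            elif best - run > drop:
                break
            qi += step
            si += step
        return best, gained

    right_score, right_len = grow(q + w, s + w, 1)
    left_score, left_len = grow(q - 1, s - 1, -1)
    score = w * match + left_score + right_score
    return score, q - left_len, s - left_len, left_len + w + right_len


def find_hits(query, subject, w=4, threshold=6):
    """Return extended hits whose score reaches the threshold."""
    index = index_words(subject, w)
    hits = set()  # several seed words can extend to the same alignment
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], []):
            hit = extend(query, subject, q, s, w)
            if hit[0] >= threshold:
                hits.add(hit)
    return sorted(hits, reverse=True)


print(find_hits("TTACGTACGGA", "CCCACGTACGGCC"))
```

Indexing the words once is what makes the search fast: only positions that share a word with the query are ever extended, instead of aligning the query against every database sequence in full.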
![Page 8: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/8.jpg)
BLAST
“The central idea of the BLAST algorithm is to confine attention to segment pairs that contain a word pair of length w with a score of at least T”
- Altschul et al., 1990
![Page 9: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/9.jpg)
How does BLAST work?
![Page 10: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/10.jpg)
Step 1: Get your sequence
• NCBI, UCSC, etc.
• Sequencing facility (unknown gene)
![Page 11: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/11.jpg)
Step 2: Choose BLAST program
![Page 12: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/12.jpg)
The different BLAST programs
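• blastn (nucleotide BLAST): nucleotide query vs. nucleotide database
• blastp (protein BLAST): protein query vs. protein database
• blastx (translated BLAST): translated nucleotide query vs. protein database
• tblastn (translated BLAST): protein query vs. translated nucleotide database
• tblastx (translated BLAST): translated nucleotide query vs. translated nucleotide database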
![Page 13: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/13.jpg)
Simplified visualization
![Page 14: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/14.jpg)
Why translate in 6 reading frames?
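Forward strand, frames 1 to 3: 5’ CAT CAA | 5’ ATC AAC | 5’ TCA ACT
Reverse strand, frames 1 to 3: 5’ GTG GGT | 5’ TGG GTA | 5’ GGG TAG
5’ CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
3’ GTAGTTGATGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
• A double-stranded DNA sequence can be read in six frames (three on each strand), so it can potentially code for six different proteins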
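A short sketch of six-frame translation, using the sequence shown on this slide. The function names are made up for this example, and the codon table is the standard genetic code written in the usual TCAG ordering; `*` marks a stop codon.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated TTT, TTC, TTA, TTG, TCT, ...
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frames(dna):
    """Translate all three offsets on both strands."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


dna = "CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC"
for name, protein in six_frames(dna).items():
    print(name, protein)
```

The first two codons of each frame correspond to the codon pairs listed on the slide, for example `CAT CAA` in frame +1 and `GTG GGT` in frame -1.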
![Page 15: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/15.jpg)
Step 3: Search parameters
![Page 16: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/16.jpg)
Step 4: Search results
![Page 17: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/17.jpg)
Important: Tabular output
![Page 18: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/18.jpg)
Score
• Sequence similarity score is calculated based on the pair-wise alignment quality
• Alignment score is the sum of scores for each position
![Page 19: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/19.jpg)
Score
• Nucleotides
– +1 score for each match
– -2 score for each mismatch
• Peptides
– Each amino acid substitution is given a score
![Page 20: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/20.jpg)
Example
AACGTTTCCAGTCCAAATAGCTAGGC
===--===   =-===-==-======
AACCGTTC---TACAATTACCTAGGC
(= match, - mismatch in the middle row; - in a sequence row marks a gap)
Hits (+1): 18
Misses (-2): 5
Gaps (existence -2, extension -1): 1, length 3
Score = 18 * 1 + 5 * (-2) - 2 - 2 = 4
David Fristrom, Introduction to BLAST
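The arithmetic of this example can be checked with a small scoring function. This is a sketch under one stated assumption: the existence penalty covers the first position of a gap and each further position of the same gap pays the extension penalty, which is the reading under which the terms 18 - 10 - 2 - 2 add up. Other conventions charge the extension penalty for every gap position and would give a lower score.

```python
def alignment_score(top, bottom, match=1, mismatch=-2,
                    gap_existence=-2, gap_extension=-1):
    """Score a gapped alignment given as two equal-length rows ('-' = gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    in_gap = False
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            # Assumption: the first gap position pays the existence penalty,
            # every further position of the same gap pays the extension.
            score += gap_extension if in_gap else gap_existence
            in_gap = True
        else:
            score += match if a == b else mismatch
            in_gap = False
    return score


top = "AACGTTTCCAGTCCAAATAGCTAGGC"
bottom = "AACCGTTC---TACAATTACCTAGGC"
print(alignment_score(top, bottom))  # 18 matches, 5 mismatches, one gap of length 3
```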
![Page 21: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/21.jpg)
E-value
• E-value – expectation value; the number of different alignments with an equal or better score that would be expected to occur by chance alone in a search of the database
• Low E-value – the match is unlikely to be due to chance, so the sequences may be homologous
• Statistical significance depends on:
– Length of the query sequence
– Size of the sequence database
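The dependence on query length and database size can be seen in the Karlin-Altschul formula E = K · m · n · e^(-λS), where S is the raw alignment score, m the query length and n the total database length. The sketch below is illustrative only: the default values of `lam` (λ) and `k` (K) are placeholders, since the real values depend on the scoring system in use, and the function name is made up for this example.

```python
import math


def expect_value(raw_score, query_len, db_len, lam=0.318, k=0.134):
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S).

    lam and k are placeholder statistical parameters (assumptions);
    they depend on the substitution scores and gap costs being used.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# The same alignment score is less significant in a larger database.
small_db = expect_value(raw_score=60, query_len=300, db_len=1e8)
large_db = expect_value(raw_score=60, query_len=300, db_len=1e9)
print(small_db, large_db)
```

Because E grows linearly with both lengths and shrinks exponentially with the score, a hit that looks convincing against a small database may need a higher score to stay convincing against a much larger one.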
![Page 22: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/22.jpg)
Graphical output
![Page 23: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/23.jpg)
Taxonomy Results
![Page 24: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/24.jpg)
Graphical output
![Page 25: Bioinformatics Tutorial I BLAST and Sequence Alignment](https://reader035.vdocuments.site/reader035/viewer/2022062301/56649cc95503460f94991ce8/html5/thumbnails/25.jpg)
References
• Figures and text adapted from the following sources:
– David Fristrom, Introduction to BLAST
– Jonathan Pevsner, BLAST: Basic local alignment search tool
– Joanne Fox, BLAST: Finding function by sequence similarity