LESSON 9: Analyzing DNA Sequences and DNA Barcoding
PowerPoint slides to accompany
Using Bioinformatics: Genetic Research
Chowning, J., Kovarik, D., Porter, S., Grisworld, J., Spitze, J., Farris, C., K. Petersen, and T. Caraballo. Using Bioinformatics: Genetic Research. Published Online October 2012. figshare. http://dx.doi.org/10.6084/m9.figshare.936568
Image Source: Wikimedia Commons
How DNA Sequence Data is Obtained for Genetic Research
Obtain Samples: Blood, Saliva, Hair Follicles, Feathers, Scales
Extract DNA from Cells
Sequence DNA
Genetic Data
…TTCACCAACAGGCCCACA…
Compare DNA Sequences to One Another
TTCAACAACAGGCCCAC
TTCACCAACAGGCCCAC
TTCATCAACAGGCCCAC
GOALS:
• Identify the organism from which the DNA was obtained.
• Compare DNA sequences to each other.
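As a rough illustration of the second goal, the sketch below compares three aligned sequences of equal length and reports the positions where they differ. The function name variable_positions is hypothetical, and the three 17-base sequences are read from the comparison strip on this slide; real comparisons would normally use an alignment tool rather than a position-by-position check.

```python
def variable_positions(sequences):
    """Return (1-based position, bases) for every column where the
    already-aligned, equal-length sequences do not all agree."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    for index, column in enumerate(zip(*sequences)):
        if len(set(column)) > 1:  # more than one distinct base in this column
            differences.append((index + 1, column))
    return differences


samples = [
    "TTCAACAACAGGCCCAC",
    "TTCACCAACAGGCCCAC",
    "TTCATCAACAGGCCCAC",
]
print(variable_positions(samples))
```

In this example the three sequences differ only at position 5, where they carry A, C, and T respectively.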
Overview of DNA Sequencing
DNA Sample
Mix with primers
Perform sequencing reaction
…T T C A C C A A C T G G C C C A C A…
DNA Sequence Chromatogram
Image Source: Wikimedia Commons
Sequence Both Strands of DNA
Sequence #1: Top Strand
A T G A C G G A T C A G C
Sequence #2: Bottom Strand
T A C T G C C T A G T C G
Compare the Two Sequences
5’- A T G A C G G A T C A G C – 3’
3’- T A C T G C C T A G T C G – 5’
Sequence #1: Top (“F”)
Sequence #2: Bottom (“R”)
Bioinformatics tools like BLAST can be used to compare the sequences from the two strands.
Sequence #1: Top Strand
Sequence #2: Bottom Strand
Image Source: Wikimedia Commons
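The idea behind comparing the two strands can be sketched in a few lines of Python. This is not how BLAST works internally; it only shows that the bottom ("R") read, once reverse-complemented, should match the top ("F") read. The function name reverse_complement is an assumption for illustration, and the reads are the 13-base example from this slide, with the R read written 5' to 3'.

```python
# Map each base to its complementary base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse so the result reads 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


forward_read = "ATGACGGATCAGC"   # Sequence #1: Top ("F"), 5' to 3'
reverse_read = "GCTGATCCGTCAT"   # Sequence #2: Bottom ("R"), 5' to 3'

print(reverse_complement(reverse_read))
print(reverse_complement(reverse_read) == forward_read)
```

If the two reads agree, the second line printed is True; positions where they disagree are candidates for a closer look at the chromatogram.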
Viewing DNA Sequences with FinchTV
Image Source: FinchTV
DNA Peaks Can Vary in Height and Width
Image Source: FinchTV
Quality Values Represent the Accuracy of Each Base Call
Quality values represent the ability of the DNA sequencing software to identify the base at a given position.
Quality Value (Q) = -10 × log10(error probability).
Q10 means the base has a one in ten chance (probability) of being misidentified.
Q20 = probability of 1 in 100 of being misidentified.
Q30 = probability of 1 in 1,000 of being misidentified.
Q40 = probability of 1 in 10,000 of being misidentified.
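The relationship between quality values and error probabilities can be checked with a short Python sketch. The function names phred_quality and error_probability are chosen here for illustration and do not come from any sequencing software.

```python
import math


def phred_quality(error_prob):
    """Q = -10 * log10(P), where P is the chance the base call is wrong."""
    return -10 * math.log10(error_prob)


def error_probability(quality):
    """Invert the formula: P = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


for q in (10, 20, 30, 40):
    p = error_probability(q)
    print(f"Q{q}: about 1 in {round(1 / p):,} chance of a wrong base call")
```

Each increase of 10 in Q corresponds to a tenfold drop in the chance that the base was misidentified.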
Quality Values Are Used When Comparing Sequences
Quality values represent the ability of the DNA sequencing software to identify the base at a given position.
Image Source: FinchTV
Background “Noise” May Be Present
Image Source: FinchTV
The Beginnings and Ends of Sequences Are Likely to Be of Poor Quality
Image Source: FinchTV
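One simple way to deal with poor-quality ends is to trim bases from each end until a base with an acceptable quality value is reached. The sketch below assumes a cutoff of Q20 and uses made-up bases and quality values; the function name trim_low_quality_ends is hypothetical, and real trimming tools typically use more careful rules, such as averaging over a window.

```python
def trim_low_quality_ends(bases, qualities, threshold=20):
    """Drop bases from both ends until one meets the quality threshold."""
    keep = [i for i, q in enumerate(qualities) if q >= threshold]
    if not keep:  # no base passes: nothing worth keeping
        return "", []
    start, end = keep[0], keep[-1] + 1
    return bases[start:end], qualities[start:end]


# Made-up read: low quality at both ends, high quality in the middle.
example_bases = "NAATGACGGATN"
example_quals = [4, 9, 12, 35, 40, 42, 41, 38, 36, 15, 8, 3]

trimmed_bases, trimmed_quals = trim_low_quality_ends(example_bases, example_quals)
print(trimmed_bases, trimmed_quals)
```

Low-quality bases in the middle of a read are left alone by this approach; only the ends are removed.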
Examples of Chromatogram Data
Circle #1: Example of a series of the same nucleotide (many T’s in a row). Notice the highest peaks are visible at each position.
Circle #2: Example of an ambiguous base call. Notice that the T (red) at position 57 (highlighted in blue) is just below a green peak (A) at the same position. Look at the poor quality score at the bottom left of the screen (Q12). An A may be the actual nucleotide at this position.
Circle #3: Example of two A’s together. The peaks look different, but are the highest peaks at these positions.
#1 #2 #3
Image Source: FinchTV
Translation Begins at the Start Codon
Sequence #1: 5’- A T G A C G G A T G A G C – 3’
Sequence #2: 3’- T A C T G C C T A C T C G – 5’
Reading Frame +1: M T D E
There Are Six Potential Reading Frames in DNA
Sequence #1: 5’- A T G A C G G A T G A G C – 3’
Sequence #2: 3’- T A C T G C C T A C T C G – 5’
Reading Frame +1: M T D E
Reading Frame +2
Reading Frame +3
Reading Frame -1
Reading Frame -2
Reading Frame -3
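The six reading frames can be worked out with a short Python sketch. It assumes the standard genetic code, reads the minus frames from the reverse complement of the top strand, ignores incomplete codons at the end, and marks stop codons with "*". The function names are chosen for illustration only.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, offset=0):
    """Translate complete codons starting at the given 0-based offset."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def six_frames(seq):
    """Return translations for frames +1..+3 (top strand) and -1..-3 (bottom strand)."""
    bottom = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq, offset)
        frames[f"-{offset + 1}"] = translate(bottom, offset)
    return frames


for name, protein in six_frames("ATGACGGATGAGC").items():
    print(f"Reading Frame {name}: {protein}")
```

Only frame +1 begins with M (the start codon ATG) and runs without a stop codon for this short example.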
Frame-Shifts, Amino Acid Changes, and Stop Codons
Sequence #1: 5’- A T G A C G G A T G A G C – 3’
Sequence #2: 3’- T A C T G C C T A C T C G – 5’
Reading Frame +1: M T D E
Reading Frame +2
Accidental insertion of an extra “G” when editing
5’- A T G G A C G G A T G A G – 3’
M D G STOP
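The effect of the accidental insertion can be reproduced with a short Python sketch. It assumes the standard genetic code and that the extra G was inserted immediately after the start codon, which is one placement consistent with the edited sequence shown above. The function name translate_to_stop is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_to_stop(seq):
    """Translate frame +1 codon by codon, ending at the first stop codon."""
    result = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            result.append("STOP")
            break
        result.append(amino_acid)
    return result


original = "ATGACGGATGAGC"
edited = original[:3] + "G" + original[3:]  # extra G after the start codon

print(original, translate_to_stop(original))
print(edited, translate_to_stop(edited))
```

A single inserted base shifts every downstream codon, which here changes the amino acids after M and brings a TGA stop codon into frame.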