Summer Bioinformatics Workshop 2008
Sequence Alignments
Chi-Cheng Lin, Ph.D.
Associate Professor
Department of Computer Science
Winona State University – Rochester Center
clin@winona.edu
Sequence Alignments
  Cornerstone of bioinformatics
  What is a sequence?
    Nucleotide sequence
    Amino acid sequence
  Pairwise and multiple sequence alignments
  What alignments can help:
    Determine function of a newly discovered gene sequence
    Determine evolutionary relationships among genes, proteins, and species
    Predict structure and function of protein
Why Align Sequences?
  The draft human genome is available
  Automated gene finding is possible
  Gene: AGTACGTATCGTATAGCGTAA
    What does it do?
  One approach: Is there a similar gene in another species?
    Align sequences with known genes
    Find the gene with the “best” match
Visualization of Sequence Alignment
  Dot Plot
    One of the simplest and oldest methods for sequence alignment
    Visualization of regions of similarity
      Assign one sequence to the horizontal axis
      Assign the other to the vertical axis
      Place a dot wherever the two sequences match
      Diagonal lines mean adjacent regions of identity
A Simple Example
  Construct a simple dot plot for
    TAGTCGATG
    TGGTCATC
  The alignment is
    TAGTCGATG
    TGGTC-ATC

  T A G T C G A T G
T *     *       *
G     *     *     *
G     *     *     *
T *     *       *
C         *
A   *         *
T *     *       *
C         *
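The dot plot above can be reproduced with a few lines of code. The following is a minimal sketch, not part of the original slides; the function name `dot_plot` and the text layout are assumptions chosen for illustration. It places a `*` wherever the base on the vertical axis equals the base on the horizontal axis.

```python
def dot_plot(horizontal, vertical):
    """Return the dot plot as a list of text lines (hypothetical helper).

    The first line is the horizontal-axis sequence; each following line
    starts with one base of the vertical sequence, then a '*' in every
    column where that base matches the horizontal sequence.
    """
    lines = ["  " + " ".join(horizontal)]
    for v in vertical:
        cells = ["*" if v == h else " " for h in horizontal]
        lines.append(v + " " + " ".join(cells))
    return lines


# The two sequences from the slide.
plot = dot_plot("TAGTCGATG", "TGGTCATC")
for line in plot:
    print(line.rstrip())
```

Runs of dots along a diagonal, such as the one starting at the `G T C` columns, are the regions the alignment picks up.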
Genes Accumulate Mutations over Time
  Mistakes in gene replication or repair
    Deletions, duplications
    Insertions, inversions
    Translocations
    Point mutations
  Environmental factors
    Radiation
    Oxidation
Deletions
  Codon deletion:
    ACG ATA GCG TAT GTA TAG CCG…
    Effect depends on the protein, position, etc.
    Almost always deleterious
    Sometimes lethal
  Frame shift mutation:
    ACG ATA GCG TAT GTA TAG CCG…
    ACG ATA GCG ATG TAT AGC CG?…
    Almost always lethal
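The difference between the two kinds of deletion is easy to see by re-splitting the sequence into codons after each edit. This is an illustrative sketch only: the helper `codons` is hypothetical, and the choice of which codon to delete (the fourth, `TAT`) is an assumption, since the transcript does not show which one the slide marked. The single-base deletion is chosen so that it reproduces the frame-shifted sequence shown on the slide.

```python
def codons(seq):
    """Split a sequence into consecutive 3-base codons (last may be partial)."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


gene = "ACGATAGCGTATGTATAGCCG"

# Assumption: delete the whole fourth codon (bases 10-12, "TAT").
codon_deleted = gene[:9] + gene[12:]

# Delete a single base (base 10, the first "T" of "TAT").
frame_shifted = gene[:9] + gene[10:]

print(" ".join(codons(gene)))
print(" ".join(codons(codon_deleted)))   # downstream codons are unchanged
print(" ".join(codons(frame_shifted)))   # every downstream codon changes
```

After the codon deletion the remaining codons keep their reading frame; after the single-base deletion every codon downstream of the edit is different.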
Indels
  Comparing two genes, it is generally impossible to tell if an indel is an insertion in one gene or a deletion in another, unless ancestry is known:
    ACGTCTGATACGCCGTATCGTCTATCT
    ACGTCTGAT---CCGTATCGTCTATCT
Substitutions
  Substitutions are mutations accepted by natural selection.
    Synonymous: CGC → CGA (both code for Arg)
    Non-synonymous: GAU → GAA (Asp → Glu)
  [Figure: The Genetic Code]
Point Mutation Example: Sickle-cell Disease
Wild-type hemoglobin
DNA
3’----CTT----5’
mRNA
5’----GAA----3’
Normal hemoglobin
------[Glu]------
Mutant hemoglobin
DNA
3’----CAT----5’
mRNA
5’----GUA----3’
Mutant hemoglobin
------[Val]------
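The DNA → mRNA → amino acid steps on this slide can be traced in code. This is a small sketch under stated assumptions: the names `transcribe`, `COMPLEMENT`, and `CODON_TO_AMINO_ACID` are illustrative, and the codon table holds only the two codons that appear on the slide, not the full genetic code.

```python
# Base pairing used when the template DNA strand is transcribed into mRNA.
COMPLEMENT = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Deliberately partial codon table: only the two codons from the slide.
CODON_TO_AMINO_ACID = {"GAA": "Glu", "GUA": "Val"}


def transcribe(template_3_to_5):
    """Template strand written 3'->5' gives the mRNA written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


for label, template in [("wild-type", "CTT"), ("mutant", "CAT")]:
    mrna = transcribe(template)
    print(label, template, "->", mrna, "->", CODON_TO_AMINO_ACID[mrna])
```

A single changed base in the DNA (T to A in the middle position) is enough to swap glutamic acid for valine.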
image credit: U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis.
Comparing Two Sequences
  Point mutations, easy:
    ACGTCTGATACGCCGTATAGTCTATCT
    ACGTCTGATTCGCCCTATCGTCTATCT
  Indels are difficult, must align sequences:
    ACGTCTGATACGCCGTATAGTCTATCT
    CTGATTCGCATCGTCTATCT

    ACGTCTGATACGCCGTATAGTCTATCT
    ----CTGATTCGC---ATCGTCTATCT
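Point mutations are "easy" because equal-length sequences can be compared position by position. A minimal sketch of that comparison follows; the function name `point_differences` is an assumption for illustration, and the approach only works when no indels are present.

```python
def point_differences(a, b):
    """Return 1-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length; align them first")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


# The point-mutation pair from the slide.
first = "ACGTCTGATACGCCGTATAGTCTATCT"
second = "ACGTCTGATTCGCCCTATCGTCTATCT"
print(point_differences(first, second))
```

For the indel pair on the same slide the lengths differ (27 versus 20 bases), so the function raises an error, which is exactly why an alignment step is needed first.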
Scoring a Sequence Alignment
  Example
    Match score: +1
    Mismatch score: 0
    Gap penalty: –1

    ACGTCTGATACGCCGTATAGTCTATCT
        ||||| |||   || ||||||||
    ----CTGATTCGC---ATCGTCTATCT

    Matches: 18 × (+1)
    Mismatches: 2 × 0
    Gaps: 7 × (–1)
    Score = 18 + 0 + (–7) = +11
  Various scoring schemes exist.
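The arithmetic on this slide can be checked mechanically. The sketch below is not from the original deck; `score_alignment` is a hypothetical helper that walks the two aligned rows column by column, treats `-` as a gap, and applies the slide's scoring scheme as default parameters.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Score an existing alignment; returns (matches, mismatches, gaps, score)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    score = matches * match + mismatches * mismatch + gaps * gap
    return matches, mismatches, gaps, score


result = score_alignment(
    "ACGTCTGATACGCCGTATAGTCTATCT",
    "----CTGATTCGC---ATCGTCTATCT",
)
print(result)
```

Changing the `match`, `mismatch`, or `gap` arguments shows how a different scoring scheme changes the score of the same alignment.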
How can we find an optimal alignment?
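  Finding the optimal alignment is computationally hard. One candidate among many:
    ACGTCTGATACGCCGTATAGTCTATCT
    CTGAT---TCG-CATCGTC--T-ATCT
  There are ~888,000 possibilities to align the two sequences given above. (This figure matches the number of ways to choose which 7 of the 27 columns hold gaps in the shorter sequence: C(27, 7) = 888,030.)
  Algorithms using a technique called “dynamic programming” are used to find an optimal alignment efficiently; they are outside the scope of this workshop.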
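For readers who want a glimpse of the dynamic programming idea, here is a minimal sketch in the style of the Needleman-Wunsch global alignment algorithm. It is not part of the workshop material: the function name and the use of the earlier slide's scoring scheme (match +1, mismatch 0, gap –1) are assumptions, and it returns only the best score, not the alignment itself.

```python
def global_alignment_score(a, b, match=1, mismatch=0, gap=-1):
    """Best global alignment score via a dynamic programming table.

    score[i][j] holds the best score for aligning the first i bases of a
    with the first j bases of b.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


best = global_alignment_score(
    "ACGTCTGATACGCCGTATAGTCTATCT", "CTGATTCGCATCGTCTATCT"
)
print(best)
```

The table has only 28 × 21 cells for these two sequences, so each cell is filled once instead of enumerating every possible gap placement.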
Global and Local Alignments
  Global alignment: score the entire alignment
  Local alignment: find the best matching subsequence
  Why local sequence alignment?
    Global alignment is useful only if the sequences to be aligned are very similar
    Subsequence comparison between a DNA sequence and a genome
    Identify
      Conserved regions
      Protein function domains
Example
  Compare the two sequences:
    TTGACACCCTCCCAATT
    ACCCCAGGCTTTACACAG
  Global alignment (does it look good?)
    TTGACACCCTCC-CAATT
        ||  ||   ||
    ACCCCAGGCTTTACACAG
  Local alignment (does it look good?)
    ---------TTGACACCCTCCCAATT
             || ||||
    ACCCCAGGCTTTACACAG--------
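A local alignment score can be computed with a small variation of the dynamic programming table, in the style of the Smith-Waterman algorithm. This is a hedged sketch, not the workshop's method: the function name is hypothetical, and the scoring values (match +1, mismatch –1, gap –1) are an assumption chosen so that poorly matching stretches are penalized. Cell values are never allowed to drop below zero, which lets the best-scoring region start and end anywhere.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score; assumed scoring, returns the score only."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                           # start a fresh local region
                score[i - 1][j - 1] + pair,  # extend with a (mis)match
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            best = max(best, score[i][j])
    return best


local_best = local_alignment_score("TTGACACCCTCCCAATT", "ACCCCAGGCTTTACACAG")
print(local_best)
```

Under these assumed scores, the overlapping region shown on the slide (TTGACAC against TTTACAC, six matches and one mismatch) is worth 5, so the function returns at least that.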