Bioinformatics Lecture 5
Post on 18-Jul-2015
Bioinformatics Lecture# 5
Dr. Naeem Ud Din Khattak
Professor
Department of Zoology
Islamia College Peshawar (Chartered University)
• The mutation distance: the minimal number of nucleotides that would need to be altered in order for the gene for one protein to code for the other.
• Example: ACTGAT and TCTATC can be aligned as
A C T G A T -
T C T - A T C
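As a rough illustration, the differing columns of the alignment above can be counted directly. This is a minimal sketch, assuming that every mismatched or gapped column counts as one change; the function name `count_differences` is hypothetical, and the true mutation distance between two protein-coding genes may need a more careful treatment.

```python
def count_differences(aligned_a, aligned_b):
    """Count alignment columns where the two sequences differ (gaps included)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)


# The aligned example from the slide: ACTGAT- over TCT-ATC
print(count_differences("ACTGAT-", "TCT-ATC"))  # 3 differing columns
```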
The construction of the tree
• Assume proteins, A, B and C, and their mutation distances.
• There are two questions:
1. Which pair does one join together first?
2. What are the lengths of edges a, b, and c?
      B    C
A    24   28
B         32
Which pair does one join together first?
• Simply choose the pair with the smallest mutation distance. Here that pair is A and B (distance 24), so A and B are joined first.
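The choice of the first pair can be sketched in a few lines. This is only an illustration using the three distances from the table; the dictionary layout and the variable names are assumptions, not part of the lecture.

```python
# Pairwise mutation distances from the table (A-B, A-C, B-C).
distances = {("A", "B"): 24, ("A", "C"): 28, ("B", "C"): 32}

# The pair to join first is the one with the smallest distance.
closest_pair = min(distances, key=distances.get)
print(closest_pair, distances[closest_pair])  # ('A', 'B') 24
```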
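What are the lengths of legs a, b, and c?
• Leg a leads to A, leg b leads to B, and leg c leads to C, so each pairwise distance is the sum of two legs:
a + b = 24, a + c = 28, b + c = 32
• Solution: a = 10, b = 14, c = 18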
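• (i) a + b = 24, (ii) a + c = 28, (iii) b + c = 32
• From (i): a = 24 - b. Put this value of a into (ii):
• 24 - b + c = 28; c - b = 28 - 24; c - b = 4; so c = 4 + b
• Put this value of c into (iii): b + 4 + b = 32; 2b + 4 = 32; 2b = 32 - 4
• b = 28/2 = 14
• Now put the value of b into (i): a = 24 - 14 = 10, and into c = 4 + b: c = 4 + 14 = 18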
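The same three equations can be solved in closed form by adding two of them and subtracting the third. The sketch below assumes exactly three taxa and additive distances; the function name `three_taxon_legs` is hypothetical.

```python
def three_taxon_legs(d_ab, d_ac, d_bc):
    """Solve a+b=d_ab, a+c=d_ac, b+c=d_bc for the leg lengths a, b, c."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


print(three_taxon_legs(24, 28, 32))  # (10.0, 14.0, 18.0)
```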
• Note that this analysis assumes that there are no multiple substitutions (when a single site undergoes two or more changes, e.g. the ancestral sequence … ATGT … gives … AGGT … and … ACGT …).
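The effect of multiple substitutions can be seen by counting on the example sequences themselves. In this small sketch (the helper name `hamming` is an assumption), two substitutions occurred at the same site, one in each lineage, yet only one difference is visible between the two descendants.

```python
def hamming(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(seq_a, seq_b))


ancestor = "ATGT"
descendant_1 = "AGGT"
descendant_2 = "ACGT"

# Changes that actually happened along the two lineages.
substitutions = hamming(ancestor, descendant_1) + hamming(ancestor, descendant_2)
# Changes that can be observed by comparing the descendants only.
observed = hamming(descendant_1, descendant_2)
print(substitutions, observed)  # 2 1
```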
Based on lectures by C-B Stewart, and by Tal Pupko
Phylogenetic Tree Terminology
• Ancestral node or ROOT of the tree
• Internal nodes or divergence points (represent hypothetical ancestors of the taxa)
• Branches or lineages
• Terminal nodes (A, B, C, D, E): represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny
Phylogenetic trees diagram the evolutionary relationships between the taxa
((A,(B,C)),(D,E)) = The above phylogeny as nested parentheses
• B and C are more closely related to each other than either is to A.
• A, B, and C form a clade that is a sister group to the clade composed of D and E.
• If the tree has a time scale, then D and E are the most closely related.
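The nested-parentheses notation can be read mechanically. The following is a minimal sketch, assuming a topology-only string with single-token taxon names and no branch lengths; `parse_newick` and `leaf_sets` are hypothetical helper names, not functions from any library.

```python
def parse_newick(text):
    """Parse a topology-only nested-parentheses string into nested tuples."""
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]  # a taxon name

    return parse_node()


def leaf_sets(node, found):
    """Return the taxa below node; record the taxon set of every internal node."""
    if isinstance(node, str):
        return {node}
    leaves = set()
    for child in node:
        leaves |= leaf_sets(child, found)
    found.append(frozenset(leaves))
    return leaves


tree = parse_newick("((A,(B,C)),(D,E))")
found = []
leaf_sets(tree, found)
print(tree)
print([sorted(clade) for clade in found])
```

Each recorded set is a clade of the tree: B and C, then A with B and C, then D and E, and finally all five taxa.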
• Nature acts conservatively, i.e., it does not develop a new kind of biology for every life form but continuously changes and adapts a proven general concept.
• Novel functionalities do not appear because a new gene has suddenly arisen but are developed and modified during evolution.
• Thus, alleles of a gene found in a population arise from a common ancestor gene, i.e., they are HOMOLOGOUS.
• Homology is not a measure of similarity, but rather means that sequences have a shared evolutionary history and, therefore, possess a common ancestral sequence (Tatusov et al. 1997).
• Homology is an all-or-none phenomenon.
Orthologs
• Homologous proteins from different species that possess the same function (e.g., corresponding kinases in a signal transduction pathway in humans and mice) are called orthologs.
Paralogs
• Homologous proteins that have different functions in the same species (e.g., two kinases in different signal transduction pathways of humans) are termed paralogs.
• A visual representation of orthologs (and some other commonly confused terms, paralogs and homologs)
Orthologs: "genes that have diverged after a speciation event... [that] tend to have similar function" (Fulton et al. 2006). Thus, orthologs are genes whose encoded proteins fulfill similar roles in different species.
Identity
• Ratio of the number of identical amino acids or nucleotides relative to the total number of amino acids or nucleotides.
• Example: 4 identical positions out of 20 give an identity of 4/20 = 0.2.
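The ratio can be computed directly from two aligned sequences. This is a minimal sketch; the function name `identity` and the two 20-residue example sequences are invented for illustration and chosen so that exactly 4 positions match.

```python
def identity(aligned_a, aligned_b):
    """Fraction of aligned positions that carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y)
    return matches / len(aligned_a)


seq_1 = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical 20-residue sequence
seq_2 = "ACDE" + "A" * 16       # identical to seq_1 at the first 4 positions only
print(identity(seq_1, seq_2))   # 0.2
```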
Similarity
• Unlike identity, similarity is not as simple to calculate. Before similarity can be determined, it must first be defined how similar the building blocks of sequences are to each other.
• This is done with the help of similarity matrices, which specify the probability at which one sequence transforms into another sequence over time.
• These probabilities depend on the time elapsed and the mutation rate of nucleotides.
• For protein sequences, an identity matrix is not sufficient to describe biological and evolutionary processes.
• Amino acids are not exchanged with the same probability as might be conceived theoretically.
• Recall the synonymous and non-synonymous mutations.
• For example,
• an exchange of aspartic acid for glutamic acid is frequently observed;
• aspartic acid to tryptophan is seen rarely.
• A second reason for the aspartic acid to glutamic acid mutation to occur more often is that both amino acids have similar properties.
• In contrast, aspartic acid and tryptophan are chemically different: the hydrophobic tryptophan is frequently found in the center of proteins, whereas the hydrophilic aspartic acid occurs more often at the surface.
• Amino acid substitution matrices, therefore, describe the probability at which amino acids are exchanged in the course of evolution.
• The most commonly used amino acid scoring matrices are the PAM (Point Accepted Mutation; Dayhoff et al. 1978) and BLOSUM (Blocks Substitution Matrix; Henikoff and Henikoff 1992) groups.
Amino acid      Three-letter   One-letter   Properties
Tryptophan      Trp            W            Hydrophobic
Aspartic acid   Asp            D            Hydrophilic, electrically charged (negative)
Glutamic acid   Glu            E            Hydrophilic, electrically charged (negative)
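A substitution matrix can be thought of as a lookup table of scores for residue pairs. The sketch below covers only the three amino acids discussed here. The six scores are intended to match the BLOSUM62 entries for these residues; treat them as illustrative and check them against a reference copy of the matrix before relying on them. The names `SCORES` and `pair_score` are hypothetical.

```python
# Excerpt of a substitution matrix for D (Asp), E (Glu) and W (Trp).
# Assumption: values follow BLOSUM62; verify against a reference table.
SCORES = {
    ("D", "D"): 6,
    ("E", "E"): 5,
    ("W", "W"): 11,
    ("D", "E"): 2,
    ("D", "W"): -4,
    ("E", "W"): -3,
}


def pair_score(x, y):
    """Look up the score of a residue pair; the matrix is symmetric."""
    key = (x, y) if (x, y) in SCORES else (y, x)
    return SCORES[key]


# The frequent Asp/Glu exchange scores higher than the rare Asp/Trp exchange.
print(pair_score("D", "E"), pair_score("D", "W"))  # 2 -4
```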
NUCLEOTIDE AND AMINO ACID SEQUENCES ARE EVOLUTIONARILY DIFFERENT, SO WE NEED DIFFERENT CRITERIA AND MATRICES TO ANALYZE THEM
Outgroup to root a phylogenetic tree
• The tree of human, chimpanzee, gorilla and orangutan genes is rooted with a baboon gene because we know from the fossil record that the common ancestor of the four species split away from the baboon lineage earlier in geological time.
• Let’s See Members of this Group
Kiwi
Struthio camelus
Swan
song sparrow
Ring-necked pheasant
Silver pheasant
Parrot
The design (layout) of the phylogenetic tree does not change the evolutionary distance among the various taxa represented.
Examples of what can be inferred from phylogenetic trees
(DNA, protein)
1. Which species are the closest living relatives of modern humans?
2. Did the infamous Florida Dentist infect his patients with HIV?
3. What is the relation between HIV and SIV?
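Relatives of modern humans?
[Figure: two alternative trees of humans, chimpanzees, bonobos, gorillas and orangutans, each drawn against a time scale in MYA (axis labels as extracted: 0, 15-30 and 0, 14). One tree shows the pre-molecular view; the other shows the view supported by mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization.]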
2. Did the Florida Dentist infect his patients with HIV?
[Figure: phylogenetic tree of HIV sequences from the DENTIST, his patients, and local HIV-infected people, from Ou et al. (1992) and Page & Holmes (1998). Tip labels: DENTIST, Patients A to G, and Local controls 2, 3, 9 and 35. One group of patient sequences is marked "Yes" and two are marked "No".]
• Yes: the HIV sequences from these patients fall within the clade of HIV sequences found in the dentist.
3. Relating Human HIV to Simian SIV retroviruses
• Human immunodeficiency virus 1 (HIV-1) is pathogenic.
• SIVs are not pathogenic in their normal hosts.
IMAGE FROM: Medical Art Service, Munich / Wellcome Images.
The structure of HIV
• CD4 proteins on surface
• Phospholipid membrane
• Matrix
• Viral RNA
• Viral enzymes: reverse transcriptase, integrase, protease
• Capsid
HIV's replication cycle
1. HIV attaches to CD4 receptors on T-cell
2. Viral core of enzymes and RNA injected into cell
3. DNA transcribed from viral RNA
4. Double-stranded DNA produced
5. DNA integrates with host chromosome (viral integrase)
6. Transcription produces viral RNA
7. Viral proteins produced
8. New virus assembled
9. Viral protease cuts up proteins
10. New virus leaves cell
• Retrovirus genomes accumulate mutations relatively quickly.
• Reverse transcriptase lacks an efficient proofreading activity, so it makes errors when it carries out RNA-dependent DNA synthesis.
• The molecular clock therefore runs rapidly in retroviruses.
• Genomes that diverged quite recently display sufficient nucleotide dissimilarity for a phylogenetic analysis to be carried out.
• Even over a period of less than 100 years, HIV and SIV genomes accumulate sufficient data for such an analysis.
• The starting point for this phylogenetic analysis is RNA extracted from virus particles, which is converted to DNA by RT-PCR.
RT-PCR
Reverse transcription polymerase chain reaction (RT-PCR) is a variant of polymerase chain reaction (PCR). It is a laboratory technique commonly used in molecular biology where an RNA strand is reverse transcribed into its DNA complement (complementary DNA, or cDNA) using the enzyme reverse transcriptase, and the resulting cDNA is amplified using PCR.
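At the sequence level, the reverse transcription step amounts to writing the DNA complement of the RNA template. The sketch below is only a string-level illustration of that step, not a model of the laboratory procedure; the function name `reverse_transcribe` and the example RNA are assumptions.

```python
# Base-pairing rules for copying RNA into complementary DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return the cDNA strand (written 5'->3') for an RNA written 5'->3'."""
    # The new strand is antiparallel, so the template is read in reverse.
    return "".join(RNA_TO_DNA[base] for base in reversed(rna.upper()))


print(reverse_transcribe("AUGGCU"))  # AGCCAT
```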
• This tree has a number of interesting features. First, it shows that different samples of HIV-1 have slightly different sequences, the samples as a whole forming a tight cluster, almost a star-like pattern, that radiates from one end of the unrooted tree.
• This star-like topology implies that the global AIDS epidemic began with a very small number of viruses, perhaps just one, which have spread and diversified since entering the human population.
• The closest relative to HIV-1 among primates is the SIV of chimpanzees, the implication being that this virus jumped across the species barrier between chimps and humans and initiated the AIDS epidemic.
• However, this epidemic did not begin immediately: a relatively long uninterrupted branch links the center of the HIV-1 radiation with the internal node leading to the relevant SIV sequence, suggesting that after transmission to humans, HIV-1 underwent a latent period when it remained restricted to a small part of the global human population, presumably in Africa, before beginning its rapid spread to other parts of the world.
• Other primate SIVs are less closely related to HIV-1, but one, the SIV from sooty mangabey, clusters in the tree with the second human immunodeficiency virus, HIV-2.
• It appears that HIV-2 was transferred to the human population independently of HIV-1, and from a different simian host. HIV-2 is also able to cause AIDS, but has not, as yet, become globally epidemic.
REFERENCES
• http://www.bio.davidson.edu/Courses/Molbio/MolStudents/spring2010/Rydberg/Orthologs.html