7.03 Fall 2006
Lecture 29 - Polymorphisms in Human DNA Sequences
• SNPs
• SSRs
Eukaryotic Genes and Genomes
genome = DNA content of a complete haploid set of chromosomes
       = DNA content of a gamete (sperm or egg)
Species          Chromosomes  cM    DNA content/   Year sequence               Genes/haploid
                                    haploid (Mb)   completed
E. coli          1            N/A   5              1997                        4,200
S. cerevisiae    16           4000  12             1997                        5,800
C. elegans       6            300   100            1998                        19,000
D. melanogaster  4            280   180            2000                        14,000
M. musculus      20           1700  3000           2002 draft; 2005 finished?  30,000?
H. sapiens       23           3300  3000           2001 draft; 2003 finished   30,000?

Note: cM = centiMorgan = 1% recombination
Mb = megabase = 1 million base-pairs of DNA
Kb = kilobase = 1 thousand base-pairs of DNA
Species          cM    DNA content/   Generation  Design    True breeding
                       haploid (Mb)   time        crosses?  strains?
E. coli          N/A   5              30 min      yes       yes
S. cerevisiae    4000  12             90 min      yes       yes
C. elegans       300   100            4 d         yes       yes
D. melanogaster  280   180            2 wk        yes       yes
M. musculus      1700  3000           3 mo        yes       yes
H. sapiens       3300  3000           20 yr       no        no
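The cM and Mb columns can be combined into an average physical distance per unit of genetic distance. The sketch below is only illustrative arithmetic on the slide's numbers; the function name `kb_per_cm` is hypothetical, and the result is a genome-wide average that ignores local variation in recombination rate.

```python
# (cM, Mb) per haploid genome, copied from the slide's table.
# E. coli is omitted because its cM entry is N/A.
GENOMES = {
    "S. cerevisiae": (4000, 12),
    "C. elegans": (300, 100),
    "D. melanogaster": (280, 180),
    "M. musculus": (1700, 3000),
    "H. sapiens": (3300, 3000),
}


def kb_per_cm(cm, mb):
    """Average physical distance (in kb) spanned by 1 cM of genetic distance."""
    return mb * 1000 / cm


for species, (cm, mb) in GENOMES.items():
    print(f"{species:16s} {kb_per_cm(cm, mb):7.0f} kb per cM")
```

On these numbers, 1 cM corresponds to a few kb in yeast but to roughly a megabase in humans, which is one reason human mapping leans so heavily on dense DNA-based markers.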
• Human genetics is retrospective (vs prospective). Human geneticists cannot test hypotheses prospectively. The mouse provides a prospective surrogate.
• Can’t do selections
• Meager amounts of data. Human geneticists typically rely upon statistical arguments, as opposed to overwhelming amounts of data, in drawing connections between genotype and phenotype.
• Highly dependent on DNA-based maps and DNA-based analysis
The unique advantages of human genetics:
• A large population which is self-screening to a considerable degree
• Phenotypic subtlety is not lost on the observer
• The self interest of our species
1) SNPs = single nucleotide polymorphisms = single nucleotide substitutions
A locus is said to be polymorphic if two or more alleles are each present at a frequency of at least 1% in a population of animals.
In human populations:
Hnuc = average heterozygosity per nucleotide site = 0.001
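To get a feel for what a per-site heterozygosity of 0.001 means, here is a back-of-envelope sketch. It assumes the 3000 Mb haploid genome size from the genome table and treats every site as having the same average heterozygosity; the function names are hypothetical.

```python
H_NUC = 0.001                      # average heterozygosity per nucleotide site (from the slide)
HAPLOID_GENOME_BP = 3000 * 10**6   # 3000 Mb, taken from the genome table (assumption: applies here)


def expected_het_sites(h_nuc, length_bp):
    """Expected number of heterozygous sites in a stretch of length_bp."""
    return h_nuc * length_bp


def mean_spacing_bp(h_nuc):
    """Average distance in bp between heterozygous sites."""
    return 1 / h_nuc


print(round(expected_het_sites(H_NUC, HAPLOID_GENOME_BP)))  # sites differing between one person's two genome copies
print(round(mean_spacing_bp(H_NUC)))                         # bp between such sites, on average
```

Under these assumptions, a person's two copies of the genome differ at roughly one site per kilobase, or about three million sites in total.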
SYNONYMOUS CHANGES
TTT GCT GGC CAC  -->  TTT GCT GGA CAC
Phe Ala Gly His       Phe Ala Gly His

NON-SYNONYMOUS CHANGES
TTT GCT GGC CAC  -->  TTT GCT TGC CAC
Phe Ala Gly His       Phe Ala Cys His
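The two examples can be checked mechanically by translating each sequence and comparing the resulting peptides. This is a minimal sketch: the codon table holds only the six codons that appear on the slide, and the function names are hypothetical.

```python
# Partial genetic code: only the codons used in the slide's examples.
CODON_TABLE = {
    "TTT": "Phe",
    "GCT": "Ala",
    "GGC": "Gly",
    "GGA": "Gly",
    "TGC": "Cys",
    "CAC": "His",
}


def translate(dna):
    """Translate a space-separated string of codons into amino acids."""
    return [CODON_TABLE[codon] for codon in dna.split()]


def classify_change(reference, variant):
    """Label a substitution by whether it changes the encoded peptide."""
    if translate(reference) == translate(variant):
        return "synonymous"
    return "non-synonymous"


REFERENCE = "TTT GCT GGC CAC"
print(classify_change(REFERENCE, "TTT GCT GGA CAC"))  # GGC -> GGA, both Gly
print(classify_change(REFERENCE, "TTT GCT TGC CAC"))  # GGC -> TGC, Gly -> Cys
```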
The great majority (probably 99%) of SNPs are selectively “neutral” changes of little or no functional consequence:
• outside coding or gene regulatory regions (>97% of human genome)
• silent substitutions in coding sequences
• some amino acid substitutions do not affect protein stability or function
A small minority of SNPs are of functional consequence and are selectively advantageous or disadvantageous.
• disadvantageous SNPs selected against --> further underrepresentation
Affymetrix chip
TUMORS             NON-TUMORS
C57black     X
AA                 aa
        Aa
   All Tumorous
3 Tumors :: 1 non-tumor
TUMORS             NON-TUMORS
C57black     X     AKR
AAbb               aaBB
        AaBb
All Non-Tumors (normal)
13/16 non-tumors :: 3/16 tumors
non-tumors: A-B-, aaB-, aabb
tumors: A-bb
AKR HAS A GENE (B) THAT SUPPRESSES TUMORS
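The 13/16 :: 3/16 ratio follows from enumerating the offspring of an AaBb x AaBb cross. The sketch below assumes the two genes assort independently (unlinked) and that tumors appear only in the A-bb class, as the slide's genotype classes indicate; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """All equally likely gametes from a two-gene genotype such as 'AaBb'."""
    gene1, gene2 = genotype[:2], genotype[2:]
    return [a + b for a, b in product(gene1, gene2)]


def has_tumor(alleles):
    """Tumors need at least one A and no B (the A-bb class)."""
    return "A" in alleles and "B" not in alleles


def f2_tumor_fraction(f1="AaBb"):
    """Fraction of offspring with tumors when two F1 animals are crossed."""
    offspring = [g1 + g2 for g1, g2 in product(gametes(f1), repeat=2)]
    tumors = sum(has_tumor(o) for o in offspring)
    return Fraction(tumors, len(offspring))


print(f2_tumor_fraction())      # tumor fraction among the offspring
print(1 - f2_tumor_fraction())  # non-tumor fraction
```

Any single dominant allele of B is enough to mask the effect of A, which is why the familiar 9:3:3:1 classes collapse into 13:3.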
Figure: Lactose (a galactose residue joined to a glucose residue by a β(1,4)-glycoside linkage) and cellobiose (two glucose residues joined by a β(1,4)-glycoside linkage).
CANDIDATE GENE
LACTOSE
The enzyme lactase that is located in the villus enterocytes of the small intestine is responsible for digestion of lactose in milk.
Lactase activity is high and vital during infancy, but in most mammals, including most humans, lactase activity
declines after the weaning phase. In other healthy humans, lactase activity persists at a high level throughout adult life, enabling them to digest lactose as adults. This dominantly
inherited genetic trait is known as lactase persistence. The distribution of these different lactase phenotypes in
human populations is highly variable and is controlled by a polymorphic element cis-acting to the lactase gene. A
putative causal nucleotide change has been identified and occurs on the background of a very extended haplotype
that is frequent in Northern Europeans, where lactase persistence is frequent. This single nucleotide polymorphism is located 14 kb upstream from the start
of transcription of lactase in an intron of the adjacent gene MCM6. This change does not, however, explain all the variation in lactase expression.
LACTOSE TOLERANCE
LACTASE GENE
SNP
Genotype
2) SSRs = simple sequence repeat polymorphisms = "microsatellites"
Most common type in mammalian genomes is the CA repeat:
(CA)n
(GT)n

allele   n
A        11
B        12
C        13
D        14
E        15
F        16

Scoring: PCR with primer #1 and primer #2 flanking the repeat, then gel electrophoresis.
Figure: gel lanes for genotypes AB, CD, EF, AD, CF.
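Because alleles differ only in repeat number n, each allele gives a PCR product of a different length, and that is what the gel resolves. The sketch below illustrates the arithmetic. The 100 bp flank (primers plus flanking sequence) is a made-up value for illustration, not a number from the lecture, and the function names are hypothetical.

```python
FLANK_BP = 100      # hypothetical: combined length of primers and flanking sequence
REPEAT_UNIT_BP = 2  # one CA dinucleotide

# Repeat number n for each allele, from the slide.
ALLELES = {"A": 11, "B": 12, "C": 13, "D": 14, "E": 15, "F": 16}


def product_size(allele):
    """Length in bp of the PCR product for one allele."""
    return FLANK_BP + REPEAT_UNIT_BP * ALLELES[allele]


def gel_bands(genotype):
    """Band sizes for a genotype such as 'AD', largest (slowest) first."""
    return sorted({product_size(a) for a in genotype}, reverse=True)


for genotype in ["AB", "CD", "EF", "AD", "CF"]:
    print(genotype, gel_bands(genotype))
```

A heterozygote shows two bands and a homozygote one, so both alleles are read directly from the gel.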
SSRs are extremely useful as genetic markers in human studies because:
• they are easily scored (by PCR)
• they are codominant
• many SSRs exhibit very high average heterozygosities: HSSR = 0.7 to 0.9
  A randomly selected person is likely to be heterozygous.
• SSRs are abundant
  SSRs occur, on average, about once every 30 kb in the human (or mouse) genomes. More than 20,000 SSRs have been identified and mapped within the human genome.
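Two quick calculations give a sense of these numbers. The first uses the standard population-genetics expression for expected heterozygosity under random mating, H = 1 - sum(p_i^2), which is not stated on the slide. The six equally frequent alleles are an illustrative assumption. The second combines the 30 kb spacing with the 3000 Mb genome size from the genome table.

```python
def expected_heterozygosity(allele_freqs):
    """Chance that two randomly drawn alleles differ: 1 - sum of p_i squared."""
    return 1 - sum(p * p for p in allele_freqs)


# Illustrative assumption: six alleles (A to F), all equally frequent.
six_equal_alleles = [1 / 6] * 6
print(round(expected_heterozygosity(six_equal_alleles), 3))

# For comparison, a two-allele marker can never exceed 0.5.
print(expected_heterozygosity([0.5, 0.5]))

# Rough count of SSRs implied by one SSR per 30 kb in a 3000 Mb genome.
GENOME_KB = 3000 * 1000
SSR_SPACING_KB = 30
print(GENOME_KB // SSR_SPACING_KB)
```

With several common alleles, heterozygosity lands in the 0.7 to 0.9 range quoted on the slide, well above what any two-allele marker can reach.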
Huntington's disease (HD)
Phenotype: loss of neurons, personality change, memory loss, motor problems
Inheritance: autosomal dominant, affecting 1/20,000 individuals
genetic linkage mapping
We genotype the six members of the family for SSRs scattered throughout the genome (which spans 3300 cM), perhaps 165 different SSRs distributed at 20 cM intervals, so that one SSR must be within 10 cM of the Huntington's gene:
Figure: SSR1, SSR2, SSR3, SSR4, SSR5 spaced at 20 cM intervals.
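The marker count and the 10 cM guarantee are simple arithmetic, sketched here. The sketch treats the genome as one continuous 3300 cM map and ignores chromosome ends, which is a simplification; the function names are hypothetical.

```python
GENOME_CM = 3300   # genetic length of the human genome (from the slide)
SPACING_CM = 20    # distance between adjacent SSR markers (from the slide)


def markers_needed(genome_cm, spacing_cm):
    """Number of evenly spaced markers needed to tile the genetic map."""
    return genome_cm // spacing_cm


def max_distance_to_marker(spacing_cm):
    """Worst case: a gene exactly halfway between two adjacent markers."""
    return spacing_cm / 2


print(markers_needed(GENOME_CM, SPACING_CM))  # markers to genotype
print(max_distance_to_marker(SPACING_CM))     # cM from any gene to its nearest marker, at most
```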
We obtain potentially exciting results with SSR37, on chromosome 4:

Genotypes:
HD      HD/+  HD/+  HD/+  +/+  +/+  +/+
SSR37   AB    AC    AD    BD   BC   CD

Paternal alleles:
HD      HD    HD    +     +
SSR37   A     A     B     B

SSR37 alleles resolved on the gel: A, B, C, D
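One way to see why SSR37 looks promising is to count how many children received a recombinant paternal chromosome. The sketch below uses the paternal alleles listed on the slide (HD, HD, +, + paired with A, A, B, B). The phase in the father is an assumption, since the slide does not state it: HD is taken to travel with allele A, and the alternative phase is shown for comparison. The function name is hypothetical.

```python
# Paternal contribution to each child: (allele at the HD locus, SSR37 allele).
PATERNAL_HAPLOTYPES = [("HD", "A"), ("HD", "A"), ("+", "B"), ("+", "B")]

# Assumed phase in the father: HD on the same chromosome as A, + with B.
PHASE_HD_WITH_A = {("HD", "A"), ("+", "B")}
# Alternative phase: HD on the same chromosome as B, + with A.
PHASE_HD_WITH_B = {("HD", "B"), ("+", "A")}


def count_recombinants(haplotypes, parental_phase):
    """Count transmitted haplotypes that do not match either parental chromosome."""
    return sum(h not in parental_phase for h in haplotypes)


print(count_recombinants(PATERNAL_HAPLOTYPES, PHASE_HD_WITH_A))  # recombinants if HD travels with A
print(count_recombinants(PATERNAL_HAPLOTYPES, PHASE_HD_WITH_B))  # recombinants if HD travels with B
```

Under one phase none of the four children is recombinant and under the other all four are. This all-or-none pattern is what linkage between SSR37 and the HD gene would produce, though four children are far too few to be conclusive.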