Positional cloning of the Huntington’s disease (HD) gene
Mapping and cloning of the HD gene
chromosome walking
cDNA libraries
Identifying the disease-causing mutations
Studies of the HD gene:
identifying orthologous proteins
(BLAST)
mouse knockouts (KO’s)
transgenic mice
Summary of other repeat expansion
diseases
Goals for the next three lectures…
-Try to fill in some gaps
-Strengthen the connections between topics
-Some new information:
protein similarity (probably today &
Monday)
knockout mice (probably Monday)
population genetics (Monday?)
-Next Friday’s lecture:
no more than 30 minutes of new
material
course evaluations (~15-20 minutes)
review/problem solving/QS10
If I do spend time reviewing topics on Friday it would be good to know what you need help with:
-No more than 1-2 topics (1-2 sentences)
-Send to: [email protected] to hear from you before
Monday
Lastly: If you feel that an error was made in the grading of your 2nd midterm exam, send an email message to Anne Paul summarizing the error, BY THE END OF THE DAY TODAY.
Solutions to Problem set 6 have been posted on the course website
(reminder from lecture 13)
- A dominant genetic disease; affects ~8 people per 100,000 worldwide
- Symptoms include abnormal body movements (chorea), cognitive decline, death
- Symptoms result from neurodegeneration
- Age of onset typically 40’s; ranges from infancy to elderly
- Genetic anticipation (increasing disease severity in subsequent generations) often observed
- No cure or treatment
Huntington’s Disease
Huntington’s disease results from nerve cell degeneration in the basal ganglia
HD brain Normal brain
Mapping of the Huntington’s disease gene
The informative pedigree:
• 5,000 related individuals from Venezuela segregating HD
• Included 100 members currently affected by HD
• Included >1,000 members with >25% risk
The markers used:
Few markers available, so tested random, purified fragments of the human genome
Used these random fragments as probes to conduct Southern blot analysis to identify RFLPs
On their 12th probe… the jackpot! (1983)
- linkage of the RFLP to HD!
Marker ‘D4S10’ shows linkage to HD
[Plot: LOD score (Z) on the vertical axis; horizontal axis 0 to 50]
Results of linkage studies using the probe “G8” which recognizes the RFLP marker D4S10
Does this result provide significant evidence of linkage?
LOD score maximum at 3 cM
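To make the significance question concrete, here is a minimal sketch of a two-point LOD calculation for phase-known meioses. The function name `lod_score` and the counts (1 recombinant in 30 meioses) are hypothetical and are not the Venezuelan pedigree data. By the usual convention, a LOD score of 3 or more is taken as significant evidence of linkage.

```python
import math

def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score Z(theta) for phase-known meioses: log10 of the likelihood
    at recombination fraction theta over the likelihood at theta = 0.5."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        # Any observed recombinant is impossible if theta is exactly 0.
        if recombinants > 0:
            return float("-inf")
        return meioses * math.log10(2.0)
    log_linked = (recombinants * math.log10(theta)
                  + nonrecombinants * math.log10(1.0 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked

# Hypothetical counts: 1 recombinant in 30 informative meioses.
for theta in (0.01, 0.03, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50):
    print(f"theta={theta:.2f}  Z={lod_score(1, 30, theta):6.2f}")
```

Scanning theta and taking the value that maximizes Z gives the estimated map distance, which is how a curve like the one for D4S10 is read.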
3-2
Where is marker ‘D4S10’ located?
Karyotype:
D4S10 (4p16)
centromere
telomere
?
How to tell?
HD gene ~3cM away from D4S10
FISH
~3 x 10^6 bp
1983
1992
Narrowing of the HD region
Looking for highly informative recombinants (haplotypes):
[Figure: haplotypes of pedigree members across the markers D4S141, D4S115, D4S111, Y1P18R10, D4S98, D4S43 and D4S10, with the centromere-telomere orientation and the candidate HD position (HD ?) marked; bracketed alleles mark positions where phase is ambiguous or recombination has occurred]
Where are the informative recombinants?
HD
HD
derived genotypes
HD gene likely to reside here
HD gene likely to reside here
What Next?
Genetic and physical map of the HD region
D4S10
D4S98
centromere telomere
HD ?
~500kb
D4S180
D4S182
If 2008:
D4S180: AACTGACTTAA
D4S182: CCTAGCTTAGAT
We know the sequence here (D4S180) and here (D4S182). What is the DNA sequence in this interval?
CCAACTGACTTAAGC…………………….AGCCTAGCTTAGATGC
Use in a BLAST search.
We could also find the genes in this interval using the UCSC browser.
But it was 1992…
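In the 2008 scenario, once a reference sequence is in hand, pulling out the interval between two markers is a string search. The sketch below assumes a toy reference built from the slide's flanking sequences plus an invented 20-base filler; `interval_between` and `toy_reference` are illustrative names, not part of any genome browser API.

```python
def interval_between(reference: str, left_marker: str, right_marker: str) -> str:
    """Return the sequence lying between the first occurrence of left_marker
    and the next occurrence of right_marker (markers excluded)."""
    start = reference.find(left_marker)
    if start == -1:
        raise ValueError("left marker not found")
    interval_start = start + len(left_marker)
    end = reference.find(right_marker, interval_start)
    if end == -1:
        raise ValueError("right marker not found downstream of left marker")
    return reference[interval_start:end]

D4S180 = "AACTGACTTAA"    # marker sequences as shown on the slide
D4S182 = "CCTAGCTTAGAT"
# Toy reference: the slide's flanks with an invented 20-base filler in between.
toy_reference = "CC" + D4S180 + "GC" + "ACGTACGTACGTACGTACGT" + "AG" + D4S182 + "GC"
print(interval_between(toy_reference, D4S180, D4S182))
```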
A portion of the UCSC Genome browser window:
How was this done in 1992 (i.e., before the genome was
sequenced)?
Chromosome walking (outline)
Make radioactive probes from known sequence
partial digest
Identify D4S180- & D4S182-containing clones in genomic DNA library
Use ends of those clones’ inserts to find other clones with overlapping inserts
Repeat
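The walking loop outlined above can be sketched as a small simulation. Everything here is an assumption made for illustration: clones are modeled as intervals on a hidden coordinate system, `clones_hybridizing` stands in for colony hybridization, and the walk goes in one direction only (the real experiment walked from both markers until the contigs joined).

```python
from typing import NamedTuple

class Clone(NamedTuple):
    name: str
    start: int  # kb, on a coordinate system the experimenter cannot see
    end: int

def clones_hybridizing(library, probe_pos):
    """Stand-in for colony hybridization: clones whose insert spans the probe."""
    return [c for c in library if c.start <= probe_pos <= c.end]

def walk(library, start_marker, target_marker):
    """Walk rightward from start_marker until a clone covers target_marker."""
    probe_pos = start_marker
    contig = []
    while True:
        hits = clones_hybridizing(library, probe_pos)
        # Keep the hit that extends furthest beyond the current probe.
        best = max(hits, key=lambda c: c.end, default=None)
        if best is None or (contig and best.end <= contig[-1].end):
            raise RuntimeError(f"walk stalled at {probe_pos} kb: gap in library")
        contig.append(best)
        if best.start <= target_marker <= best.end:
            return contig
        probe_pos = best.end  # new probe made from the far end of the insert

# Hypothetical library of ~40 kb inserts.
library = [Clone("c1", 0, 40), Clone("c2", 30, 70), Clone("c3", 35, 60),
           Clone("c4", 65, 105), Clone("c5", 100, 140)]
print([c.name for c in walk(library, start_marker=10, target_marker=130)])
```

Choosing the hit that reaches furthest keeps the number of hybridization rounds low; a walk that cannot find a clone extending past the current one signals a gap in the library.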
Colony hybridization to find the first genomic DNA clone
replica on filter
release the DNA
bind it to filter
X-ray film
which colonies match up with hyb spots?
genomic DNA clones
***
hyb
probe from D4S180 region
Colony hybridization (cont’d)
The colonies you detect must have insert sequences complementary to your D4S180 probe!
What next?
»Pick one of these clones
»Characterize it (restriction digest, etc.)
»Make a probe from one end of its insert
»Repeat colony hybridization
Chromosome walking — finding the next clone
ori end Amp end
Pick one end of the insert
PCR amplify the region
Label the PCR fragment with radioactive tag
The goal — find the colonies (clones) that contain this sequence
overlap your first clone
colony hyb
Colony hybridization (cont’d)
The colonies you detect in the hybridization could have…
- duplicates of your original plasmid
- new plasmids with different (but overlapping) inserts
How could you tell if they were the same as the original?
Restriction digests or sequencing
Assembling a contig
Repeat the process until the clones obtained from the flanking markers join:
[Figure: probes taken from the ends of the insert in the original clone pick out overlapping clones; the overlapping inserts together form a contig]
Joining fragments
identified using D4S180 probe
identified using D4S182 probe
HD gene
STS = sequence-tagged site: a short, unique genomic sequence (not present anywhere else in the genome) that can be detected by PCR; an ID tag for that portion of the genome
STS 24 62 17 54 20 9 19 36 4
For example:
Which portion of the genome is represented in this BAC’s insert?
Test the BAC by PCR:
Does it test positive* with PCR primers for STS 24?
Does it test positive with PCR primers for STS62? …etc.
*Test positive? What does that mean?
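As a sketch of how STS content places a BAC on the map, the function below takes the set of STSs for which the BAC gave a PCR product and reports the span it covers. The STS order is the one on the slide; the PCR result, the function name `locate_bac`, and the reading of "test positive" as "PCR yields a product of the expected size" are assumptions.

```python
STS_ORDER = [24, 62, 17, 54, 20, 9, 19, 36, 4]  # order shown on the slide

def locate_bac(positive_sts: set[int], sts_order: list[int] = STS_ORDER):
    """Return the (first, last) STS spanned by a BAC, given the STSs for which
    PCR on the BAC gave a product. Raises if the hits are not contiguous."""
    indices = sorted(sts_order.index(s) for s in positive_sts)
    if not indices:
        raise ValueError("BAC tested negative for every STS")
    expected = list(range(indices[0], indices[-1] + 1))
    if indices != expected:
        raise ValueError("positive STSs are not adjacent on the map")
    return sts_order[indices[0]], sts_order[indices[-1]]

# Hypothetical PCR result: product of the expected size for STS 54, 20 and 9.
print(locate_bac({20, 54, 9}))
```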
II. Map location on genome (from lecture 13)
Genetic and physical map of the HD region
D4S10
D4S98
centromere telomere
HD ?
How do we identify the genes in a contig?
Which one is the HD gene?
~500kb
Cosmid (sort of like a plasmid) contig
~40kb each
D4S180
D4S182
Identifying genes in DNA sequence
...TTGAAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATAATAATGGTTTCTTAGACGTCAGGTGGCACTTTTCGGG......AACTTCTGCTTTCCCGGAGCACTATGCGGATAAAAATATCCAATTACAGTACTATTATTACCAAAGAATCTGCAGTCCACCGTGAAAAGCCC...
Various approaches…
Look for signatures of genes—e.g., promoters
Look for transcribed regions—e.g., make a cDNA library
Look for open reading frames
These are things that computers are great at, and some of the things that underlie the UCSC browser
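As one example of what a computer does well here, a minimal open reading frame scan might look like the sketch below. It reads only the given strand, ignores introns (so on genomic DNA it would find at most exon-sized pieces), and uses an invented toy sequence; `find_orfs` is an illustrative name, not a function from any gene-finding package.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, frame) for every ATG...stop stretch on the given
    strand, with end exclusive and the stop codon included."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

# Toy sequence (invented): two bases, then ATG AAA CCC TAG, then padding.
print(find_orfs("GGATGAAACCCTAGGG"))
```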
Making a cDNA library
cDNA = complementary DNA (complementary to mRNA)
Start with mRNA from a cell culture or tissue
Copy into DNA using reverse transcriptase and the poly-A tail (added by cell during pre-mRNA maturation)
mRNA 5’-…AAAAAAA-3’
oligo-dT primer 3’-TTTTTTT-5’
insert into plasmid, transform E. coli
One mRNA out of the pool shown here…
Genomic vs. cDNA libraries
cDNA library
make cDNA, insert into plasmid, etc.
• only mRNA regions (exons) represented
• frequency of clone proportional to amount of transcription of the gene
Genetic and physical map of the HD region
D4S10
D4S98
centromere telomere
HD ?
Cosmid (sort of like a plasmid) contig
Used as probes to screen cDNA libraries
IT-15 IT-11 IT-10C3 ADDA
Which (if any) of these transcripts correspond to the HD gene?
~500kb
~40kb each
D4S180
D4S182
How was the HD gene identified?
Compared sequences from normal and HD individuals
Look for gene alterations specific to diseased population
Focused on genes that are expressed in the nervous system
Screened cDNA libraries prepared from normal brain mRNA
Some potential complications:
-non-disease causing (rare) polymorphisms distinguishing the diseased and normal population
-incomplete penetrance
-variable expressivity
Why wouldn’t all individuals of a genotype show the same phenotype?
-Influence of other genes (many traits are multigenic)
-Influence of environment
-Observation errors!
How was the HD gene identified?
IT-15 IT-11 IT-10C3 ADDA
HD ?
CAG21
CAG18
Gene: 67 exons; >200 kb
mRNA: 10,366 bases
Protein: 3,144 aa; ~350 kDa
A simple PCR test to measure CAG repeat length in IT-15:
(CAG)n
(GTC)n
Unique sequences in IT-15 flanking the CAG repeat
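The logic of the PCR test can be sketched in a few lines: primers sit in the unique flanks, so the product length grows by 3 bp per repeat. The flanking sequences below are invented for illustration and are not the real IT-15 primer sites; `cag_repeat_count` and `pcr_product_length` are hypothetical helper names.

```python
import re

def cag_repeat_count(amplicon: str, left_flank: str, right_flank: str) -> int:
    """Count the CAG units sitting directly between two unique flanks."""
    pattern = re.escape(left_flank) + r"((?:CAG)+)" + re.escape(right_flank)
    match = re.search(pattern, amplicon.upper())
    if match is None:
        raise ValueError("flanks with an intervening pure CAG tract not found")
    return len(match.group(1)) // 3

def pcr_product_length(n_repeats: int, left_flank: str, right_flank: str) -> int:
    """Product size grows by 3 bp per repeat, which is what a gel resolves."""
    return len(left_flank) + 3 * n_repeats + len(right_flank)

# Invented flanking sequences, used only for illustration.
LEFT, RIGHT = "GGCCTTCGAGTCC", "CCGCCACCGCCGC"
normal = LEFT + "CAG" * 18 + RIGHT
expanded = LEFT + "CAG" * 45 + RIGHT
print(cag_repeat_count(normal, LEFT, RIGHT), cag_repeat_count(expanded, LEFT, RIGHT))
```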
How was the HD gene identified?
[Figure: triplet repeat number (axis 5 to 65) for normal and HD chromosomes]
Further evidence that CAG repeat expansion mutation is the cause of HD:
-Two HD patients with a new mutation (not seen in parents) also had a repeat expansion.
-Length of repeat correlated with onset and severity.
11-34 CAG repeats in 173 normals (98% between 11-24)
42-86 CAG repeats in >150 HD individuals
Correlation of HD age of onset and CAG repeat length
[Plot: onset age (years, 0 to 100) against CAG repeat length (30 to 100)]
IT-15 is the HD gene (AKA Huntingtin)
IT-15 IT-11 IT-10C3 ADDA
HD ?
CAG11-34: non-disease allele
CAG42-??: disease allele
Gene: 67 exons; >200 kb
mRNA: 10,366 bases
Protein: 3,144 aa; ~350 kDa
Why did it take so long to clone the HD gene?
1979-work begins to clone HD
1983-First marker linked to HD (a lucky break)
1993-HD gene cloned
-There were very few markers for linkage studies in humans
-There were several inconsistencies in the linkage data
-The biology of HD was of limited help in selecting candidate genes (~60% of mRNAs transcribed in the brain)
-It is not easy to identify disease causing mutations!
"We applaud their discovery," adds another contender, Michael Hayden of the University of British Columbia, who found himself in the painful position of having proposed a different candidate HD gene in Nature the day before the consortium published their proof-positive results in Cell.
-Virginia Morell (1993) Science 260, 28-30.
Repeat instability explains genetic anticipation in HD
CAG repeats tend to expand upon paternal transmission:
[Figure: triplet repeat number (axis 5 to 120) for members of a pedigree, labeled: Too young to show trait; Onset in early 40’s; Onset at 2yrs]
Expanded CAG repeats are unstable in the paternal germline
Why are long CAG repeats unstable?
A molecular model:
[Figure: DNA polymerase copying a (CAG)n/(GTC)n tract, 5’ and 3’ ends marked. With one repeat looped out of the duplex, the copy increases CAG repeat length by 1 CAG OR, less frequently, decreases CAG repeat length by 1 CAG]
Why are only long CAG repeats unstable?
Short repeats often also contain some CAA codons:
5’-CAGCAGCAACAGCAGCAGCAACAGCAA-3’
3’-GTCGTCGTTGTCGTCGTCGTTGTCGTC-5’
5’-CAGCAGCAGCAGCAGCAGCAACAGCAA-3’
3’-GTCGTCGTCGTCGTCGTCGTTGTCGTC-5’
5’-CAGCAGCAGCAGCAGCAGCAGCAGCAA-3’
3’-GTCGTCGTCGTCGTCGTCGTCGTCGTC-5’
Prone to expansion?
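One way to compare the three tracts is to measure the longest stretch of uninterrupted CAG codons, since CAA interruptions (which still encode glutamine) break up the pure repeat. The sketch below does this for the top strands shown above; treating the longest pure run as a proxy for instability is an assumption made for illustration, and `longest_pure_cag_run` is a hypothetical helper.

```python
def longest_pure_cag_run(coding_strand: str) -> int:
    """Length, in codons, of the longest run of consecutive CAG codons,
    reading the strand in frame from its first base."""
    longest = current = 0
    for i in range(0, len(coding_strand) - 2, 3):
        if coding_strand[i:i + 3] == "CAG":
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a CAA (or any other codon) interrupts the run
    return longest

top_strands = [
    "CAGCAGCAACAGCAGCAGCAACAGCAA",
    "CAGCAGCAGCAGCAGCAGCAACAGCAA",
    "CAGCAGCAGCAGCAGCAGCAGCAGCAA",
]
for strand in top_strands:
    print(strand, longest_pure_cag_run(strand))
```

The three tracts have the same total length (9 codons) but pure runs of 3, 6 and 8 CAGs.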
How do mutations in Huntingtin cause disease?
HD is a dominant disorder: Given what you know about dominant mutations, provide possible genetic explanations for the HD phenotype.
-Haploinsufficiency-half the amount of HD gene product insufficient (like W)?
-Dominant negative-poison subunit (like rab27b)?
-Expressed in wrong place (like Antennapedia) or wrong time (like lactase)
-Protein with a new activity (like the ABO blood antigens)?
Wolf-Hirschhorn Syndrome (4p-)
(The Human “Knockout” of the Huntington Locus)
• Microdeletion (contiguous gene deletion) syndrome
• Growth retardation, with abnormal facies.
• Cardiac, renal, and genital abnormalities.
• Significantly, basal ganglia is intact; no movement disorder
Rules out haploinsufficiency as cause of Huntington’s disease
Does CAG expansion act in a dominant-negative fashion?
If the repeat expansion in HD acts in a dominant-negative fashion, a homozygous LoF mutation should be equivalent
But no homozygous LoF alleles of the HD gene have been seen in humans!
Perhaps we can create mutations in the mouse HD gene!
But, how do we find the mouse HD gene?
-more mismatches are tolerated if appropriate hybridization conditions are met (salt and temperature). Allows non-identical, but closely-related sequences to hybridize.
okay
Colony hybridization with a human HD probe ultimately led to the identification of the mouse HD gene
Human HD protein 3,144 aa
Mouse HD protein 3,120 aa
The two proteins match at >90% of their aa’s!
Before continuing, let’s diverge and consider how this is done today-in some detail…(BLAST)
-But we will focus on using BLAST to find similar proteins (unlike what you did in QS)
If the sequences are conserved, the biological function is also likely to be conserved
If the biological function is conserved, we can test whether a mouse bearing a homozygous HD LoF mutation resembles the human disease
Finding the mouse HD gene computationally
We need three things:
1) A sequence database. (Doubles in size about every 2 years!) We have the genome sequences and gene structures already.
2) Some way of saying how similar two sequences are. We’ll diverge from HD for a bit and talk about point 2 now.
3) A really fast way of carrying out the similarity test. Point 3 is more appropriate for a computer course. The method is called BLAST (basic local alignment search tool). You should be at least somewhat familiar with this from QS9.
Thinking about protein similarity
Suppose we have the following aligned protein sequences:
PWAVTASCH (human)
|||||||||
VYAVQASPH (something else)
PWAVTASCH (human)
|||||||||
PWGVHATCW (something else)
[Labels on both alignments: amino acid identities]
We can see that both of the “something else” sequences appear to be related to the human.
But related to what extent? We need to be quantitative.
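A first attempt at being quantitative is simply to count identical positions. The short sketch below does that for the two alignments on this slide; `identities` is an illustrative helper name.

```python
def identities(seq_a: str, seq_b: str) -> int:
    """Number of aligned positions carrying the same amino acid."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(a == b for a, b in zip(seq_a, seq_b))

human = "PWAVTASCH"
for other in ("VYAVQASPH", "PWGVHATCW"):
    n = identities(human, other)
    print(f"{other}: {n}/{len(human)} identical ({100 * n / len(human):.0f}%)")
```

Both alignments come out at 5 of 9 identities, so counting identities alone cannot separate them. That is the motivation for a scoring scheme that weights which amino acids match and how similar the mismatched ones are.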
Amino acid structures
Hydrophobic
Polar Charged
phenylalanine F
amino acid one-letter frequency percent
alanine A 0.0768 7.68
cysteine C 0.0162 1.62
aspartate D 0.0526 5.26
glutamate E 0.0648 6.48
phenylalanine F 0.0409 4.09
glycine G 0.0689 6.89
histidine H 0.0225 2.25
isoleucine I 0.0586 5.86
lysine K 0.0596 5.96
leucine L 0.0958 9.58
methionine M 0.0236 2.36
asparagine N 0.0435 4.35
proline P 0.0490 4.90
glutamine Q 0.0394 3.94
arginine R 0.0521 5.21
serine S 0.0700 7.00
threonine T 0.0558 5.58
valine V 0.0663 6.63
tryptophan W 0.0121 1.21
tyrosine Y 0.0315 3.15
1.0000 100.00
Amino acid frequencies in the entire universe of known protein sequences.
common
rare
Amino acid frequency
log odds calculation
$\text{score} = \log \frac{\text{likelihood of seeing amino acid pair in related protein}}{\text{likelihood of seeing amino acid pair at random}}$
• Related proteins taken from BLOCKS database (validated related proteins).
• Simply count up how often a particular amino acid pair is seen.
• Gives you the numerator likelihood above.
At random: $f_{\text{A-B pair}} = 2 f_A f_B$
(the factor of two is because it can be an A-B pair or a B-A pair)
• Gives the denominator likelihood above.
likelihood of seeing amino acid pair in related protein:
likelihood of seeing amino acid pair at random:
One block from BLOCKS database:
CKS2_XENLA|Q91879 NIYYSDKYTDEHFEY
CKS1_HUMAN|P33551 QIYYSDKYDDEEFEY
CKS2_HUMAN|P33552 QIYYSDKYFDEHYEY
CKS2_MOUSE|P56390 QIYYSDKYFDEHYEY
CKS1_PATVU|P41384 QIYYSDKYFDEDFEY
CKS1_DROME|Q24152 DIYYSDKYYDEQFEY
CKS1_PHYPO|P55933 TIQYSEKYYDDKFEY
CKS1_LEIME|Q25330 KILYSDKYYDDMFEY
O23249 QIQYSEKYFDDTFEY
O60191 NIHYSTRYSDDTHEY
CKS1_SCHPO|P08463 QIHYSPRYADDEYEY
CKS1_YEAST|P20486 SIHYSPRYSDDNYEY
CKS1_CAEEL|Q17868 DFYYSNKYEDDEFEY
One of 29,068 blocks - pair frequencies compiled from all blocks combined.
Amino acid pair frequencies in related proteins
D-D 21 pairs
D-E 14 pairs
D-P 14 pairs
D-T 7 pairs
D-N 7 pairs
E-E 1 pair
E-T 2 pairs
E-P 4 pairs
T-P 2 pairs
T-T 1 pair
T-N 1 pair
LOD calcul. (e.g., D-D pair):
$\text{score}_{\text{D-D}} = \log \frac{21 / 74}{0.05 \times 0.05}$
21 = # D-D pairs; 74 = total # of pairs; 0.05 X 0.05 = f of D-D, from aa frequency table
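The D-D example can be worked through numerically. The sketch below applies the log-odds formula with base 2 and the "multiply by 2 and round" convention for half-bit scores; `half_bit_score` is a hypothetical helper. Because it uses the counts from this single column only, its result is not expected to equal the D-D entry of a published matrix, which is compiled from all blocks combined.

```python
import math

def half_bit_score(pair_count: int, total_pairs: int,
                   freq_a: float, freq_b: float, same_residue: bool) -> int:
    """Half-bit log-odds score: 2 * log2(observed / expected), rounded."""
    observed = pair_count / total_pairs
    # Unlike residues can occur as A-B or B-A, hence the factor of two.
    expected = freq_a * freq_b if same_residue else 2 * freq_a * freq_b
    return round(2 * math.log2(observed / expected))

# D-D pair from the slide: 21 of 74 pairs, f(D) rounded to 0.05.
print(half_bit_score(21, 74, 0.05, 0.05, same_residue=True))
```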
log odds scores (side note)
• Traditionally, we use log base 2 (pedigree LOD scores are base 10).
• To make computing fast, scores are usually multiplied by 2 and then rounded to nearest integer (this is a detail).
• Called “half-bit” scores (jargon for taking twice log base 2).
log odds scores (cont.)
$\text{score} = \log \frac{\text{likelihood of seeing amino acid pair in related protein}}{\text{likelihood of seeing amino acid pair at random}}$
Remember: log2 1 = 0; log2 2 = 1; log2 1/2 = -1
If amino acid pair seen MORE often than expected at random? odds > 1, score positive
If amino acid pair seen LESS often than expected at random? odds < 1, score negative
Values from a score matrix (half-bit scores)
one-letter amino acid
code
score for alanine (A) - tryptophan (W)
self match scores
Amino acid structures
[Figure: side-chain structures, grouped as follows]
Hydrophobic: alanine A, valine V, glycine G, leucine L, isoleucine I, methionine M, proline P, tryptophan W, phenylalanine F
Polar: threonine T, tyrosine Y, serine S, asparagine N, glutamine Q, cysteine C
Charged: lysine K, arginine R, histidine H, aspartate D, glutamate E
Example - similar amino acids get positive scores
Qualitatively, what scores do you expect pairs of these to have?
I-V
L-V
I-L
Example - dissimilar amino acids get negative scores
Charged (lysine K, arginine R, histidine H, aspartate D, glutamate E) vs. hydrophobic
Qualitatively, what scores do you expect pairs among these groups to have?
Thinking about protein similarity
Suppose we have the following aligned protein sequences:
PWAVTASCH (human)
|||||||||
VYAVQASPH (something else)
PWAVTASCH (human)
|||||||||
PWGVHATCW (something else)
Related to what extent? We want to be quantitative.
Top case: -2 + 2 + 4 + 4 + -1 + 4 + 4 + -3 + 8 = 20
Bottom case: 7 + 11 + 0 + 4 + -2 + 4 + 1 + 9 + -2 = 32
(Side note - this also indicates the odds of seeing a match of this quality by chance for the entire sequence, e.g. bottom match is $1/2^{16} = 1/65536$. Remember they are half-bit scores.)
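The two sums can be reproduced with a small lookup. The pair scores below are copied from the score matrix shown in these slides, restricted to the pairs these two alignments need; `alignment_score` is an illustrative helper for ungapped alignments only.

```python
# Pair scores (half-bits) copied from the score matrix in these slides;
# only the pairs needed for the two example alignments are listed.
PAIR_SCORES = {
    ("P", "V"): -2, ("W", "Y"): 2, ("A", "A"): 4, ("V", "V"): 4,
    ("Q", "T"): -1, ("S", "S"): 4, ("C", "P"): -3, ("H", "H"): 8,
    ("P", "P"): 7, ("W", "W"): 11, ("A", "G"): 0, ("H", "T"): -2,
    ("S", "T"): 1, ("C", "C"): 9, ("H", "W"): -2,
}

def alignment_score(seq_a: str, seq_b: str) -> int:
    """Sum of per-position half-bit scores for an ungapped alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        key = tuple(sorted((a, b)))  # the matrix is symmetric
        total += PAIR_SCORES[key]
    return total

top = alignment_score("PWAVTASCH", "VYAVQASPH")
bottom = alignment_score("PWAVTASCH", "PWGVHATCW")
print(top, bottom)
# Half-bit scores: divide by 2 to get bits, then odds = 2 ** bits.
print(2 ** (bottom // 2))
```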
Getting back to HD…finding the mouse HD gene
A portion of human HD protein sequence (the “query” sequence):
MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQPLLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPEFQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAPRSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQ…
BLAST
database of all proteins from human, chimp, dog, mouse, etc.
summary list of all related proteins (one per line)
human
chimp
mouse!
# of expected (E) matches (with a score this good from a database of this size) from chance alone
Getting back to HD…finding the mouse HD gene
Bit score
E value
Looking further down on the summary list…
fruit fly
sea anemone
zebra fish
Can keep going, but the validity attenuates as you approach E=1
Getting back to HD…finding the mouse HD gene
Portion of Mus musculus HD alignment:
[Figure labels: my query; this match (M. musculus); amino acid identical; amino acid similar; amino acid dissimilar; gap]
A C D E F G H I K L M N P Q R S T V W Y
A 4 0 -2 1 -2 0 -2 -1 -1 -1 -1 -2 -1 -1 -1 1 0 0 -3 -2
C 0 9 -3 -4 -2 -3 -3 -1 -3 -1 -1 -3 -3 -3 -3 -1 -1 -1 -2 -2
D -2 -3 6 2 -3 -1 -1 -3 -1 -4 -3 1 -1 0 -2 0 -1 -3 -4 -3
E 1 -4 2 5 -3 -2 0 -3 1 -3 -2 0 -1 2 0 0 -1 -2 -3 -2
F -2 -2 -3 -3 6 -3 -1 0 -3 0 0 -3 -4 -3 -3 -2 -2 -1 1 3
G 0 -3 -1 -2 -3 6 -2 -4 -2 -4 -3 0 -2 -2 -2 0 -2 -3 -2 -3
H -2 -3 -1 0 -1 -2 8 -3 -1 -3 -2 1 -2 0 0 -1 -2 -3 -2 2
I -1 -1 -3 -3 0 -4 -3 4 -3 2 1 -3 -3 -3 -3 -2 -1 3 -3 -1
K -1 -3 -1 1 -3 -2 -1 -3 5 -2 -1 0 -1 1 2 0 -1 2 -3 -2
L -1 -1 -4 -3 0 -4 -3 2 -2 4 2 -3 -3 -2 -2 -2 -1 1 -2 -1
M -1 -1 -3 -2 0 -3 -2 1 -1 2 5 -2 -2 0 -1 -1 -1 1 -1 -1
N -2 -3 1 0 -3 0 1 -3 0 -3 -2 6 -2 0 0 1 0 -3 -4 -2
P -1 -3 -1 -1 -4 -2 -2 -3 -1 -3 -2 -2 7 -1 -2 -1 -1 -2 -4 -3
Q -1 -3 0 2 -3 -2 0 -3 1 -2 0 0 -1 5 1 0 -1 -2 -2 -1
R -1 -3 -2 0 -3 -2 0 -3 2 -2 -1 0 -2 1 5 -1 -1 -3 -3 -2
S 1 -1 0 0 -2 0 -1 -2 0 -2 -1 1 -1 0 -1 4 1 -2 -3 -2
T 0 -1 -1 -1 -2 -2 -2 -1 -1 -1 -1 0 -1 -1 -1 1 5 0 -2 -2
V 0 -1 -3 -2 -1 -3 -3 3 2 1 1 -3 -2 -2 -3 -2 0 4 -3 -1
W -3 -2 -4 -3 1 -2 -2 -3 -3 -2 -1 -4 -4 -2 -3 -3 -2 -3 11 2
Y -2 -2 -3 -2 3 -3 2 -1 -2 -1 -1 -2 -3 -1 -2 -2 -2 -1 2 7
score (bits) is sum of each aligned residue (x 0.5 because the score table is in half-bits):
Query LTAVGGIGQLT
      LT  GG+GQLT
Sbjct LTTPGGLGQLT
4 + 3 + 0 + -2 + 6 + 6 + 2 + 6 + 5 + 4 + 3 = 37 half bits = 18.5 bits
notice that this amino acid pair is poorly conserved
Can do this using an experimental approach (e.g., screen a library) or using a computational approach (e.g., conduct a BLAST search)
Once the mouse HD gene is identified we must create a recombinant plasmid containing the mouse HD gene and appropriate markers for generating a mouse HD mutation (AKA: a mouse HD “knockout”)
Studies of HD in animal models: a mouse HD KO
Engineering an HD knockout mouse:
[Figure: construction of the targeting plasmid. Mouse genomic DNA clone bearing HD exons 3-6 (vector: ampr, ori; restriction endonuclease sites H and X). Partial digest with H & X: exons 4 and 5 replaced by neor. Partial digest with H: gns added.]
Studies of HD in animal models: a mouse HD KO
Embryonic stem (ES) cells from an albino (c/c) strain of mice
Neomycin + ganciclovir
[Figure: targeting construct (exon 3, neor, exon 6, gns) and ES genome (exons 3 4 5 6); outcomes labeled 3 6neor and 3 6neorgns, ratio 1:1,000]
ES cells die on ganciclovir
Studies of HD in animal models: a mouse HD KO
ES cell bearing heterozygous HD KO
[Figure: wild-type allele with exons 3 4 5 6 (mRNA: 1 2 3 4…) and KO allele with exons 3 and 6 flanking neor (mRNA: 1 2 3 6…)]
Altered splicing results in frameshift and premature translation termination
Blastocyst-stage embryo from a C/C female
Studies of HD in animal models: a mouse HD KO
c/c; HD-/HD+ C/C; HD+/HD+
Place mosaic embryos into surrogate mother
Which of these mosaic offspring are most likely to have the targeted mutation in their germline?
Mosaic embryo
Creating the homozygous KO
Mosaic mouse (c/c; HD-/HD+ cells and C/C; HD+/HD+ cells) X Albino mouse (c/c; HD+/HD+)
Offspring: c/c; HD-/HD+ OR c/c; HD+/HD+, or C/c; HD+/HD+
genotype (southern blot of blood sample)
c/c; HD-/HD+ X c/c; HD-/HD+
c/c; HD-/HD- The homozygous KO!
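The expected outcome of the heterozygote intercross can be enumerated with a one-locus Punnett square. This is a generic Mendelian sketch (the helper name `cross` is hypothetical); it gives proportions at conception and says nothing about which genotypes survive to birth.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(parent_1, parent_2):
    """Offspring genotype probabilities for one locus, each parent given as
    a pair of alleles, e.g. ("HD-", "HD+")."""
    counts = Counter(tuple(sorted(pair)) for pair in product(parent_1, parent_2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

het = ("HD-", "HD+")
for genotype, p in cross(het, het).items():
    print("/".join(genotype), p)
```

One quarter of conceptions from the HD-/HD+ intercross are expected to be HD-/HD-, which is the class whose phenotype is examined next.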
The phenotypes of the HD KO mice…
c/c; HD-/HD+: Phenotypically normal-no brain pathology
c/c; HD-/HD-: Early embryonic lethal-embryonic developmental abnormalities
The homozygous HD KO displays different symptoms than the human disease
HD symptoms do not result from a LoF of the HD gene
How do mutations in Huntingtin cause disease?
HD is a dominant disorder: Given what you know about dominant mutations, provide possible genetic explanations for the HD phenotype.
-Haploinsufficiency-half the amount of HD gene product insufficient (like W)?
-Dominant negative-poison subunit (like rab27b)?
-Expressed in wrong place (like Antennapedia) or wrong time (like lactase)
-Protein with a new activity (like the ABO blood antigens)? What could it be?
The HD CAG repeats encode polyglutamine (polyQ) tracts
Gene: Promoter, exons 1 2 3 4 5 6 etc., with (CAG)n
mRNA: AUG…(CAG)n…AAAAAA
Protein: M…(Q)n…
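A tiny translation sketch shows why a (CAG)n tract in the coding region becomes a (Q)n tract in the protein. The codon table is deliberately cut down to the codons used here, and the mRNA is a toy construct rather than the real HD transcript.

```python
# Minimal codon table: only the codons used in this example.
CODON_TABLE = {"AUG": "M", "CAG": "Q", "CAA": "Q", "UAA": "*"}

def translate(mrna: str) -> str:
    """Translate in frame from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

n = 21
print(translate("AUG" + "CAG" * n + "UAA"))
```

Each added CAG lengthens the glutamine tract by exactly one residue, so repeat expansion in the DNA maps directly onto polyQ length in the protein.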
Are proteins bearing long polyglutamine tracts toxic?
Are long polyglutamine (polyQ) tracts toxic?
Evidence in favor:
-Spinal and bulbar muscular atrophy is caused by polyQ expansion of the androgen receptor.
-Proteins with long polyQ repeats fold abnormally
…QQQQQQQQQQQQQQQQQQQQQQQ...
Protein product = misfolded conformation
When the glutamine tract exceeds a length threshold (~35), the polyglutamine tract adopts an abnormal conformation
Creating a mouse with a human HD gene
Creation of a ‘transgenic mouse’
Human HD gene: Promoter, exons 1 2 3 4 5 6 etc., with (CAG)180
Single-celled mouse embryo
Gene fragment (figure shows Promoter and exon 1) inserts randomly into mouse genome
Place embryo into surrogate mother
Creating a transgenic mouse (contd)
Surrogate mother
Transgenic offspring? Can be easily tested using PCR
Phenotypes of HD transgenic mice:
-tremors, abnormal gait, learning deficits by 6 mos.
-brain polyQ aggregates
-cell loss in basal ganglia in late stages
Confirms protein with new activity (GoF) mechanism
Suggests that polyglutamine expansion is toxic
Are polyQ expansions toxic in a novel context?
Insertion of a polyQ tract in the hypoxanthine phosphoribosyltransferase (HPRT) gene
(HPRT)1 2 3 4
(CAG)146
Generate transgenic mouse
Do mice develop HD-like pathology?
Phenotypes of HPRT transgenic mice closely resemble the HD transgenic mice.
suggests that polyQ itself is primarily responsible for toxicity
polyQ expression is also toxic in flies, yeast, cell lines, etc.
What have we learned from cloning HD?
1993: Symptoms result from neurodegeneration
Present: Mechanism of neuron death involves intrinsic toxicity of large polyQ tracts
1993: Age of onset typically 40’s; ranges from infancy to elderly
Present: Age of onset correlates with CAG repeat length; can now be predicted (not clear if this is good or bad)
1993: Genetic anticipation (increasing disease severity in subsequent generations) often observed
Present: Genetic anticipation results from repeat length instability, primarily in paternal germline
1993: No cure or treatment
Present: But there are several promising strategies on the horizon
Repeat Expansion Diseases
Fragile X syndrome of mental retardation
FRAXE mental retardation
X-linked spinal and bulbar muscular atrophy
Myotonic dystrophy 1 and 2
Huntington’s disease 1 and 2
Dentatorubral pallidoluysian atrophy
Friedreich’s ataxia
Oculopharyngeal muscular dystrophy
Myoclonic epilepsy of Unverricht-Lundborg
Spinocerebellar ataxia types 1, 2, 3, 6, 7, 8, 10, 12 & 17