Positional cloning of the Huntington’s disease (HD) gene Mapping and cloning of the HD gene chromosome walking cDNA libraries Identifying the disease-causing mutations Studies of the HD gene: identifying orthologous proteins (BLAST) mouse knockouts (KO’s) transgenic mice

Upload: cameron-moore

Post on 11-Jan-2016


Page 1: Positional cloning of the Huntington’s disease (HD) gene Mapping and cloning of the HD gene chromosome walking cDNA libraries Identifying the disease-causing

Positional cloning of the Huntington’s disease (HD) gene

Mapping and cloning of the HD gene

chromosome walking

cDNA libraries

Identifying the disease-causing mutations

Studies of the HD gene:

identifying orthologous proteins (BLAST)

mouse knockouts (KO’s)

transgenic mice

Summary of other repeat expansion diseases

Page 2

Goals for the next three lectures…

-Try to fill in some gaps

-Strengthen the connections between topics

-Some new information:

protein similarity (probably today & Monday)

knockout mice (probably Monday)

population genetics (Monday?)

-Next Friday’s lecture:

no more than 30 minutes of new material

course evaluations (~15-20 minutes)

review/problem solving/QS10

Page 3

If I do spend time reviewing topics on Friday it would be good to know what you need help with:

-No more than 1-2 topics (1-2 sentences)

-Send to: [email protected], ideally before Monday

Lastly: If you feel that an error was made in the grading of your 2nd midterm exam, send an email message to Anne Paul summarizing the error, BY THE END OF THE DAY TODAY.

Solutions to Problem set 6 have been posted on the course website

Page 4

(reminder from lecture 13)

- A dominant genetic disease; affects ~8 per 100,000 people worldwide

- Symptoms include abnormal body movements (chorea), cognitive decline, death

- Symptoms result from neurodegeneration

- Age of onset is typically in the 40s but ranges from infancy to old age

- Genetic anticipation (increasing disease severity in subsequent generations) often observed

- No cure or treatment

Huntington’s Disease

Huntington’s disease results from nerve cell degeneration in the basal ganglia.

[Figure: HD brain vs. normal brain]

Page 5

Mapping of the Huntington’s disease gene

The informative pedigree:

• 5,000 related individuals from Venezuela segregating HD

• Included 100 members currently affected by HD

• Included >1,000 members with >25% risk

1983: the markers used

Few markers were available, so random, purified fragments of the human genome were tested.

These random fragments were used as probes in Southern blot analysis to identify RFLPs.

On their 12th probe… the jackpot!

- linkage of an RFLP to HD!

Page 6

Marker ‘D4S10’ shows linkage to HD

[Figure: LOD score (Z) plotted against distance from D4S10 (0-50 cM); the maximum is at ~3 cM]

Results of linkage studies using the probe “G8”, which recognizes the RFLP marker D4S10.

Does this result provide significant evidence of linkage?
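The question above is answered with a LOD score: Z(θ) is the log10 ratio of the likelihood of the data if marker and disease locus are linked with recombination fraction θ to the likelihood if they are unlinked (θ = 0.5), and by convention Z ≥ 3 is taken as significant evidence of linkage. A minimal Python sketch, assuming phase-known, fully informative meioses; the counts (1 recombinant in 30 meioses) are made up and are not the G8/D4S10 data, and the function names are illustrative.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD score Z(theta) for phase-known, fully informative meioses.

    Z = log10( L(data | linked at theta) / L(data | unlinked, theta = 0.5) )
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # a single recombinant rules out theta = 0
    linked = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        linked += recombinants * math.log10(theta)
    unlinked = (recombinants + nonrecombinants) * math.log10(0.5)
    return linked - unlinked


def max_lod(recombinants: int, nonrecombinants: int, step: float = 0.001):
    """Scan theta from 0 to 0.5 and return (Z_max, theta at Z_max)."""
    n_steps = int(round(0.5 / step))
    thetas = [min(i * step, 0.5) for i in range(n_steps + 1)]
    return max((lod_score(recombinants, nonrecombinants, t), t) for t in thetas)


# Hypothetical data: 1 recombinant among 30 informative meioses.
z_max, theta_hat = max_lod(1, 29)
print(f"Z_max = {z_max:.2f} at theta = {theta_hat:.3f}")  # theta ~ 0.03, i.e. ~3 cM
print("significant linkage (Z >= 3)?", z_max >= 3)
```

A recombination fraction of about 0.03 corresponds to roughly 3 cM, since 1 cM is defined as 1% recombination.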

Page 7

Where is marker ‘D4S10’ located?

[Figure: karyotype, and an ideogram of chromosome 4 showing D4S10 at 4p16, with the centromere and telomere marked; the position of the HD gene (?) is not yet known]

How to tell where D4S10 is? FISH.

The HD gene is ~3 cM away from D4S10, i.e., ~3 × 10^6 bp.

Page 8

1983

1992

Page 9

Narrowing of the HD region

Looking for highly informative recombinants (haplotypes):

[Figure: haplotypes of HD chromosomes at the markers D4S10, D4S43, D4S98, R10, P18, Y1, D4S111, D4S115 and D4S141 (between centromere and telomere), with derived genotypes for recombinant individuals]

Where are the informative recombinants? They mark the interval where the HD gene is likely to reside.

What next?
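The logic of reading informative recombinants can be sketched in Python: in each affected recombinant, the HD gene must lie in the part of the chromosome inherited from the HD haplotype, so the candidate region is the stretch of markers that is HD-derived in every recombinant, bounded by the nearest markers that are not. The marker names M1 to M6 and both recombinants are hypothetical, not the data from the slide, and the function name is illustrative.

```python
from typing import Dict, List, Optional, Tuple


def candidate_interval(
    markers: List[str], recombinants: List[Dict[str, bool]]
) -> Tuple[Optional[str], Optional[str]]:
    """Return the markers flanking the region shared by every affected recombinant.

    markers: marker names in map order (centromere -> telomere).
    recombinants: for each affected recombinant, marker -> True if the allele
    at that marker came from the HD-carrying chromosome.
    None means the interval is open on that side.
    """
    shared = [all(r[m] for r in recombinants) for m in markers]
    if True not in shared:
        raise ValueError("no marker is HD-derived in every recombinant")
    first = shared.index(True)
    last = len(shared) - 1 - shared[::-1].index(True)
    if not all(shared[first:last + 1]):
        raise ValueError("shared HD-derived markers are not contiguous; check the data")
    left = markers[first - 1] if first > 0 else None
    right = markers[last + 1] if last + 1 < len(markers) else None
    return left, right


# Hypothetical markers M1..M6 and two informative recombinants.
markers = ["M1", "M2", "M3", "M4", "M5", "M6"]
rec_a = {"M1": False, "M2": False, "M3": False, "M4": True, "M5": True, "M6": True}
rec_b = {"M1": True, "M2": True, "M3": True, "M4": True, "M5": False, "M6": False}
print(candidate_interval(markers, [rec_a, rec_b]))  # ('M3', 'M5')
```

Each added recombinant can only shrink the interval, which is why the search focused on highly informative ones.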

Page 10

Genetic and physical map of the HD region

[Figure: map from centromere to telomere showing D4S10, D4S98, and the markers D4S180 and D4S182 flanking a ~500 kb interval containing HD (?)]

We know the sequence at D4S180 (AACTGACTTAA) and at D4S182 (CCTAGCTTAGAT). What is the DNA sequence in the interval between them?

CCAACTGACTTAAGC…………………….AGCCTAGCTTAGATGC

If it were 2008, we could use these sequences in a BLAST search. We could also find the genes in this interval using the UCSC Genome Browser.

[Figure: a portion of the UCSC Genome Browser window]

But it was 1992…
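In the 2008 scenario, finding the interval is a lookup: locate each flanking marker sequence in the assembled genome and take what lies between them. This Python sketch uses exact string matching as a simple stand-in for BLAST (which also finds imperfect matches); the function name is illustrative, and the six-base middle of the toy genome is made up, since the real interval is ~500 kb.

```python
def interval_between(genome: str, left_marker: str, right_marker: str) -> str:
    """Return the sequence lying strictly between two flanking marker sequences."""
    left = genome.find(left_marker)
    if left == -1:
        raise ValueError("left marker not found")
    start = left + len(left_marker)
    end = genome.find(right_marker, start)
    if end == -1:
        raise ValueError("right marker not found downstream of the left marker")
    return genome[start:end]


D4S180 = "AACTGACTTAA"
D4S182 = "CCTAGCTTAGAT"
# Hypothetical toy "genome": the real interval is ~500 kb, not 6 bases.
toy_genome = "CCAACTGACTTAAGC" + "GGGTTT" + "AGCCTAGCTTAGATGC"
print(interval_between(toy_genome, D4S180, D4S182))  # GCGGGTTTAG
```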

Page 11

Genetic and physical map of the HD region

[Figure: map from centromere to telomere showing D4S10, D4S98, and the markers D4S180 (AACTGACTTAA) and D4S182 (CCTAGCTTAGAT) flanking a ~500 kb interval containing HD (?)]

We know the sequence at both flanking markers. What is the DNA sequence in the interval between them?

If it were 2008, we could read it from the UCSC Genome Browser. But it was 1992…

How was this done in 1992 (i.e., before the genome was sequenced)?

[Figure: a portion of the UCSC Genome Browser window]

Page 12

Chromosome walking (outline)

- Make radioactive probes from the known sequence

- Identify clones containing D4S180 and D4S182 in a genomic DNA library (made from a partial digest of genomic DNA)

- Use the ends of those clones’ inserts to find other clones with overlapping inserts

- Repeat

[Figure: chromosome walking schematic]
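The walking loop can be simulated in Python if we pretend to know each clone’s genomic coordinates (in a real walk these are exactly what is unknown; overlaps are detected only by hybridization). In this hypothetical sketch, a probe from the far end of the current insert "hybridizes" to every clone covering that position, and we step to the hit that extends farthest, repeating until a clone reaches the target marker. All clone names, coordinates and function names are made up, and the walk is assumed to run toward increasing coordinates.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # hypothetical coordinates in kb; unknown in a real walk
    end: int


def covers(clone: Clone, pos: int) -> bool:
    return clone.start <= pos <= clone.end


def walk(library: List[Clone], start_pos: int, target_pos: int) -> List[Clone]:
    """Walk rightward from start_pos until a clone covers target_pos."""
    hits = [c for c in library if covers(c, start_pos)]
    if not hits:
        raise ValueError("no clone hybridizes to the starting probe")
    current = max(hits, key=lambda c: c.end)
    path = [current]
    while not covers(current, target_pos):
        end_probe = current.end  # probe made from the far end of the insert
        hits = [c for c in library if covers(c, end_probe) and c.end > current.end]
        if not hits:
            raise ValueError("walk stalled: no overlapping clone extends further")
        current = max(hits, key=lambda c: c.end)
        path.append(current)
    return path


library = [Clone("A", 0, 40), Clone("B", 30, 75), Clone("C", 60, 110),
           Clone("D", 100, 150), Clone("E", 35, 50)]
print([c.name for c in walk(library, start_pos=5, target_pos=140)])  # ['A', 'B', 'C', 'D']
```

The chain of overlapping clones returned by the walk is the contig; clone E is picked up by the end probe from A but is skipped because B extends further.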

Page 13

Colony hybridization to find the first genomic DNA clone

- Plate the genomic DNA clones and make a replica on a filter

- Release the DNA and bind it to the filter

- Hybridize (hyb) with a radioactive probe from the D4S180 region

- Expose to X-ray film

Which colonies match up with the hyb spots?

Page 14

Colony hybridization (cont’d)

The colonies you detect must have insert sequences complementary to your D4S180 probe!

What next?

»Pick one of these clones

»Characterize it (restriction digest, etc.)

»Make a probe from one end of its insert

»Repeat colony hybridization

Page 15

Chromosome walking — finding the next clone

[Figure: plasmid clone with the insert’s ori end and Amp end marked]

- Pick one end of the insert

- PCR amplify the region

- Label the PCR fragment with a radioactive tag

- Colony hyb

The goal: find the colonies (clones) that contain this sequence; their inserts overlap your first clone.

Page 16

Colony hybridization (cont’d)

The colonies you detect in the hybridization could have…

- duplicates of your original plasmid

- new plasmids with different (but overlapping) inserts

How could you tell if they were the same as the original? Restriction digests or sequencing.
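A restriction digest comparison can be mimicked in Python: cut each insert in silico with the same enzyme and compare the fragment-size patterns, as a gel would. The sketch assumes EcoRI (recognition site GAATTC, cutting between G and A) and linear DNA; the function names and insert sequences are hypothetical. Identical patterns are consistent with a duplicate of the original plasmid but do not prove it, which is why sequencing is the definitive check.

```python
from typing import List


def fragment_lengths(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> List[int]:
    """Fragment sizes (largest first) after cutting linear DNA at every site.

    Defaults assume EcoRI: recognition site GAATTC, cut between G and A (G^AATTC).
    """
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


def same_pattern(insert_a: str, insert_b: str) -> bool:
    """True if two inserts give identical fragment-size patterns."""
    return fragment_lengths(insert_a) == fragment_lengths(insert_b)


original = "AAAGAATTCAAAAAAGAATTCTT"      # hypothetical insert sequences
candidate = "AAAGAATTCAAAAAAGAATTCTTGGG"
print(fragment_lengths(original))          # [12, 7, 4]
print(same_pattern(original, candidate))   # False: candidate's pattern differs
```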

Page 17

Assembling a contig

Repeat the process until the clones obtained from the flanking markers join:

[Figure: starting from the original clones identified using the D4S180 probe and the D4S182 probe, probes from the ends of each insert pick up overlapping inserts until the fragments join into a single contig spanning the HD gene]

Page 18

STS = sequence tagged site: a short, unique genomic sequence (not present anywhere else in the genome) that can be detected by PCR. It serves as an ID tag for that portion of the genome.

For example:

[Figure: a stretch of genome carrying STS 24, 62, 17, 54, 20, 9, 19, 36 and 4, with a BAC insert covering part of it]

Which portion of the genome is represented in this BAC’s insert? Test the BAC by PCR:

- Does it test positive* with PCR primers for STS 24?

- Does it test positive with PCR primers for STS 62? …etc.

*Test positive? What does that mean?

(From lecture 13: II. Map location on genome)
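Here a BAC "tests positive" for an STS when PCR with that STS’s primers gives the expected product, meaning the insert contains that STS. Given the STS order along the genome, the set of positive STSs then tells us which stretch the insert spans. A minimal Python sketch; it assumes the order in which the slide lists the STSs is their order along the genome, and the function name and example results are hypothetical.

```python
from typing import List, Set, Tuple

# STSs in the order listed on the slide; assumed here to be their genomic order.
STS_ORDER = ["24", "62", "17", "54", "20", "9", "19", "36", "4"]


def bac_span(positives: Set[str], sts_order: List[str] = STS_ORDER) -> Tuple[str, str]:
    """Return the first and last STS (in map order) that test PCR-positive for a BAC."""
    idx = [i for i, sts in enumerate(sts_order) if sts in positives]
    if not idx:
        raise ValueError("BAC is negative for every STS tested")
    if idx != list(range(idx[0], idx[-1] + 1)):
        raise ValueError("positive STSs are not contiguous; check for a chimeric clone or PCR error")
    return sts_order[idx[0]], sts_order[idx[-1]]


print(bac_span({"17", "54", "20"}))  # ('17', '20')
```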

Page 19

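Genetic and physical map of the HD region

[Figure: map from centromere to telomere showing D4S10, D4S98, and the ~500 kb interval between D4S180 and D4S182 containing HD (?), now covered by a contig of cosmids (a cosmid is sort of like a plasmid), ~40 kb each]

How do we identify the genes in a contig? Which one is the HD gene?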

Page 20

Identifying genes in DNA sequence

...TTGAAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATAATAATGGTTTCTTAGACGTCAGGTGGCACTTTTCGGG......AACTTCTGCTTTCCCGGAGCACTATGCGGATAAAAATATCCAATTACAGTACTATTATTACCAAAGAATCTGCAGTCCACCGTGAAAAGCCC...

Various approaches…

Look for signatures of genes—e.g., promoters

Look for transcribed regions—e.g., make a cDNA library

Look for open reading frames

These are tasks that computers are great at, and some of them underlie the UCSC browser.
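The third approach, looking for open reading frames, is easy to sketch: scan each of the three forward reading frames for an ATG followed in frame by a stop codon (TAA, TAG or TGA). A minimal Python sketch; the function name and the `min_codons` cutoff are illustrative assumptions. Real gene finders combine much longer cutoffs with other signals, and in genomic DNA introns split the coding sequence into several pieces. The reverse strand can be scanned by passing in its reverse complement.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs in the three forward frames.

    end is exclusive and includes the stop codon; min_codons counts ATG and stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


print(find_orfs("ATGAAATAG"))      # [(0, 0, 9)]
print(find_orfs("CCATGTTTTGACC"))  # [(2, 2, 11)]
```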

Page 21

Making a cDNA library

cDNA = complementary DNA (complementary to mRNA)

- Start with mRNA from a cell culture or tissue

- Copy it into DNA using reverse transcriptase, with an oligo-dT primer annealed to the poly-A tail (added by the cell during pre-mRNA maturation):

5’-…AAAAAAA-3’ (mRNA)
3’-TTTTTTT-5’ (oligo-dT primer)

- Insert into a plasmid and transform E. coli

One mRNA out of the pool is shown here…
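The base-pairing step can be sketched in a few lines of Python: reverse transcriptase builds a DNA strand complementary and antiparallel to the mRNA, so writing it 5’ to 3’ means reversing the mRNA and complementing each base (A to T, U to A, G to C, C to G). The function name and the short example mRNA are made up; note that the poly-A tail becomes the oligo-dT-primed TTTT… at the 5’ end of the cDNA.

```python
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the first-strand cDNA (written 5'->3') copied from an mRNA (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(mrna.upper()))


print(first_strand_cdna("AUGGCCAAAAAAA"))  # TTTTTTTGGCCAT
```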

Page 22

Genomic vs. cDNA libraries

cDNA library

make cDNA, insert into plasmid, etc.

• only mRNA regions (exons) represented

• the frequency of a clone is proportional to the amount of that gene’s mRNA (its level of transcription)

Page 23

Genetic and physical map of the HD region

[Figure: the ~500 kb interval between D4S180 and D4S182 containing HD (?), covered by a contig of cosmids (sort of like a plasmid, ~40 kb each), with the transcripts IT-15, IT-11, IT-10C3 and ADDA mapped within it]

The cosmids were used as probes to screen cDNA libraries, identifying the transcripts IT-15, IT-11, IT-10C3 and ADDA.

Which (if any) of these transcripts correspond to the HD gene?

Page 24

How was the HD gene identified?

- Focused on genes that are expressed in the nervous system: screened cDNA libraries prepared from normal brain mRNA

- Compared sequences from normal and HD individuals, looking for gene alterations specific to the diseased population

Some potential complications:
- non-disease-causing (rare) polymorphisms that distinguish the diseased and normal populations
- incomplete penetrance
- variable expressivity

Why wouldn’t all individuals of a genotype show the same phenotype?
- influence of other genes (many traits are multigenic)
- influence of environment
- observation errors!

Page 25

How was the HD gene identified?

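[Figure: transcripts IT-15, IT-11, IT-10C3 and ADDA in the HD (?) interval; IT-15 contains a CAG repeat, shown with alleles of 21 and 18 repeats (CAG21, CAG18)]

Gene: 67 exons; >200 kb. mRNA: 10,366 bases. Protein: 3,144 aa; ~350 kDa.

A simple PCR test to measure CAG repeat length in IT-15: the PCR primers anneal to unique sequences in IT-15 flanking the (CAG)n / (GTC)n repeat, so the size of the PCR product reveals n.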

Page 26

How was the HD gene identified?

[Figure: triplet repeat number on normal vs. HD chromosomes]

- 11-34 CAG repeats in 173 normals (98% between 11 and 24)
- 42-86 CAG repeats in >150 HD individuals

Further evidence that the CAG repeat expansion mutation is the cause of HD:
- Two HD patients with a new mutation (not seen in their parents) also had a repeat expansion.
- The length of the repeat correlated with age of onset and severity.

[Figure: Correlation of HD age of onset and CAG repeat length; onset age (years, 0-100) vs. CAG repeat length (30-100)]

Page 27

IT-15 is the HD gene (AKA Huntingtin)

[Figure: IT-15, shown with IT-11, IT-10C3 and ADDA, in the HD (?) interval]

- CAG11-34: non-disease allele
- CAG42-??: disease allele

Gene: 67 exons; >200 kb. mRNA: 10,366 bases. Protein: 3,144 aa; ~350 kDa.
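The PCR test and the allele ranges can be combined in a short Python sketch: the product length minus the flanking sequence between the primers, divided by 3, gives the repeat number, which is then compared with the ranges on these slides. The flank length (100 bp) and the product sizes are hypothetical and depend on primer placement; repeat numbers from 35 to 41 fall outside the two ranges quoted here and are deliberately left unclassified. Function names are illustrative.

```python
def cag_repeats(product_bp: int, flank_bp: int) -> int:
    """Repeat number from PCR product size; flank_bp = product size minus the repeat."""
    repeat_bp = product_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % 3 != 0:
        raise ValueError("product size is not flank + a whole number of CAG triplets")
    return repeat_bp // 3


def classify_allele(n_cag: int) -> str:
    """Classify using only the ranges given on the slides (11-34 vs. 42 and above)."""
    if 11 <= n_cag <= 34:
        return "non-disease allele"
    if n_cag >= 42:
        return "disease allele"
    return "not covered by these ranges"


FLANK_BP = 100  # hypothetical: depends on where the PCR primers sit
for product in (154, 163, 226):
    n = cag_repeats(product, FLANK_BP)
    print(product, "bp ->", n, "CAG ->", classify_allele(n))
```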

Page 28

Why did it take so long to clone the HD gene?

- 1979: work begins to clone HD
- 1983: first marker linked to HD (a lucky break)
- 1993: HD gene cloned

- There were very few markers for linkage studies in humans
- There were several inconsistencies in the linkage data
- The biology of HD was of limited help in selecting candidate genes (~60% of mRNAs are transcribed in the brain)
- It is not easy to identify disease-causing mutations!

"We applaud their discovery," adds another contender, Michael Hayden of the University of British Columbia, who found himself in the painful position of having proposed a different candidate HD gene in Nature the day before the consortium published their proof-positive results in Cell.

- Virginia Morell (1993) Science 260, 28-30.

Page 29

Repeat instability explains genetic anticipation in HD

CAG repeats tend to expand upon paternal transmission:

[Figure: HD pedigree with triplet repeat numbers (axis 5-65, with values of 90 and 120 also marked); phenotype labels: “Too young to show trait”, “Onset in early 40’s”, “Onset at 2 yrs”]

Expanded CAG repeats are unstable in the paternal germline

Page 30

Why are long CAG repeats unstable?

A molecular model:

[Figure: DNA polymerase copying a CAG repeat (nascent 5’-CAGCAG…-3’ strand paired with a 3’-GTCGTC…-5’ template). If one CAG unit of the newly synthesized strand loops out and the strand re-pairs out of register, the CAG repeat length increases by 1 CAG; OR, less frequently, a looped-out unit in the template strand decreases the CAG repeat length by 1 CAG]

Page 31

Why are only long CAG repeats unstable?

Short repeats often also contain some CAA codons:

5’-CAGCAGCAACAGCAGCAGCAACAGCAA-3’
3’-GTCGTCGTTGTCGTCGTCGTTGTCGTC-5’

5’-CAGCAGCAGCAGCAGCAGCAACAGCAA-3’
3’-GTCGTCGTCGTCGTCGTCGTTGTCGTC-5’

5’-CAGCAGCAGCAGCAGCAGCAGCAGCAA-3’
3’-GTCGTCGTCGTCGTCGTCGTCGTCGTC-5’

Prone to expansion?
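One way to see what separates the three sequences: compute the longest uninterrupted run of CAG codons, reading the top strand in frame from its first base. A Python sketch (the function name is illustrative) applied to the three top strands above gives runs of 3, 6 and 8. That longer pure runs are more prone to slippage is the hypothesis the slide is asking about; the code only measures run length, it does not test the hypothesis.

```python
def longest_pure_run(seq: str, codon: str = "CAG") -> int:
    """Longest run of consecutive in-frame copies of codon, reading from position 0."""
    best = current = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] == codon:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


top_strands = [
    "CAGCAGCAACAGCAGCAGCAACAGCAA",
    "CAGCAGCAGCAGCAGCAGCAACAGCAA",
    "CAGCAGCAGCAGCAGCAGCAGCAGCAA",
]
print([longest_pure_run(s) for s in top_strands])  # [3, 6, 8]
```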

Page 32

How do mutations in Huntingtin cause disease?

HD is a dominant disorder: Given what you know about dominant mutations, provide possible genetic explanations for the HD phenotype.

- Haploinsufficiency: half the amount of HD gene product is insufficient (like W)?

- Dominant negative: a poison subunit (like rab27b)?

- Expressed in the wrong place (like Antennapedia) or at the wrong time (like lactase)?

- Protein with a new activity (like the ABO blood antigens)?


Page 34

Wolf-Hirschhorn Syndrome (4p-) (The Human “Knockout” of the Huntington Locus)

• Microdeletion (contiguous gene deletion) syndrome

• Growth retardation, with abnormal facies

• Cardiac, renal, and genital abnormalities

• Significantly, the basal ganglia are intact and there is no movement disorder. This rules out haploinsufficiency as the cause of Huntington’s disease.


Page 37

Does CAG expansion act in a dominant-negative fashion?

If the repeat expansion in HD acts in a dominant-negative fashion, a homozygous LoF mutation should be equivalent. But no homozygous LoF alleles of the HD gene have been seen in humans!

Perhaps we can create mutations in the mouse HD gene! But how do we find the mouse HD gene?

- More mismatches are tolerated under appropriate (lower-stringency) hybridization conditions, i.e., higher salt and lower temperature. This allows non-identical but closely related sequences to hybridize.


Page 38

Colony hybridization with a human HD probe ultimately led to the identification of the mouse HD gene

Human HD protein: 3,144 aa. Mouse HD protein: 3,120 aa.

The two proteins match at >90% of their amino acids!

If the sequences are conserved, the biological function is also likely to be conserved. If the biological function is conserved, we can test whether a mouse bearing a homozygous HD LoF mutation resembles the human disease.

Before continuing, let’s diverge and consider how this is done today, in some detail… (BLAST)

- But we will focus on using BLAST to find similar proteins (unlike what you did in QS)

Page 39

Finding the mouse HD gene computationally

We need three things:

1) A sequence database. We have the genome sequences and gene structures already, and the database doubles in size about every 2 years!

2) Some way of saying how similar two sequences are.

3) A really fast way of carrying out the similarity test.

We’ll diverge from HD for a bit and talk about point 2 now.

Point 3 is more appropriate for a computer course. The method is called BLAST (Basic Local Alignment Search Tool). You should be at least somewhat familiar with this from QS9.

Page 40

Thinking about protein similarity

Suppose we have the following aligned protein sequences:

PWAVTASCH (human)
VYAVQASPH (something else)
amino acid identities: A, V, A, S, H (5 of 9 positions)

PWAVTASCH (human)
PWGVHATCW (something else)
amino acid identities: P, W, V, A, C (5 of 9 positions)

We can see that both of the “something else” sequences appear to be related to the human.

But related to what extent? We need to be quantitative.
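Counting identities alone does not separate the two cases: each alignment above has 5 identical residues out of 9. A minimal Python sketch (the function name is illustrative) makes this explicit, which is why we next need a way to score similar but non-identical amino acid pairs.

```python
def identities(a: str, b: str) -> int:
    """Number of identical residues in an ungapped alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))


human = "PWAVTASCH"
for other in ("VYAVQASPH", "PWGVHATCW"):
    n = identities(human, other)
    print(other, f"{n}/{len(human)} identical")  # 5/9 for both
```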

Page 41

Amino acid structures

[Figure: amino acid structures, grouped as hydrophobic, polar, and charged]

Page 42

Amino acid frequency

amino acid     one-letter   frequency   percent
alanine        A            0.0768      7.68
cysteine       C            0.0162      1.62
aspartate      D            0.0526      5.26
glutamate      E            0.0648      6.48
phenylalanine  F            0.0409      4.09
glycine        G            0.0689      6.89
histidine      H            0.0225      2.25
isoleucine     I            0.0586      5.86
lysine         K            0.0596      5.96
leucine        L            0.0958      9.58
methionine     M            0.0236      2.36
asparagine     N            0.0435      4.35
proline        P            0.0490      4.90
glutamine      Q            0.0394      3.94
arginine       R            0.0521      5.21
serine         S            0.0700      7.00
threonine      T            0.0558      5.58
valine         V            0.0663      6.63
tryptophan     W            0.0121      1.21
tyrosine       Y            0.0315      3.15
total                       1.0000      100.00

Amino acid frequencies in the entire universe of known protein sequences (most common: leucine, L; rarest: tryptophan, W).

Page 43

log odds calculation

$$\text{score} = \log \frac{\text{likelihood of seeing amino acid pair in related protein}}{\text{likelihood of seeing amino acid pair at random}}$$

Likelihood of seeing the amino acid pair in related proteins (the numerator):
• Related proteins taken from the BLOCKS database (validated related proteins).
• Simply count up how often a particular amino acid pair is seen.

Likelihood of seeing the amino acid pair at random (the denominator):

$$f_{A\text{-}B\ \text{pair}} = 2 f_A f_B$$

(the factor of two is because it can be an A-B pair or a B-A pair; for an identical pair such as D-D the random frequency is simply $f_D \times f_D$)
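Both likelihoods can be turned into a score in a few lines. This Python sketch (function names and the observed frequencies are hypothetical) divides the observed frequency of a pair in related proteins by the frequency expected at random (2 f_A f_B for two different residues, f_A × f_A for an identical pair), takes log base 2, and converts to half-bits by doubling and rounding.

```python
import math


def log_odds_bits(observed: float, f_a: float, f_b: float, same_residue: bool) -> float:
    """log2(observed pair frequency / pair frequency expected at random)."""
    expected = f_a * f_b if same_residue else 2 * f_a * f_b
    return math.log2(observed / expected)


def to_half_bits(bits: float) -> int:
    """Scale by 2 and round to the nearest integer, as for the score matrix."""
    return round(2 * bits)


# Hypothetical observed frequencies with background f = 0.05 for both residues.
print(to_half_bits(log_odds_bits(0.01, 0.05, 0.05, same_residue=True)))    # 4: seen 4x more than chance
print(to_half_bits(log_odds_bits(0.005, 0.05, 0.05, same_residue=False)))  # 0: seen as often as chance
```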

Page 44

One block from the BLOCKS database:

CKS2_XENLA|Q91879 NIYYSDKYTDEHFEY
CKS1_HUMAN|P33551 QIYYSDKYDDEEFEY
CKS2_HUMAN|P33552 QIYYSDKYFDEHYEY
CKS2_MOUSE|P56390 QIYYSDKYFDEHYEY
CKS1_PATVU|P41384 QIYYSDKYFDEDFEY
CKS1_DROME|Q24152 DIYYSDKYYDEQFEY
CKS1_PHYPO|P55933 TIQYSEKYYDDKFEY
CKS1_LEIME|Q25330 KILYSDKYYDDMFEY
O23249 QIQYSEKYFDDTFEY
O60191 NIHYSTRYSDDTHEY
CKS1_SCHPO|P08463 QIHYSPRYADDEYEY
CKS1_YEAST|P20486 SIHYSPRYSDDNYEY
CKS1_CAEEL|Q17868 DFYYSNKYEDDEFEY

One of 29,068 blocks; pair frequencies are compiled from all blocks combined.

Amino acid pair frequencies in related proteins (6th column of this block: 7 D, 2 E, 2 P, 1 T, 1 N):

D-D 21 pairs, D-E 14 pairs, D-P 14 pairs, D-T 7 pairs, D-N 7 pairs, E-E 1 pair, E-P 4 pairs, E-T 2 pairs, E-N 2 pairs, P-P 1 pair, P-T 2 pairs, P-N 2 pairs, T-N 1 pair (78 pairs in total)

LOD calculation (e.g., D-D pair):

$$\text{score}_{D\text{-}D} = \log \frac{21/78}{0.05 \times 0.05}$$

where 21 is the # of D-D pairs, 78 is the total # of pairs, and 0.05 × 0.05 is the frequency of a D-D pair at random ($f_D \approx 0.05$ from the amino acid frequency table).
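Counting pairs by hand is error-prone, so here is a Python sketch (the function name is illustrative) that counts every unordered pair of residues in one aligned column of this CKS block. Applied to the 6th column it gives 21 D-D pairs and a total of 78 pairs, as expected for 13 sequences (13 × 12 / 2 = 78).

```python
from collections import Counter
from itertools import combinations

# The 13 sequences of the CKS block (names omitted).
BLOCK = [
    "NIYYSDKYTDEHFEY", "QIYYSDKYDDEEFEY", "QIYYSDKYFDEHYEY", "QIYYSDKYFDEHYEY",
    "QIYYSDKYFDEDFEY", "DIYYSDKYYDEQFEY", "TIQYSEKYYDDKFEY", "KILYSDKYYDDMFEY",
    "QIQYSEKYFDDTFEY", "NIHYSTRYSDDTHEY", "QIHYSPRYADDEYEY", "SIHYSPRYSDDNYEY",
    "DFYYSNKYEDDEFEY",
]


def column_pair_counts(block, col):
    """Count every unordered residue pair (e.g. 'D-E') in one aligned column."""
    column = [seq[col] for seq in block]
    return Counter("-".join(sorted(pair)) for pair in combinations(column, 2))


counts = column_pair_counts(BLOCK, 5)  # 6th column: D, E, P, T, N
print(counts["D-D"], sum(counts.values()))  # 21 78
```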


Page 45

log odds scores (side note)

• Traditionally, we use log base 2 (pedigree LOD scores are base 10).

• To make computing fast, scores are usually multiplied by 2 and then rounded to nearest integer (this is a detail).

• Called “half-bit” scores (jargon for taking twice log base 2).

Page 46

log odds scores (cont.)

$$\text{score} = \log \frac{\text{likelihood of seeing amino acid pair in related protein}}{\text{likelihood of seeing amino acid pair at random}}$$

If an amino acid pair is seen MORE often than expected at random: odds > 1, score positive.

If an amino acid pair is seen LESS often than expected at random: odds < 1, score negative.

Remember: $\log_2 1 = 0$, $\log_2 2 = 1$, $\log_2 \tfrac{1}{2} = -1$

Page 47

Values from a score matrix (half-bit scores)

[Figure: score matrix with rows and columns labeled by one-letter amino acid code; highlighted are the score for alanine (A) - tryptophan (W) and the self-match scores]

Page 48

Amino acid structures

[Figure: side-chain structures of the 20 amino acids, grouped as
Hydrophobic: alanine A, valine V, glycine G, leucine L, isoleucine I, methionine M, proline P, phenylalanine F, tryptophan W
Polar: threonine T, tyrosine Y, serine S, asparagine N, glutamine Q, cysteine C
Charged: lysine K, arginine R, histidine H, aspartate D, glutamate E]

Page 49

Example - similar amino acids get positive scores

I-V, L-V, I-L

Qualitatively, what scores do you expect pairs of these to have?

Page 50

Example - dissimilar amino acids get negative scores

[Figure: structures of the charged amino acids (lysine K, arginine R, histidine H, aspartate D, glutamate E), compared with the hydrophobic amino acids]

Hydrophobic vs. charged: qualitatively, what scores do you expect pairs among these groups to have?

Page 51

Thinking about protein similarity

Suppose we have the following aligned protein sequences:

PWAVTASCH (human)
VYAVQASPH (something else)

PWAVTASCH (human)
PWGVHATCW (something else)

Related to what extent? We want to be quantitative.

Top case: -2 + 2 + 4 + 4 + -1 + 4 + 4 + -3 + 8 = 20

Bottom case: 7 + 11 + 0 + 4 + -2 + 4 + 1 + 9 + -2 = 32

(Side note: the score also indicates the odds of seeing a match of this quality by chance for the entire sequence. For example, the bottom match scores 32 half-bits = 16 bits, so the odds are $2^{-16} = 1/65536$. Remember they are half-bit scores.)
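The two sums can be checked with a small Python sketch that looks up each aligned pair in a dictionary holding only the half-bit score matrix entries these alignments need (the dictionary and function names are illustrative; the values are those of the score matrix used in this lecture). Dividing by 2 converts half-bits to bits.

```python
HALF_BITS = {  # only the entries needed here, keyed by alphabetically sorted pair
    ("A", "A"): 4, ("A", "G"): 0, ("C", "C"): 9, ("C", "P"): -3,
    ("H", "H"): 8, ("H", "T"): -2, ("H", "W"): -2, ("P", "P"): 7,
    ("P", "V"): -2, ("Q", "T"): -1, ("S", "S"): 4, ("S", "T"): 1,
    ("V", "V"): 4, ("W", "W"): 11, ("W", "Y"): 2,
}


def alignment_score(a: str, b: str) -> int:
    """Sum of half-bit scores over an ungapped alignment of equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(HALF_BITS[tuple(sorted((x, y)))] for x, y in zip(a, b))


human = "PWAVTASCH"
for other in ("VYAVQASPH", "PWGVHATCW"):
    s = alignment_score(human, other)
    print(other, s, "half-bits =", s / 2, "bits")
```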

Page 52

Getting back to HD…finding the mouse HD gene

A portion of the human HD protein sequence (the “query” sequence):

MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQPLLPQPQPPPPPPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPEFQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAPRSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQ…

BLAST search against a database of all proteins from human, chimp, dog, mouse, etc.

[Figure: BLAST output, a summary list of all related proteins (one per line), with hits from human, chimp and mouse(!); the E value is the # of expected (E) matches with a score this good, from a database of this size, from chance alone]

Page 53

Getting back to HD…finding the mouse HD gene

Looking further down on the summary list…

[Figure: BLAST summary list with Bit score and E value columns; hits further down include zebra fish, sea anemone and fruit fly]

We can keep going, but the validity attenuates as you approach E = 1.
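The E value is tied to the bit score by the standard relation E ≈ m × n × 2^(-S′), where m is the query length, n the total database length in residues, and S′ the (normalized) bit score, so each extra bit halves E. A Python sketch; the database size is a hypothetical round number and the function name is illustrative. Real BLAST uses effective lengths that correct for edge effects, so its reported E values will differ somewhat.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """E = m * n * 2**(-S'): m = query length, n = total database length (residues)."""
    return query_len * db_len * 2.0 ** (-bit_score)


QUERY_LEN = 3144          # human HD protein length (aa)
DB_LEN = 1_000_000_000    # hypothetical database size in residues
for bits in (30, 40, 50, 100):
    print(bits, "bits -> E =", f"{expected_hits(bits, QUERY_LEN, DB_LEN):.2g}")
```

Near E = 1, one match of that quality is expected by chance alone, which is why hits in that range are no longer trustworthy.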

Page 54

Getting back to HD…finding the mouse HD gene

Portion of the Mus musculus HD alignment:

[Figure: BLAST alignment of my query against this M. musculus match, with annotations marking an identical amino acid, a similar amino acid, a dissimilar amino acid, and a gap]

Page 55

Half-bit score matrix (the values correspond to BLOSUM62):

   A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
A  4  0 -2 -1 -2  0 -2 -1 -1 -1 -1 -2 -1 -1 -1  1  0  0 -3 -2
C  0  9 -3 -4 -2 -3 -3 -1 -3 -1 -1 -3 -3 -3 -3 -1 -1 -1 -2 -2
D -2 -3  6  2 -3 -1 -1 -3 -1 -4 -3  1 -1  0 -2  0 -1 -3 -4 -3
E -1 -4  2  5 -3 -2  0 -3  1 -3 -2  0 -1  2  0  0 -1 -2 -3 -2
F -2 -2 -3 -3  6 -3 -1  0 -3  0  0 -3 -4 -3 -3 -2 -2 -1  1  3
G  0 -3 -1 -2 -3  6 -2 -4 -2 -4 -3  0 -2 -2 -2  0 -2 -3 -2 -3
H -2 -3 -1  0 -1 -2  8 -3 -1 -3 -2  1 -2  0  0 -1 -2 -3 -2  2
I -1 -1 -3 -3  0 -4 -3  4 -3  2  1 -3 -3 -3 -3 -2 -1  3 -3 -1
K -1 -3 -1  1 -3 -2 -1 -3  5 -2 -1  0 -1  1  2  0 -1 -2 -3 -2
L -1 -1 -4 -3  0 -4 -3  2 -2  4  2 -3 -3 -2 -2 -2 -1  1 -2 -1
M -1 -1 -3 -2  0 -3 -2  1 -1  2  5 -2 -2  0 -1 -1 -1  1 -1 -1
N -2 -3  1  0 -3  0  1 -3  0 -3 -2  6 -2  0  0  1  0 -3 -4 -2
P -1 -3 -1 -1 -4 -2 -2 -3 -1 -3 -2 -2  7 -1 -2 -1 -1 -2 -4 -3
Q -1 -3  0  2 -3 -2  0 -3  1 -2  0  0 -1  5  1  0 -1 -2 -2 -1
R -1 -3 -2  0 -3 -2  0 -3  2 -2 -1  0 -2  1  5 -1 -1 -3 -3 -2
S  1 -1  0  0 -2  0 -1 -2  0 -2 -1  1 -1  0 -1  4  1 -2 -3 -2
T  0 -1 -1 -1 -2 -2 -2 -1 -1 -1 -1  0 -1 -1 -1  1  5  0 -2 -2
V  0 -1 -3 -2 -1 -3 -3  3 -2  1  1 -3 -2 -2 -3 -2  0  4 -3 -1
W -3 -2 -4 -3  1 -2 -2 -3 -3 -2 -1 -4 -4 -2 -3 -3 -2 -3 11  2
Y -2 -2 -3 -2  3 -3  2 -1 -2 -1 -1 -2 -3 -1 -2 -2 -2 -1  2  7

Score (bits) is the sum of the scores for each aligned residue pair, multiplied by 0.5 because the score table is in half-bits:

Query LTAVGGIGQLT
      LT  GG+GQLT
Sbjct LTTPGGLGQLT

4 + 5 + 0 + (-2) + 6 + 6 + 2 + 6 + 5 + 4 + 5 = 41 half-bits = 20.5 bits

Notice that the V/P pair (score -2) is poorly conserved.
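The sum can be checked programmatically. A minimal sketch: the matrix is BLOSUM62 in half-bits, in the alphabetical residue order of the table above; the alignment is the ungapped example, and the flat 0.5 scaling follows the slide. Real BLAST converts raw scores to bit scores with the statistical parameters λ and K, so its reported bit scores will differ somewhat from raw/2.

```python
from __future__ import annotations

ORDER = "ACDEFGHIKLMNPQRSTVWY"

# BLOSUM62, rows and columns in ORDER; entries are in half-bits.
BLOSUM62_ROWS = """\
 4  0 -2 -1 -2  0 -2 -1 -1 -1 -1 -2 -1 -1 -1  1  0  0 -3 -2
 0  9 -3 -4 -2 -3 -3 -1 -3 -1 -1 -3 -3 -3 -3 -1 -1 -1 -2 -2
-2 -3  6  2 -3 -1 -1 -3 -1 -4 -3  1 -1  0 -2  0 -1 -3 -4 -3
-1 -4  2  5 -3 -2  0 -3  1 -3 -2  0 -1  2  0  0 -1 -2 -3 -2
-2 -2 -3 -3  6 -3 -1  0 -3  0  0 -3 -4 -3 -3 -2 -2 -1  1  3
 0 -3 -1 -2 -3  6 -2 -4 -2 -4 -3  0 -2 -2 -2  0 -2 -3 -2 -3
-2 -3 -1  0 -1 -2  8 -3 -1 -3 -2  1 -2  0  0 -1 -2 -3 -2  2
-1 -1 -3 -3  0 -4 -3  4 -3  2  1 -3 -3 -3 -3 -2 -1  3 -3 -1
-1 -3 -1  1 -3 -2 -1 -3  5 -2 -1  0 -1  1  2  0 -1 -2 -3 -2
-1 -1 -4 -3  0 -4 -3  2 -2  4  2 -3 -3 -2 -2 -2 -1  1 -2 -1
-1 -1 -3 -2  0 -3 -2  1 -1  2  5 -2 -2  0 -1 -1 -1  1 -1 -1
-2 -3  1  0 -3  0  1 -3  0 -3 -2  6 -2  0  0  1  0 -3 -4 -2
-1 -3 -1 -1 -4 -2 -2 -3 -1 -3 -2 -2  7 -1 -2 -1 -1 -2 -4 -3
-1 -3  0  2 -3 -2  0 -3  1 -2  0  0 -1  5  1  0 -1 -2 -2 -1
-1 -3 -2  0 -3 -2  0 -3  2 -2 -1  0 -2  1  5 -1 -1 -3 -3 -2
 1 -1  0  0 -2  0 -1 -2  0 -2 -1  1 -1  0 -1  4  1 -2 -3 -2
 0 -1 -1 -1 -2 -2 -2 -1 -1 -1 -1  0 -1 -1 -1  1  5  0 -2 -2
 0 -1 -3 -2 -1 -3 -3  3 -2  1  1 -3 -2 -2 -3 -2  0  4 -3 -1
-3 -2 -4 -3  1 -2 -2 -3 -3 -2 -1 -4 -4 -2 -3 -3 -2 -3 11  2
-2 -2 -3 -2  3 -3  2 -1 -2 -1 -1 -2 -3 -1 -2 -2 -2 -1  2  7
"""


def load_matrix(rows: str = BLOSUM62_ROWS) -> dict[tuple[str, str], int]:
    """Parse the whitespace-separated rows into a (res1, res2) -> score map."""
    matrix = {}
    for a, line in zip(ORDER, rows.strip().splitlines()):
        for b, value in zip(ORDER, line.split()):
            matrix[(a, b)] = int(value)
    return matrix


def raw_score(query: str, subject: str, matrix: dict[tuple[str, str], int]) -> int:
    """Sum substitution scores over an ungapped alignment (in half-bits)."""
    if len(query) != len(subject):
        raise ValueError("ungapped alignment: sequences must be equal length")
    return sum(matrix[(q, s)] for q, s in zip(query, subject))


if __name__ == "__main__":
    m = load_matrix()
    q, s = "LTAVGGIGQLT", "LTTPGGLGQLT"
    half_bits = raw_score(q, s, m)
    print([m[(a, b)] for a, b in zip(q, s)], half_bits, half_bits * 0.5)
```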

Page 56

Does CAG expansion act in a dominant-negative fashion?

If the repeat expansion in HD acts in a dominant-negative fashion, a homozygous LoF mutation should produce an equivalent phenotype. But no homozygous LoF alleles of the HD gene have been seen in humans!

Perhaps we can create mutations in the mouse HD gene! But how do we find the mouse HD gene?

This can be done using an experimental approach (e.g., screening a library) or a computational approach (e.g., conducting a BLAST search).

Once the mouse HD gene is identified, we must create a recombinant plasmid containing the mouse HD gene and appropriate markers for generating a mouse HD mutation (AKA a mouse HD “knockout”).

Page 57

Studies of HD in animal models: a mouse HD KO

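Engineering an HD knockout mouse:

Start with a plasmid (ampR, ori) carrying a mouse genomic DNA clone bearing HD exons 3-6, with restriction endonuclease sites H and X.

1. Partial digest with H & X (cut, cut) removes exons 4 and 5; insert neoR in their place (exon 3, neoR, exon 6).
2. Partial digest with H and insert gns beyond exon 6 (exon 3, neoR, exon 6, gns), giving the targeting vector.

(gns: a gene conferring ganciclovir sensitivity)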

Page 58

Studies of HD in animal models: a mouse HD KO

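Targeting vector: exon 3, neoR, exon 6, gns

Embryonic stem (ES) cells from an albino (c/c) strain of mice receive the vector and are then grown in neomycin + ganciclovir. The ES genome carries the normal HD exons 3, 4, 5, 6.

- Homologous recombination replaces exons 4-5 in the ES genome with neoR and leaves gns behind: exon 3, neoR, exon 6
- Random insertion integrates the whole vector elsewhere, including gns: exon 3, neoR, exon 6, gns
- Homologous recombinants arise at roughly 1:1,000 relative to random insertions

ES cells that carry gns die on ganciclovir, and cells that took up no vector die on neomycin, so the survivors are enriched for homologous recombinants.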
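The positive-negative selection logic can be sketched as a toy simulation. It assumes, as the slide suggests, roughly one homologous recombinant per 1,000 random insertions, and it treats gns simply as a marker that kills a cell in ganciclovir; the marker labels and function names are illustrative, not from any real library.

```python
from __future__ import annotations

import random
from collections import Counter

# Markers carried by each kind of ES cell after exposure to the vector (toy model).
MARKERS = {
    "homologous": ("neoR",),      # exons 4-5 replaced by neoR; gns lost
    "random": ("neoR", "gns"),    # whole vector inserted elsewhere
    "none": (),                   # vector never integrated
}


def survives(markers: tuple[str, ...]) -> bool:
    """Neomycin kills cells lacking neoR; ganciclovir kills cells carrying gns."""
    return "neoR" in markers and "gns" not in markers


def select(n_integrants: int, hr_per_random: float = 1 / 1000, seed: int = 0) -> Counter:
    """Simulate integration events, then count survivors of neomycin + ganciclovir."""
    rng = random.Random(seed)
    p_homologous = hr_per_random / (1 + hr_per_random)
    survivors = Counter()
    for _ in range(n_integrants):
        kind = "homologous" if rng.random() < p_homologous else "random"
        if survives(MARKERS[kind]):
            survivors[kind] += 1
    return survivors


if __name__ == "__main__":
    for kind, markers in MARKERS.items():
        print(f"{kind:>10}: survives selection = {survives(markers)}")
    print(select(100_000))
```

Even though random insertions outnumber homologous events about 1,000-fold, only the homologous recombinants appear among the survivors in this model.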

Page 59

Studies of HD in animal models: a mouse HD KO

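ES cell bearing heterozygous HD KO:

- Wild-type allele: exons 3, 4, 5, 6; mRNA contains exons 1 2 3 4…
- KO allele: exon 3, neoR, exon 6; mRNA contains exons 1 2 3 6…. Altered splicing results in a frameshift and premature translation termination.

These ES cells are introduced into a blastocyst-stage embryo from a C/C female.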
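Whether splicing exon 3 directly to exon 6 shifts the reading frame depends only on how many nucleotides are removed. A minimal sketch: the exon lengths and the exon 6 sequence below are hypothetical stand-ins, not the real mouse HD values, and the helper names are illustrative.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def frame_after_deletion(frame: int, removed_nt: int) -> int:
    """Reading frame of a downstream exon after removed_nt nucleotides are spliced out."""
    return (frame + removed_nt) % 3


def causes_frameshift(exon_lengths: dict[int, int], skipped: list[int]) -> bool:
    """Skipping exons keeps the frame only if the removed length is a multiple of 3."""
    return sum(exon_lengths[e] for e in skipped) % 3 != 0


def first_stop(seq: str, frame: int) -> int | None:
    """Index of the first stop codon when seq is read in the given frame, else None."""
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


if __name__ == "__main__":
    lengths = {4: 88, 5: 130}     # hypothetical exon 4 and 5 lengths (nt)
    exon6 = "GCTAAGCTGGCA"        # hypothetical start of exon 6
    removed = sum(lengths[e] for e in (4, 5))
    new_frame = frame_after_deletion(0, removed)
    print("frameshift:", causes_frameshift(lengths, [4, 5]))
    print("stop in normal frame:", first_stop(exon6, 0))
    print("stop after skipping 4-5:", first_stop(exon6, new_frame))
```

In this toy case 218 nt are removed, so exon 6 is read two bases out of register and a stop codon appears immediately.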

Page 60

Studies of HD in animal models: a mouse HD KO

Mosaic embryo: cells derived from the ES cells (c/c; HD-/HD+) plus cells from the host blastocyst (C/C; HD+/HD+)

Place mosaic embryos into a surrogate mother.

Which of these mosaic offspring are most likely to have the targeted mutation in their germline?

Page 61

Creating the homozygous KO

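Mosaic mouse (c/c; HD-/HD+ cells and C/C; HD+/HD+ cells) X Albino mouse (c/c; HD+/HD+)

Offspring from ES-derived germ cells (albino): c/c; HD-/HD+ OR c/c; HD+/HD+

Offspring from host-derived germ cells (pigmented): C/c; HD+/HD+

Genotype the albino offspring (Southern blot of blood sample), then cross:

c/c; HD-/HD+ X c/c; HD-/HD+

c/c; HD-/HD-: the homozygous KO!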
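The expected genotype ratios for these crosses follow from Mendelian segregation at the HD locus. A minimal sketch (the allele labels HD- and HD+ follow the slide; the function names are illustrative) that enumerates gamete combinations with itertools.product: the heterozygote × wild-type cross gives 1/2 heterozygotes, and the heterozygote × heterozygote cross gives 1/4 homozygous KO at conception.

```python
from __future__ import annotations

from collections import Counter
from fractions import Fraction
from itertools import product


def genotype(a: str, b: str) -> str:
    """Write a genotype with HD- before HD+, as on the slide."""
    return "/".join(sorted((a, b), reverse=True))


def cross(parent1: tuple[str, str], parent2: tuple[str, str]) -> dict[str, Fraction]:
    """Offspring genotype frequencies at one autosomal locus (Mendelian segregation)."""
    counts = Counter(genotype(a, b) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


if __name__ == "__main__":
    het, wt = ("HD-", "HD+"), ("HD+", "HD+")
    print("het x wt :", cross(het, wt))
    print("het x het:", cross(het, het))
```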

Page 62

The phenotypes of the HD KO mice…

c/c; HD-/HD+: phenotypically normal, no brain pathology

c/c; HD-/HD-: early embryonic lethal, with embryonic developmental abnormalities

The homozygous HD KO displays different symptoms than the human disease, so HD symptoms do not result from a loss of function (LoF) of the HD gene.

Page 63

How do mutations in Huntingtin cause disease?

HD is a dominant disorder: Given what you know about dominant mutations, provide possible genetic explanations for the HD phenotype.

- Haploinsufficiency: is half the amount of HD gene product insufficient (like W)?

- Dominant negative: a poison subunit (like rab27b)?

- Expression in the wrong place (like Antennapedia) or at the wrong time (like lactase)?

- A protein with a new activity (like the ABO blood antigens)? What could it be?

Page 64

The HD CAG repeats encode polyglutamine (polyQ) tracts

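Gene: promoter, then exons 1, 2, 3, 4, 5, 6, etc.; the coding sequence contains a (CAG)n repeat

mRNA: AUG…(CAG)n…AAAAAA

Protein: M…(Q)n… (each CAG codon encodes glutamine, Q)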

Are proteins bearing long polyglutamine tracts toxic?
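Because each CAG codon encodes glutamine, a (CAG)n tract in the mRNA becomes a (Q)n tract in the protein. A minimal sketch: the mRNA is hypothetical (a short 5′ leader, AUG, 40 CAG codons, two proline codons, a stop codon and a poly(A) tail), translation simply starts at the first AUG, and the genetic code is built from the standard 64-letter table in UCAG order.

```python
from itertools import product

# Standard genetic code, built from the compact 64-letter string in UCAG order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon (*); empty if no AUG."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    n = 40  # hypothetical repeat length
    mrna = "GG" + "AUG" + "CAG" * n + "CCGCCG" + "UAA" + "AAAAAA"
    protein = translate(mrna)
    print(protein)
    print("glutamines:", protein.count("Q"))
```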

Page 65

Are long polyglutamine (polyQ) tracts toxic?

Evidence in favor:

- Spinal and bulbar muscular atrophy is caused by polyQ expansion of the androgen receptor.

- Proteins with long polyQ repeats (…QQQQQQQQQQQQQQQQQQQQQQQ…) fold abnormally, so the protein product adopts a misfolded conformation.

When the glutamine tract exceeds a length threshold (~35 residues), the polyglutamine tract adopts an abnormal conformation.
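One simple way to ask whether a protein crosses the ~35-glutamine threshold is to measure its longest uninterrupted Q run. This sketch uses the N-terminal fragment of the human HD query sequence shown on the BLAST slides plus a hypothetical expanded version; treating 35 as a hard cutoff is a simplification of the approximate threshold stated above.

```python
import re

POLYQ_THRESHOLD = 35  # approximate threshold from the slide (~35 glutamines)

# N-terminal fragment of the human HD query sequence from the BLAST slides.
HD_QUERY_PREFIX = "MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQPLLPQPQPPPPPPPPPPGPAVAEEP"


def longest_polyq(protein: str) -> int:
    """Length of the longest uninterrupted run of glutamines (Q)."""
    return max((len(run) for run in re.findall(r"Q+", protein)), default=0)


def exceeds_threshold(protein: str, threshold: int = POLYQ_THRESHOLD) -> bool:
    """True if the longest polyQ run is longer than the threshold."""
    return longest_polyq(protein) > threshold


if __name__ == "__main__":
    run = longest_polyq(HD_QUERY_PREFIX)
    expanded = HD_QUERY_PREFIX.replace("Q" * run, "Q" * 40, 1)  # hypothetical expansion
    for name, seq in (("query", HD_QUERY_PREFIX), ("expanded", expanded)):
        print(name, longest_polyq(seq), exceeds_threshold(seq))
```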


Page 66

Creating a mouse with a human HD gene: creation of a ‘transgenic mouse’

- Human HD gene: promoter and exons 1, 2, 3, 4, 5, 6, etc., carrying an expanded (CAG)180 repeat
- Introduce the gene fragment into a single-celled mouse embryo
- The gene fragment (promoter plus exon 1) inserts randomly into the mouse genome
- Place the embryo into a surrogate mother

Page 67

Creating a transgenic mouse (cont’d)

Surrogate mother

Transgenic offspring? Can be easily tested using PCR.

Phenotypes of HD transgenic mice:

- tremors, abnormal gait, and learning deficits by 6 months
- polyQ aggregates in the brain
- cell loss in the basal ganglia in late stages

Confirms a protein with new activity (gain-of-function, GoF) mechanism. Suggests that polyglutamine expansion is toxic.
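PCR screening of offspring amounts to asking whether a primer pair can amplify a transgene-specific product from a pup's DNA. The in-silico sketch below is only an illustration: the primers and sequences are hypothetical, and real screening detects the product on a gel rather than by string search.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def pcr_product_length(template: str, fwd: str, rev: str) -> int | None:
    """Amplicon length if fwd matches the top strand upstream of the site where rev
    anneals (the reverse complement of rev on the top strand); otherwise None."""
    start = template.find(fwd)
    if start == -1:
        return None
    site = template.find(reverse_complement(rev), start + len(fwd))
    if site == -1:
        return None
    return site + len(rev) - start


# Hypothetical primers and sequences, for illustration only.
FWD, REV = "ATGCCAGT", "TTGGATCC"
TRANSGENIC = "TTT" + "ATGCCAGT" + "CAGCAGCAGCAG" + "GGATCCAA" + "TTT"
NON_TRANSGENIC = "TTTACGTACGTACGTTTT"

if __name__ == "__main__":
    for name, dna in (("transgenic", TRANSGENIC), ("non-transgenic", NON_TRANSGENIC)):
        print(name, pcr_product_length(dna, FWD, REV))
```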

Page 68

Are polyQ expansions toxic in a novel context?

Insertion of a polyQ tract into the hypoxanthine phosphoribosyltransferase (HPRT) gene: a (CAG)146 repeat is inserted into the HPRT gene (exons 1, 2, 3, 4).

Generate transgenic mouse

Do mice develop HD-like pathology?

Phenotypes of the HPRT transgenic mice closely resemble those of the HD transgenic mice.

This suggests that the polyQ tract itself is primarily responsible for toxicity.

PolyQ expression is also toxic in flies, yeast, cell lines, etc.
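Inserting (CAG)146 adds 438 nucleotides, a multiple of 3, so if the repeat is placed at a codon boundary the rest of the host protein is still translated in its normal frame, now carrying 146 extra glutamines. A minimal sketch: the coding sequence is a hypothetical stand-in, not the real HPRT sequence, and the insertion position is arbitrary.

```python
from itertools import product

# Standard genetic code (DNA alphabet), compact 64-letter string in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate codon by codon from position 0 (stop codons shown as '*')."""
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def insert_repeat(cds: str, codon_index: int, unit: str = "CAG", copies: int = 146) -> str:
    """Insert unit * copies at a codon boundary, so downstream codons keep their frame."""
    pos = codon_index * 3
    return cds[:pos] + unit * copies + cds[pos:]


if __name__ == "__main__":
    toy_cds = "ATGGCTGACAAATAA"  # hypothetical stand-in, not the real HPRT sequence
    modified = insert_repeat(toy_cds, 2)
    print(translate(toy_cds))
    print(len(modified) - len(toy_cds), translate(modified).count("Q"))
```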

Page 69

What have we learned from cloning HD?

- Symptoms result from neurodegeneration. Learned: the mechanism of neuron death involves intrinsic toxicity of large polyQ tracts.

- Age of onset is typically in the 40s but ranges from infancy to old age. Learned: age of onset correlates with CAG repeat length and can now be predicted (not clear if this is good or bad).

- Genetic anticipation (increasing disease severity in subsequent generations) is often observed. Learned: anticipation results from repeat length instability, primarily in the paternal germline.

- No cure or treatment. But there are several promising strategies on the horizon.

Page 70

[Figure panels: 1993 | Present]

Page 71

Repeat Expansion Diseases

Fragile X syndrome of mental retardation

FRAXE mental retardation

X-linked spinal and bulbar muscular atrophy

Myotonic dystrophy 1 and 2

Huntington’s disease 1 and 2

Dentatorubral pallidoluysian atrophy

Friedreich’s ataxia

Oculopharyngeal muscular dystrophy

Myoclonic epilepsy of Unverricht-Lundborg

Spinocerebellar ataxia types 1, 2, 3, 6, 7, 8, 10, 12 & 17

Page 72