
letter

432 nature genetics • volume 32 • november 2002

Detection of regulatory variation in mouse genes

Christopher R. Cowles¹, Joel N. Hirschhorn¹⁻³, David Altshuler¹,²,⁴ & Eric S. Lander¹,⁵

¹Whitehead Institute and MIT Center for Genome Research, Nine Cambridge Center, Cambridge, Massachusetts 02142, USA. ²Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA. ³Divisions of Genetics and Endocrinology, Children’s Hospital, Boston, Massachusetts, USA. ⁴Department of Molecular Biology and Diabetes Unit, Massachusetts General Hospital, Boston, Massachusetts, USA. ⁵Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. Correspondence should be addressed to E.S.L. (e-mail: [email protected]).

Functional polymorphism in genes can be classified as coding variation, altering the amino-acid sequence of the encoded protein, or regulatory variation, affecting the level or pattern of expression of the gene. Coding variation can be recognized directly from DNA sequence, and consequently its frequency and characteristics have been extensively described. By contrast, virtually nothing is known about the extent to which gene regulation varies in populations. Yet it is likely that regulatory variants are important in modulating gene function: alterations in gene regulation have been proposed to influence disease susceptibility and to have been the primary substrate for the evolution of species¹. Here, we report a systematic study to assess the extent of cis-acting regulatory variation in 69 genes across four inbred mouse strains. We find that at least four of these genes show allelic differences in expression level of 1.5-fold or greater, and that some of these differences are tissue specific. The results show that the impact of regulatory variants can be detected at a significant frequency in a genomic survey and suggest that such variation may have important consequences for organismal phenotype and evolution. The results indicate that larger-scale surveys in both mouse and human could identify a substantial number of genes with common regulatory variation.

Naturally occurring polymorphism in gene sequence underlies the inheritance of phenotypic variation. The spectrum of amino-acid polymorphism has been well defined in human populations²⁻⁵. The probability is about 33% that two randomly chosen copies of a human gene encode proteins that differ at one or more amino-acid sites, although it is not known what proportion of these coding variants has functional consequences.

Characterizing the extent of cis-acting regulatory variation presents a much greater challenge, because it is not usually possible to

Published online 15 October 2002; doi:10.1038/ng992

[Figure 1: SBE traces in two panels. a, Dm15 in B6 × A F1 brain, with peaks for B6 (RoxU) and A (JoeA). b, Pla2g2a in CAST × B6 F1 small intestine, with peaks for CAST (TamraC) and B6 (RoxU). Each panel shows hybrid mouse samples (F1 DNA, F1 mRNA, and mRNA with no reverse transcriptase) next to mixed parental DNA samples at 25:75, 50:50 and 75:25.]

Fig. 1 Detection of allele-specific transcript levels in F1 hybrid mice. a, We did SBE genotyping for an A/T SNP in Dm15 using mixed genomic DNAs from A and B6 mice. Peak heights correlated well with the input mixtures of DNA, thereby providing a titration curve for assessing relative transcript levels for the two alleles in DNA and brain mRNA (reverse transcribed into cDNA) from A × B6 F1 hybrid mice. A ‘no reverse transcriptase’ control is shown. b, Analogous experiment using a SNP in Pla2g2a, which is normally expressed in small intestine. The gene carries a coding mutation resulting in undetectable transcript levels in B6 but not in CAST mice. Note that for Pla2g2a assays, a second TamraC peak is observed, reflecting some read-through of primer extension reactions due to presence of residual dNTPs.

© 2002 Nature Publishing Group http://www.nature.com/naturegenetics


Table 1 • Summary of mouse transcripts surveyed

| Gene | Expressed in liver | Expressed in brain | Expressed in spleen | Accession # | Parental alleles (B6 A DBA CAST) | Hybrid mice surveyed |
|---|---|---|---|---|---|---|
| Nid1, nidogen | + | + | + | L17324 | C C C A | √ √√ |
| Tap1, transporter 1, ATP-binding cassette, sub-family B (MDR/TAP) | + | – | + | M55637 | T C T C | √ √ |
| Prkacb, protein kinase, cAMP-dependent, catalytic, β | + | + | +/– | X61434 | T C C C | √ √ |
| Camk4, calcium/calmodulin-dependent protein kinase IV | – | + | – | X58995 | T G G G | √ √ |
| Hmgcr, HMG-CoA reductase | + | + | + | M62766 | G A A A | √ √ |
| Il9r, interleukin 9 receptor | – | – | + | M84746 | G T T T | √ √ |
| Oprl, opioid receptor-like | – | + | – | U09421 | C T T | √ √ |
| Kcnj4, inwardly-rectifying potassium channel | – | + | – | U11075 | A G G G | √ √ |
| Cln3, ceroid lipofuscinosis, neuronal 3 | + | + | + | U47106 | T C T C | √ √ |
| Ugt8, UDP-glucuronosyltransferase 8 | – | + | – | X92122 | C T C T | √ √ |
| Ptger1, prostaglandin E receptor 1 | – | + | +/– | Z49987 | T C C C | √ √ |
| Usf2, upstream transcription factor 2 | + | + | + | U01663 | A A G G | √ √ |
| Gabrb2, GABA-A receptor, subunit β2 | – | + | – | U14419 | T T C C | √ √ |
| Vdac2, voltage-dependent anion channel 2 | + | + | + | U30838 | G G A A | √ √ |
| Il1a, interleukin 1α | + | – | +/– | X01450 | A A T T | √ √ |
| Hsd11b1, hydroxysteroid 11-β dehydrogenase 1 | + | + | +/– | X83202 | T T C C | √ √ |
| Nr2f6, nuclear receptor subfamily 2, group F, member 6 | + | + | – | X76654 | T T C C | √ √ |
| Fgl2, fibrinogen-like protein 2 | – | – | + | M16238 | A A G G | √ √ |
| Fabp2, fatty acid binding protein 2, intestinal | + | – | – | M65034 | T T C C | √ √ |
| Ptprk, protein tyrosine phosphatase, receptor type K | – | + | – | L10106 | G G A G | √ √ |
| Pcsk2, proprotein convertase subtilisin/kexin type | – | + | – | M55669 | C C T C | √ √ |
| Pkia, protein kinase inhibitor, α | – | + | – | M63554 | C C T C | √ √ |
| Adcy7, adenylate cyclase 7 | + | + | + | U12919 | C C T C | √ √ |
| Zfp101, zinc finger protein 101 | + | +/– | +/– | U07861 | C C T C | √ √ |
| Gpr83, G protein-coupled receptor 83 | – | +/– | – | M80481 | C C C A | √ √ |
| Smstr3, somatostatin receptor 3 | – | +/– | – | M91000 | G G G T | √ √ |
| Slc1a2, solute carrier family 1, member 2 | + | + | – | D43796 | C T C | √ √ |
| Il8rb, interleukin 8 receptor, β | – | – | + | L23637 | T C T T | √ √ |
| Thbs1, thrombospondin 1 | – | – | + | M87276 | A G G A | √ √ |
| Gcet, germinal center expressed transcript | – | – | + | U13263 | A T T A | √ √ |
| Rbp1, retinol binding protein 1, cellular | + | + | + | X60367 | G A A G | √ √ |
| Dm15, dystrophia myotonica kinase, B15 | + | + | + | Z21503 | T A A T | √ √ |
| Pdgfb, platelet derived growth factor, B polypeptide | – | + | + | M64849 | T C C T | √ √ |
| Fcgr2b, Fc receptor, IgG, low affinity IIb | + | + | + | X04648 | C A C C | √ √ |
| Ect2, ect2 oncogene | + | + | + | L11316 | T C T T | √ √ |
| Eif3, eukaryotic translation initiation factor 3 | + | + | + | X84651 | T C T T | √ √ |
| Fabp7, fatty acid binding protein 7, brain | – | + | – | U04827 | T T T C | √ √√ |
| Chuk, conserved helix-loop-helix ubiquitous kinase | + | + | + | U12473 | C C C T | √ √√ |
| Slc12a2, solute carrier family 12, member 2 | – | + | + | U13174 | G G G A | √ √√ |
| Ccl9, chemokine (C-C motif) ligand 9 | + | – | + | U15209 | A A A G | √ √√ |
| H1f0, H1 histone family, member 0 | + | + | + | U18295 | T T T A | √ √√ |
| Adcy9, adenylate cyclase 9 | + | + | + | U30602 | T T T C | √ √√ |
| Vegfb, vascular endothelial growth factor B | +/– | + | + | U43836 | A A A C | √ √√ |
| Cfi, complement component factor i | + | +/– | + | U47810 | C C C T | √ √√ |
| Pep4, peptidase 4 | + | + | + | U51014 | C C C T | √ √√ |
| Cntn1, contactin 1 | – | + | – | X14943 | A A A G | √ √√ |
| Gja4, gap junction membrane channel protein α4 | + | + | +/– | X57971 | G G G C | √ √√ |
| Fgfr4, fibroblast growth factor receptor 4 | + | – | +/– | X59927 | C C C T | √ √√ |
| Foxa3, forkhead box A3 | + | – | – | X74938 | G G G A | √ √√ |
| Serpinf2, serine (or cysteine) proteinase inhibitor F2 | + | – | – | Z36774 | A A A G | √ √√ |
| Stat5a, signal transducer and activator of transcription 5A | + | +/– | +/– | U21103 | T T T C | √ √√ |
| Gad1, glutamic acid decarboxylase 1 | – | + | – | Z49976 | A A A G | √ √√ |
| Egf, epidermal growth factor | – | + | – | J00380 | A A A G | √ √√ |
| Fancc, Fanconi anemia, complementation group C | + | + | + | L08266 | C C C T | √ √√ |
| Il6st, interleukin 6 signal transducer | + | + | + | M83336 | C C C T | √ √√ |
| Plg, plasminogen | + | +/– | +/– | J04766 | A A A G | √ √√ |
| Tcf1, transcription factor 1 | + | – | – | M57966 | C C C G | √ √√ |
| Serpind1, serine (or cysteine) proteinase inhibitor D1 | + | +/– | – | U07425 | T T T A | √ √√ |
| Man2a1, mannosidase 2, α1 | + | + | + | X61172 | C C C T | √ √√ |
| Gja1, gap junction membrane channel protein α1 | – | + | + | X61576 | G G G A | √ √√ |
| Ptpn5, protein tyrosine phosphatase, non-receptor type | – | + | – | U28217 | A A A G | √ √√ |
| Laptm5, lysosomal-associated protein transmembrane 5 | + | + | + | U29539 | C C C T | √ √√ |
| Tbxas1, thromboxane A synthase 1, platelet | + | +/– | + | L18868 | G G G A | √ √√ |
| Hsp105, heat shock protein, 105 kD | + | + | + | L40406 | A A A C | √ √√ |
| Psen1, presenilin 1 | +/– | + | +/– | L42177 | G G G A | √ √√ |
| Ms4a2, membrane-spanning 4-domains | + | – | + | M62541 | A A A G | √ √√ |
| Ccnf, cyclin F | + | + | + | Z47766 | T C C T | √ √ |
| Uros, uroporphyrinogen III synthase | + | + | +/– | U18867 | A A A G | √ √√ |
| Cals1, carbonic anhydrase-like sequence 1 | + | + | + | X61397 | C C C T | √ √√ |

Hybrid mice surveyed are drawn from B6 × A, B6 × DBA, B6 × CAST, A × CAST and DBA × CAST. Maternal strain is listed first for F1 hybrid mice. Underlined genes showed evidence of allele-specific transcript level bias for the initial marker SNP tested.

discern, even from complete analysis of sequence variation throughout a gene locus, whether the gene harbors polymorphisms that affect its regulation. The difficulty is due to the limited ability both to recognize the regulatory regions of most genes (which can be tens or even hundreds of thousands of bases from the transcription unit) and, more importantly, to predict which nucleotide changes in regulatory elements might affect expression. Moreover, one cannot experimentally screen for regulatory variation simply by comparing transcript levels for a gene among different individuals, because such differences could well be due to trans-acting factors or environmental differences rather than cis regulation. For these reasons, little is known about the prevalence of regulatory variation.

Definitive identification of the subset of genes that harbor regulatory variants requires studying two alleles of a gene under identical circumstances and comparing the expression of the transcript associated with each. This can be done by comparing the expression of alleles from two mouse strains (A and B) in an F1 hybrid mouse (A × B). This technique controls for trans effects and environmental influences. Such experiments require the


ability to distinguish between the transcripts derived from each of the two parental alleles. For this purpose, one can use a ‘marker’ single-nucleotide polymorphism (SNP) in the transcript itself. Such a marker SNP will typically not be the regulatory variant, which will often lie in a non-transcribed region. This experimental design has been used to study imprinting, which is responsible for silencing of the maternal or paternal allele of a gene⁶,⁷.
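The logic of the F1 design can be illustrated with a toy model. The sketch below assumes, purely for illustration, that the transcript output of an allele is the product of a cis strength and a trans activity; this multiplicative form and all function names are assumptions, not something stated in the paper.

```python
# Toy model (illustrative assumption): output of an allele is
# cis_strength * trans_activity. Names and numbers are hypothetical.

def transcript_level(cis_strength: float, trans_activity: float) -> float:
    """Transcript output of a single allele under the toy model."""
    return cis_strength * trans_activity


def between_strain_ratio(cis_a: float, trans_a: float,
                         cis_b: float, trans_b: float) -> float:
    """Ratio seen when two separate inbred animals are compared.

    Each animal has its own trans environment, so cis and trans
    effects are confounded in this ratio.
    """
    return transcript_level(cis_a, trans_a) / transcript_level(cis_b, trans_b)


def f1_allelic_ratio(cis_a: float, cis_b: float, shared_trans: float) -> float:
    """Ratio seen between the two alleles inside one F1 hybrid.

    Both alleles are exposed to the same trans environment, so the
    shared factor cancels and only the cis difference remains.
    """
    return transcript_level(cis_a, shared_trans) / transcript_level(cis_b, shared_trans)


if __name__ == "__main__":
    # Same cis strength, different trans activity: the strains differ,
    # but the two alleles in an F1 hybrid do not.
    print(between_strain_ratio(1.0, 2.0, 1.0, 1.0))
    print(f1_allelic_ratio(1.0, 1.0, 1.5))
    # A genuine cis difference persists in the F1 hybrid.
    print(f1_allelic_ratio(2.0, 1.0, 1.5))
```

Under this model a purely trans-acting difference vanishes within the hybrid, while a cis-acting difference is still visible at the marker SNP.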

We recently reported a large number of sequence polymorphisms among mouse strains⁸, including a collection of marker SNPs in transcripts of known genes. Using these marker SNPs, we sought to design an assay with sufficient precision to detect differences in expression levels due to allelic effects rather than the large effects typically caused by imprinting. We focused on the mouse strains C57BL/6J, A/J, DBA/2J and CAST/Ei (denoted below as B6, A, DBA and CAST, respectively).

We developed a quantitative genotyping assay for comparing the levels of two alleles in a sample of either genomic DNA or mRNA. The assay involves amplification of the region surrounding the marker SNP (by PCR for genomic DNA or RT–PCR for mRNA), single-base extension (SBE) of a primer adjacent to the variant base in the presence of fluorescently labeled nucleotides⁹⁻¹³, and detection on a DNA sequence detector. The ratio of the two alleles is inferred by comparison with known mixtures used as reference standards (Methods). To assess the precision of the method, we used nine marker SNPs in seven genes (Slc1a2, Il8rb, Tap1, Prkacb, Ly78, Oprm and Hand1) with alleles that differed between the B6 and A strains, and we attempted to estimate the proportion of each allele in various test mixtures of genomic DNA from the two strains. In each case, the estimated value was within five percentage points of the true value. We concluded that genes for which the two alleles are expressed at equal levels (50:50) are unlikely to show a ratio of 60:40 (1.5) or greater in this assay. We thus chose an average ratio of 1.5 as a threshold for detection.
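The arithmetic behind the threshold can be made explicit. The sketch below converts an allele percentage into a fold ratio and checks it against 1.5; the function names are hypothetical, and the five-point error band is taken from the precision test described above.

```python
# Minimal sketch of the threshold arithmetic. Function names are
# hypothetical; the 1.5 threshold and the five-percentage-point
# precision come from the text.

THRESHOLD = 1.5


def allelic_ratio(percent_major: float) -> float:
    """Fold ratio of the more abundant allele over the less abundant one."""
    if not 50.0 <= percent_major < 100.0:
        raise ValueError("percent_major must be in [50, 100)")
    return percent_major / (100.0 - percent_major)


def exceeds_threshold(percent_major: float) -> bool:
    """True when the allelic ratio reaches the 1.5-fold detection threshold."""
    return allelic_ratio(percent_major) >= THRESHOLD


if __name__ == "__main__":
    # A 60:40 split is exactly the 1.5-fold threshold.
    print(allelic_ratio(60.0), exceeds_threshold(60.0))
    # A true 50:50 sample misread by five points appears as 55:45,
    # which stays below the threshold.
    print(round(allelic_ratio(55.0), 2), exceeds_threshold(55.0))
```

A measurement error of five percentage points on an equally expressed gene therefore yields a ratio of about 1.22, which leaves a margin below the 1.5 cutoff.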

A representative assay of allele-specific transcript levels for the gene Dm15 in DNA and cDNA samples from B6 × A F1 hybrid mice showed no allele-specific differences in transcript levels (Fig. 1a). As a positive control, we applied the assay to a gene with a well-documented allelic difference in transcript levels: the gene Pla2g2a (encoding secretory phospholipase A2, group IIA) has undetectable transcript levels in B6 mice owing to a mutation creating a premature stop codon, but has normal transcript levels in CAST mice¹⁴. This allelic difference was readily detected by the assay (Fig. 1b).

We next examined the allele-specific transcript levels of 69 mouse genes. The genes were selected arbitrarily subject to the following criteria: (i) we had found a suitable marker SNP between B6 and at least one of the A, DBA and CAST strains in the transcribed region in our recent survey⁸; (ii) we were able to develop a robust genotyping assay for the variant that yielded the expected homozygous genotypes in the parental strains (thus eliminating artifacts due to pseudogenes); and (iii) the gene showed detectable expression by our assay in at least one of spleen, liver and brain tissues in B6 × A F1 hybrid mice. Each gene was assayed by using a single marker SNP to detect allelic differences in transcript levels in F1 hybrid mice of the following combinations: B6 × A, B6 × DBA, B6 × CAST, A × CAST and DBA × CAST (Table 1). Transcript levels were studied in spleen, liver and brain of two independent adult females.
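A marker SNP can only distinguish the two transcripts in a hybrid whose parents carry different alleles. The sketch below lists, for a given set of parental alleles, which of the five crosses named above would be informative; the function name and data layout are assumptions for illustration, and being informative does not imply that a cross was actually surveyed.

```python
# Minimal sketch: which F1 crosses can a marker SNP resolve?
# The cross list follows the text; the function name and dictionary
# layout are hypothetical.
from typing import Dict, List

CROSSES = [("B6", "A"), ("B6", "DBA"), ("B6", "CAST"),
           ("A", "CAST"), ("DBA", "CAST")]


def informative_crosses(alleles: Dict[str, str]) -> List[str]:
    """Return crosses whose two parental strains differ at the marker SNP.

    Strains missing from `alleles` are skipped rather than guessed.
    """
    return [
        f"{maternal} x {paternal}"
        for maternal, paternal in CROSSES
        if maternal in alleles
        and paternal in alleles
        and alleles[maternal] != alleles[paternal]
    ]


if __name__ == "__main__":
    # Alleles as listed for Tap1 and Il9r in Table 1 (B6, A, DBA, CAST).
    tap1 = {"B6": "T", "A": "C", "DBA": "T", "CAST": "C"}
    il9r = {"B6": "G", "A": "T", "DBA": "T", "CAST": "T"}
    print(informative_crosses(tap1))
    print(informative_crosses(il9r))
```

For Il9r, where only B6 carries the variant base, every informative cross involves B6, which matches criterion (i) that the marker distinguish B6 from at least one other strain.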

We observed seemingly skewed expression for seven genes, using a minimum threshold ratio for allelic difference of 1.5, observed in duplicate animals (Table 1). Four of these genes (Il9r, Ccnf, Uros, Hmgcr) consistently showed ratios exceeding 1.5 when the initial

Table 2 • Follow-up analysis of genes using additional variants as markers

| Candidate gene (accession number) | Results for initial marker SNP, by tissue | SNP | SNP position | Exon | B6 | A | DBA | CAST | Ratio | Confirmed? |
|---|---|---|---|---|---|---|---|---|---|---|
| Il9r (M84746) | spleen: B6 > A, CAST; liver: not expressed; brain: not expressed | Il9r-1 | 1562 | 9 (3′ UTR) | G | T | T | T | >3× | yes |
| | | Il9r-2 | 55 | 2 | G | G | G | A | >3× | |
| | | Il9r-3 | 207 | 3 | C | T | T | C | >3× | |
| | | Il9r-4 | 934 | 8 | C | T | T | C | >3× | |
| | | Il9r-5 | 1582 | 9 (3′ UTR) | G | G | G | A | >3× | |
| | | Il9r-6 | 2435 | 9 (3′ UTR) | C | G | G | G | >3× | |
| Ccnf (Z47766) | spleen: no bias detected; liver: B6, CAST > A; brain: no bias detected | Ccnf-1 | 2597 | 17 (3′ UTR) | T | C | C | T | >2.5× | yes |
| | | Ccnf-2 | 1561 | 14 | G | A | A | G | >3× | |
| | | Ccnf-3 | 1654 | 15 | A | G | G | A | >3× | |
| | | Ccnf-4 | 2354 | 17 (3′ UTR) | T | C | C | T | 2–3× | |
| Uros (U18867) | spleen: B6, A, DBA > CAST; liver: B6, A, DBA > CAST; brain: B6, A, DBA > CAST | Uros-1 | 1209 | 10 (3′ UTR) | A | A | A | G | 2–4× | yes |
| | | Uros-2 | 1010 | 10 (3′ UTR) | C | C | C | A | 1.5–4× | |
| | | Uros-3 | 1512 | 10 (3′ UTR) | A | A | A | G | 2–4× | |
| Hmgcr (M62766) | spleen: no bias detected; liver: CAST > B6; brain: no bias detected | Hmgcr-1 | 894 | 20 (3′ UTR) | G | A | A | A | 1.5–2× | yes |
| | | Hmgcr-2 | 763 | 20 (3′ UTR) | A | A | A | T | 2× | |
| Fcgr2b (X04648) | spleen: B6 > A; liver: no bias detected; brain: no bias detected | Fcgr2b-1 | 1042 | 6 (3′ UTR) | C | A | C | C | 1.5× | no |
| | | Fcgr2b-2 | 629 | 3 | T | A | T | A | no bias | |
| Cals1 (X61397) | spleen: no bias detected; liver: B6 > CAST; brain: B6 > CAST | Cals1-1 | 1560 | 3′ UTR | G | G | G | A | 1.3–1.5× | no |
| | | Cals1-2 | 411 | | G | G | G | A | no consistent bias | |
| | | Cals1-3 | 438 | | C | C | C | T | no consistent bias | |
| | | Cals1-4 | 534 | | T | T | T | C | no consistent bias | |
| | | Cals1-5 | 1010 | 3′ UTR | A | A | A | C | no consistent bias | |
| Cln3 (U47106) | spleen: A > B6; liver: no bias detected; brain: A, CAST > B6 | Cln3-1 | 1531 | 15 (3′ UTR) | T | C | T | C | 1.3–1.5× | no |
| | | Cln3-2 | 1308 | 14 | C | T | C | C | no bias | |


marker SNP for each gene was repeatedly assayed in tissue from F1 hybrid mice. Three genes (Fcgr2b, Cals1, Cln3) reproducibly showed ratios estimated at 1.3–1.5. Although the average ratio was less than our threshold of 1.5, we included these genes in the analysis because they showed a ratio of 1.5 in at least one replicate. As noted below, they turned out to be false positives. We verified that the skewed ratios were not due to imprinting by confirming that the bias depended on the allele itself rather than on the sex of the parent from which the allele was inherited (data not shown).

We then sought to confirm these results by identifying and testing additional marker SNPs within the transcripts. We sequenced the seven transcripts in the four strains and found multiple additional marker SNPs in each (ranging from five for Hmgcr to 42 for Il9r). No gross alterations, such as stop codons or splice-site mutations, were detected. For all four genes with ratios exceeding 1.5, skewing of the ratio was confirmed by testing additional marker variants in the tissue or tissues initially found to show a difference (Table 2).

Notably, two of the genes showed regulatory variation that acted in a tissue-specific fashion. Both Ccnf and Hmgcr showed detectable expression in all three tissues examined for all SNPs tested, but allelic differences in expression levels were observed only in the liver (Fig. 2a and data not shown). In these two cases, the regulatory variant may lie in a tissue-specific control element such as an enhancer. In contrast, Uros showed allelic expression differences in all three tissues studied, although the bias seemed more pronounced in liver and brain tissues than in spleen tissues (Fig. 2b and data not shown). Il9r was only expressed in one of the three tissues (Fig. 2c).

The three genes with ratios in the range 1.3–1.5 but an average ratio below 1.5 (Fcgr2b, Cals1 and Cln3) did not exhibit a consistent bias when tested at additional marker SNPs. These are probably

[Figure 2: SBE traces in three panels. a, Ccnf in B6 × A F1 liver, spleen and brain, with peaks for B6 (RoxU) and A (TamraC). b, Uros in B6 × CAST F1 liver, with peaks for CAST (FamG) and B6 (JoeA). c, Il9r in B6 × A F1 spleen, with peaks for A (RoxU) and B6 (FamG). Each panel shows hybrid mouse samples (F1 DNA, F1 mRNA, and mRNA with no reverse transcriptase) next to mixed parental DNA samples at 25:75, 50:50 and 75:25.]

Fig. 2 Candidate genes for regulatory variation. a, SBE genotyping of a C/T SNP in Ccnf in mixed genomic DNAs from A and B6 strains and in spleen, liver and brain DNA and mRNA from B6 × A F1 hybrid mice. b,c, Analogous data for a G/A polymorphism in Uros, analyzed in liver (b), and a G/T polymorphism in Il9r, examined in spleen (c).


false positives, indicating that the threshold of an average ratio of 1.5 (that is, 60:40) in replicate samples is appropriate.

Our results suggest that a considerable number of the roughly 35,000 mouse genes¹⁵ probably contain functional regulatory variants affecting expression levels by at least 1.5-fold among the four mouse strains studied. We found such regulatory variants in four of 69 genes, corresponding to a frequency of approximately 6%, although the error bars on this estimate are large. Our screen probably underestimates the true proportion of genes that contain regulatory variants for three reasons. First, the available SNPs did not allow all pairwise comparisons to be tested among the four strains. On average, about 41% of the pairwise combinations were directly assayed with available SNPs. Second, we have only examined three tissues (spleen, liver and brain) and one developmental stage (adult). A survey of more tissues, developmental stages and environmental conditions would probably reveal more variation. Third, the sensitivity of the assay does not allow reliable detection of differences of less than 1.5-fold. Relative increases of less than 50% in expression levels of some alleles may be physiologically and evolutionarily important. On the other hand, we should note that variation in mRNA levels does not necessarily imply differences in protein levels, owing to the possibility of feedback mechanisms acting at the level of translation or protein degradation.
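The size of the error bars on the 4-of-69 estimate can be gauged with a standard binomial interval. The sketch below uses the Wilson score interval at 95% confidence; the choice of interval is an assumption made here for illustration, since the paper does not state how its uncertainty was assessed.

```python
# Minimal sketch: Wilson score interval for a binomial proportion.
# The interval method is an illustrative choice, not the authors'.
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Return (lower, upper) bounds of the Wilson score interval.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2.0 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials * trials)
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    low, high = wilson_interval(4, 69)
    print(f"point estimate: {4 / 69:.3f}")
    print(f"95% Wilson interval: {low:.3f} to {high:.3f}")
```

With only 69 genes surveyed the interval is wide, roughly 2% to 14%, which is consistent with the caution expressed about the 6% figure.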

Notably, our assay definitively identifies the presence of regulatory variation (that is, differences in transcript levels due to cis-acting sequence differences) without directly identifying or requiring knowledge of specific regulatory variants (that is, specific cis-regulatory polymorphisms). This is a strength of the approach, because regulatory variants cannot typically be detected by sequence analysis but rather require extensive experimental studies. Indeed, few specific regulatory variants have been defined and characterized in detail (for example, UGT1A1 in Gilbert syndrome¹⁶,¹⁷, mutations in the β-globin locus control region in thalassemias¹⁸⁻²⁰, and CFTR splicing variation in cystic fibrosis²¹).

To examine the issues involved in identifying specific regulatory variants, we resequenced the region (roughly 1 kb) immediately 5′ of each of the four genes that showed clear regulatory polymorphism. As expected given the considerable polymorphism rates between the strains⁸, numerous polymorphisms were identified (Table 3). No particular polymorphism stood out as a strong candidate for being responsible for the regulatory variation; for example, none disrupted a conserved feature in a known regulatory element. Moreover, there is no guarantee that the actual regulatory variant lies in the immediate vicinity of the promoter, rather than many thousands of bases away. For example, key regulatory sequences for the β-globin locus are located in a locus control region some 50 kb away¹⁸⁻²⁰. Definitive identification of a specific nucleotide position as a regulatory variant will require experiments such as site-directed mutagenesis, in which the specific polymorphism from one allele is introduced into the sequence of the other allele. Our study suggests that it will be more productive at this time to survey genes for regulatory variation than to search exhaustively for specific regulatory variants.

The finding that regulatory variants occur at a detectable rate in inbred mouse strains highlights their potential both to contribute to organismal phenotype and to serve as substrates for evolutionary selection¹. Systematic surveys of both regulatory and coding variants, together with tests to identify those with functional consequences, will be needed to define the relative contribution of each type of gene polymorphism. Notably, our approach to identify cis-regulatory variation can be directly extended to human cells. Such studies should shed light on the nature of mammalian evolution, as well as identify gene variants to be examined in association studies of human disease.

Methods

Mouse breeding and tissue preparation. We obtained C57BL/6J (B6), DBA/2J (DBA), A/J (A), CAST/Ei (CAST), B6 × A F1 and B6 × DBA F1 mice from Jackson Labs, and carried out the following matings in our laboratory and in the laboratory of T. Jacks (Massachusetts Institute of Technology) to generate F1 hybrid mice: B6 female × CAST male, CAST female × B6 male, A female × CAST male and DBA female × CAST male. We killed female F1 mice at four or more weeks of age and harvested spleen, liver and brain tissues, which were promptly frozen in liquid nitrogen. We prepared DNA and total RNA by grinding frozen tissues with mortar and pestle and then used Trizol preparation (Gibco-BRL) according to manufacturer’s instructions. Poly(A)+ mRNA was purified from total RNA using Oligotex (Qiagen) according to manufacturer’s instructions. Aliquots of 200 ng of poly(A)+ mRNA samples were reverse transcribed by treatment with Superscript II RT (1000 U; Gibco-BRL) in the presence of 7.5 ng µl⁻¹ random hexamer oligonucleotides to yield 100 µl first-strand cDNA. Treatment proceeded for 10 min at 25 °C, 50 min at 42 °C, 15 min at 70 °C and was followed by addition of RNase H (2.5 µl, 5 U) with samples incubated for 30 min at 37 °C. We prepared ‘no reverse transcriptase’ control samples in parallel, and replaced Superscript II RT with DEPC-treated water during first-strand cDNA synthesis steps. Resulting samples were diluted fivefold, and DNA samples were diluted to 20 ng µl⁻¹ to obtain templates for PCR. Mouse manipulations were done with approval of Massachusetts Institute of Technology’s Committee on Animal Care, current protocol 0602-031-15.

Screen for allele-specific transcript level biases in hybrid mice. We selected marker SNPs from our previously reported collection of mouse SNPs⁸. We amplified regions containing a SNP using primers designed by Primer 3.0 (Whitehead Institute website software; melting temperature of 55–64 °C, 20–80% CG content; 20–35 nt primer length). For each reaction, we combined templates (100 ng for DNA, 5 µl for cDNA or no-RT control) with AmpliTaq Gold (2.5 U; Perkin Elmer), manufacturer’s buffer, MgCl₂ (5 mM), dNTPs (0.5 mM), locus-specific primers (1 µM each, tailed with T7 and M13-21 sequences, respectively; Research Genetics and Gibco-BRL) and biotinylated-T7 oligonucleotide (1 µM, Gibco-BRL) and M13-21 oligonucleotide (1 µM, Gibco-BRL) to allow for synthesis of a biotinylated product. Samples were denatured for 10 min at 95 °C, followed by 32 cycles of 95 °C for 30 s, 53 °C for 2 min and 72 °C for 30 s with a final extension of 72 °C for 5 min. Biotinylated PCR products were then purified from unincorporated dNTPs using streptavidin-coated Dynabeads (Dynal) as described previously⁸. Beads with attached products were subjected to SBE reactions with Thermosequenase (1.3 U; Amersham), SBE primers (0.25 µM; designed to have melting temperature of 50–65 °C, 20–30 nt length, to terminate on the base 5′ to the SNP and not to contain any neighboring SNPs), Tris (50 mM, pH 9.0) and MgCl₂ (2 mM), with FAM-ddGTP (20 nM), JOE-ddATP (20 nM), TAMRA-ddCTP (20 nM) and ROX-ddUTP (200 nM) added in accordance with the

Table 3 • Polymorphisms in promoter regions of genes showing regulatory variation

| Gene | Region sequenced | Relevant comparison | Polymorphisms |
|---|---|---|---|
| Il9r | 628 bp | B6 × A, CAST | 7 sites (all SNPs). A polymorphic complex repeat occurs 628 bp 5′ of predicted transcription start site. |
| Ccnf | 1.9 kb | B6, CAST × A | 14 sites (10 SNPs, 1 dinucleotide site, 3 insertion/deletions) |
| Uros | 916 bp | B6, A, DBA × CAST | 16 sites (15 SNPs, 1 insertion/deletion) |
| Hmgcr | 1.64 kb | B6 × CAST | 11 sites (6 SNPs, 1 microsat. length, 4 insertion/deletions) |

©2002 Nature Publishing Group http://www.nature.com/naturegenetics


nature of the assayed SNP. Fifteen cycles of 96 °C for 30 s, 50 °C for 15 s, 60 °C for 1 min were carried out. We removed excess ddNTPs by centrifugation in 96-well gel filtration blocks (Edge Biosystems), and then dried the eluates, resuspended them in 4 µl formamide and size-separated them by electrophoresis through a 10% denaturing polyacrylamide gel on an ABI 377 sequencer run at 200 W for 2.5–3.5 h. As many as four loads of samples, staggered at 10–15 min intervals, were done on each gel. We determined peaks of dye intensities corresponding to extension of SBE primers by inspecting output from the ABI 377 after background subtraction and color separation. We tested an initial set of 128 SNPs on mixed genomic DNA samples (parental mouse genomic DNAs were purchased from Jackson Labs). We then selected an initial subset of 70 SNPs that met the following conditions: the assay was robust, showed a single dye peak in genomic DNA from each parental strain, and yielded peak heights that consistently scaled in proportion to input mixing ratio and exhibited detectable signal when tested on spleen, liver or brain mRNA from A × B6 F1 mice. We later determined that assays for one gene that met these criteria, Cox7a2l (MGI accession X80899), were influenced by the presence of at least two paralogs, and Cox7a2l was therefore eliminated from further consideration. For assays performed on the 69 remaining SNPs, we determined the magnitudes of peak heights for all samples. We normalized these peak height values to those of standard curve templates containing equal levels of parental genomic DNAs. We then compared the normalized peak heights for test samples to those for mixed genomic DNA samples of known ratios to estimate the percentage of each allele in test samples. We then compared the allelic percentage values for mRNA and DNA in each mouse tissue sample to calculate the ratio in mRNA relative to DNA.
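The quantification steps described above (normalization to the equal-mixture standard, comparison to a standard curve of known mixing ratios, and the mRNA-to-DNA comparison) can be sketched roughly as follows. This is only an illustration under stated assumptions: the text does not give the exact formulas, so the piecewise-linear standard curve, the use of allele odds for the final ratio, all function names and all peak-height values here are hypothetical.

```python
# Illustrative sketch only; formulas, names and numbers are assumptions,
# not the procedure or data reported in the text.

def normalized_fraction(h1, h2, ref_h1, ref_h2):
    """Allele-1 share of signal after scaling each dye peak by the
    corresponding peak of the 50:50 genomic DNA standard."""
    a = h1 / ref_h1
    b = h2 / ref_h2
    return a / (a + b)

def interpolate(x, points):
    """Piecewise-linear standard curve: map a normalized signal fraction
    to percent allele 1. Values outside the curve are extrapolated from
    the nearest segment. Assumes distinct signal values."""
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two standard-curve points")
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x <= x1:
            break
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def mrna_to_dna_ratio(pct_mrna, pct_dna):
    """Allele-1:allele-2 ratio in mRNA relative to the same ratio in DNA
    (one plausible reading of 'ratio in mRNA relative to DNA')."""
    odds_mrna = pct_mrna / (100.0 - pct_mrna)
    odds_dna = pct_dna / (100.0 - pct_dna)
    return odds_mrna / odds_dna

# Hypothetical peak heights (allele 1, allele 2) for mixed parental DNAs.
REF = (1000.0, 2000.0)  # 50:50 standard
STANDARDS = {25.0: (300.0, 1800.0), 50.0: REF, 75.0: (1500.0, 1000.0)}
curve = [(normalized_fraction(*h, *REF), pct) for pct, h in STANDARDS.items()]

# Hypothetical F1 hybrid samples from one tissue.
pct_dna = interpolate(normalized_fraction(1000.0, 2000.0, *REF), curve)
pct_mrna = interpolate(normalized_fraction(1200.0, 1200.0, *REF), curve)
print(f"allele 1: {pct_dna:.1f}% in DNA, {pct_mrna:.1f}% in mRNA")
print(f"mRNA/DNA allelic ratio: {mrna_to_dna_ratio(pct_mrna, pct_dna):.2f}")
```

Dividing by the DNA value from the same animal is meant to cancel any residual assay bias between the two dyes, so a ratio near 1 indicates no allelic difference in transcript level.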

Sequence analysis of parental strain cDNAs. We prepared spleen, liver and brain cDNA populations from B6, A/J, DBA/2J and CAST/Ei strains of mice as described above, and sequenced DNA and cDNA samples to detect polymorphisms as described previously for DNA samples (ref. 2). Detailed polymorphism information is available in Web Figs A–D online.

Assessment of transcription factor binding sites in 5′ sequence of candidate genes. We used the MatInspector (ref. 22) and AliBaba2 (ref. 23) programs to search for promoter elements in candidate gene sequences.

URL. The Primer 3.0 software is available at http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi.

Note: Supplementary information is available on the Nature Genetics website.

Acknowledgments
We thank J. Platko, A. Rachupka, R. Prill and D. Richter for cDNA sequencing; E. Winchester and K. Lindblad-Toh for assistance in identification of appropriate SNPs to assay; Y.M. Lim and P. Sklar for help with initial SBE assays; M. Daly for gel analysis software and discussions; J. Rioux and E.J. Kulbokas for aiding genomic DNA sequencing; and D. Reich, G. Acton, K. Hong, A. Gimelbrant and A. Chess for discussions.

This work was supported in part by a fellowship of the Damon Runyon Cancer Research Foundation (to C.R.C.) and by grants from the US National Institutes of Health (to E.S.L.). J.N.H. is a recipient of a Howard Hughes Medical Institute Postdoctoral Fellowship for Physicians.

Competing interests statement
The authors declare that they have no competing financial interests.

Received 14 June; accepted 31 July 2002.

1. King, M.C. & Wilson, A.C. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975).

2. Cargill, M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 22, 231–238 (1999).

3. Cambien, F. et al. Sequence diversity in 36 candidate genes for cardiovascular disorders. Am. J. Hum. Genet. 65, 183–191 (1999).

4. Li, W.H. & Sadler, L.A. Low nucleotide diversity in man. Genetics 129, 513–523 (1991).

5. Halushka, M.K. et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nat. Genet. 22, 239–247 (1999).

6. Singer-Sam, J. Quantitation of specific transcripts by RT–PCR SNuPE assay. PCR Methods Appl. 3, S48–S50 (1994).

7. Szabo, P.E. & Mann, J.R. Allele-specific expression and total expression levels of imprinted genes during early mouse development: implications for imprinting mechanisms. Genes Dev. 9, 3097–3108 (1995).

8. Lindblad-Toh, K. et al. Large-scale discovery and genotyping of single-nucleotide polymorphisms in the mouse. Nat. Genet. 24, 381–386 (2000).

9. Syvanen, A.C., Aalto-Setala, K., Harju, L., Kontula, K. & Soderlund, H. A primer-guided nucleotide incorporation assay in the genotyping of apolipoprotein E. Genomics 8, 684–692 (1990).

10. Kobayashi, M. et al. Fluorescence-based DNA minisequence analysis for detection of known single-base changes in genomic DNA. Mol. Cell. Probes 9, 175–182 (1995).

11. Pastinen, T., Kurg, A., Metspalu, A., Peltonen, L. & Syvanen, A.C. Minisequencing: a specific tool for DNA analysis and diagnostics on oligonucleotide arrays. Genome Res. 7, 606–614 (1997).

12. Chen, X., Zehnbauer, B., Gnirke, A. & Kwok, P.Y. Fluorescence energy transfer detection as a homogeneous DNA diagnostic method. Proc. Natl Acad. Sci. USA 94, 10756–10761 (1997).

13. Landegren, U., Nilsson, M. & Kwok, P.Y. Reading bits of genetic information: methods for single-nucleotide polymorphism analysis. Genome Res. 8, 769–776 (1998).

14. Kennedy, B.P. et al. A natural disruption of the secretory group II phospholipase A2 gene in inbred mouse strains. J. Biol. Chem. 270, 22378–22385 (1995).

15. Lander, E.S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

16. Bosma, P.J. et al. The genetic basis of the reduced expression of bilirubin UDP-glucuronosyltransferase 1 in Gilbert’s syndrome. N. Engl. J. Med. 333, 1171–1175 (1995).

17. Monaghan, G., Ryan, M., Seddon, R., Hume, R. & Burchell, B. Genetic variation in bilirubin UPD-glucuronosyltransferase gene promoter and Gilbert’s syndrome. Lancet 347, 578–581 (1996).

18. Grosveld, F., van Assendelft, G.B., Greaves, D.R. & Kollias, G. Position-independent, high-level expression of the human β-globin gene in transgenic mice. Cell 51, 975–985 (1987).

19. Dillon, N., Trimborn, T., Strouboulis, J., Fraser, P. & Grosveld, F. The effect of distance on long-range chromatin interactions. Mol. Cell 1, 131–139 (1997).

20. Li, Q., Harju, S. & Peterson, K.R. Locus control regions: coming of age at a decade plus. Trends Genet. 15, 403–408 (1999).

21. Rave-Harel, N. et al. The molecular basis of partial penetrance of splicing mutations in cystic fibrosis. Am. J. Hum. Genet. 60, 87–94 (1997).

22. Quandt, K., Frech, K., Karas, H., Wingender, E. & Werner, T. MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res. 23, 4878–4884 (1995).

23. Grabe, N. AliBaba2: context specific identification of transcription factor binding sites. In Silico Biol. 2, S1–S15 (2002).
