Copyright © 2007 by the Genetics Society of America
DOI: 10.1534/genetics.107.071522

Association of Candidate Genes With Flowering Time and Water-Soluble Carbohydrate Content in Lolium perenne (L.)

Leif Skøt,1 Jan Humphreys, Mervyn O. Humphreys, Danny Thorogood, Joe Gallagher, Ruth Sanderson, Ian P. Armstead and Ian D. Thomas

Institute of Grassland and Environmental Research, Plant Genetics and Breeding Department, Aberystwyth, Ceredigion SY23 3EB, United Kingdom

Manuscript received January 31, 2007
Accepted for publication June 27, 2007

ABSTRACT

We describe a candidate gene approach for associating SNPs with variation in flowering time and water-soluble carbohydrate (WSC) content and other quality traits in the temperate forage grass species Lolium perenne. Three analysis methods were used, which took the significant population structure into account. First, a linear mixed model was used enabling a structured association analysis to be incorporated with the nine populations identified in the structure analysis as random variables. Second, a within-population analysis of variance was performed. Third, a tree-scanning method was used, in which haplotype trees were associated with phenotypes on the basis of inferred haplotypes. Analysis of variance within populations identified several associations between WSC, nitrogen (N), and dry matter digestibility with allelic variants within an alkaline invertase candidate gene LpcAI. These associations were only detected in material harvested in one of the two years. By contrast, consistent associations between the L. perenne homolog (LpHD1) of the rice photoperiod control gene HD1 and flowering time were identified. One SNP, in the immediate upstream region of the LpHD1 coding sequence (C-4443-A), was significant in the linear mixed model. Within-population analysis of variance and tree-scanning analysis confirmed and extended this result to the 2118 polymorphisms in some of the populations. The merits of the tree-scanning method are compared to the single SNP analysis. The potential usefulness of the 4443 SNP in marker-assisted selection is currently being evaluated in test crosses of genotypes from this work with turf-grass varieties.

Association or linkage disequilibrium (LD) mapping in crop plant species has received increasing attention in recent years owing to its potential for fine mapping of traits and the prospects for identifying functional markers (Nordborg and Tavare 2002; Nordborg et al. 2002; Flint-Garcia et al. 2003; Rafalski and Morgante 2004; Flint-Garcia et al. 2005; Gupta et al. 2005; Yu and Buckler 2006; Breseghello and Sorrells 2006a). By using populations of unknown pedigree, the recombination events that have occurred over many generations are exploited for more refined mapping than is possible in conventional F2 or backcross mapping families (Flint-Garcia et al. 2003). The method thus has the potential to provide useful markers for marker-assisted selection (MAS) in genetic improvement programs. It was first used as a candidate gene approach in plants by Thornsberry et al. (2001), who demonstrated association between allelic variants and flowering time in the Dwarf8 gene in maize. It has been followed by other analyses in maize (Wilson et al. 2004; Szalma et al. 2005; Yu et al. 2006), rice (Bao et al. 2006a, 2006b), Arabidopsis thaliana (Olsen et al. 2004; Aranzana et al. 2005), barley (Ivandic et al. 2002; Kraakman et al. 2006), and wheat (Breseghello and Sorrells 2006b). The method is dependent upon LD (the nonrandom occurrence of alleles at different loci) between marker and phenotype, and this is affected by recombination. The effective recombination rate in turn is influenced by the breeding system. In inbreeding species effective recombination is lower, whereas in self-incompatible species the opposite is the case. In species where LD has been studied, it has in general extended further in self-compatible species than in those that are out-breeding (Flint-Garcia et al. 2003; Rafalski and Morgante 2004). The potential for higher resolution mapping would therefore be expected in the latter species.

The important temperate forage and amenity grass Lolium perenne is an obligate out-breeding species (Cornish et al. 1979). One would thus expect LD to decay to insignificant levels over short distances. The only data on LD in L. perenne come from an AFLP marker analysis of populations, in which the resolution was limited by the 2–3-cM resolution of the F2 mapping family onto which the markers were mapped (Skøt et al. 2005), and from preliminary data on the alkaline invertase being analyzed further in this work (Humphreys et al. 2006). In the absence of more extensive information of the extent of LD within genes and given the obligate out-breeding nature of the species, it seemed reasonable to assume that LD decays rapidly. A candidate gene approach to association mapping thus appeared most likely to be successful, since a genome-wide approach would require an excessive number of molecular markers to be certain of identifying markers in LD with any given QTL allele. A careful selection of target genes with a probable role in controlling a phenotype is more likely to lead to identification of useful markers associated with a trait.

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AM489608 and AM489692.

1 Corresponding author: Plant Genetics and Breeding Department, Institute of Grassland and Environmental Research, Plas Gogerddan, Aberystwyth, Ceredigion SY23 3EB, United Kingdom. E-mail: [email protected]

Genetics 177: 535–547 (September 2007)

Population structure also has a major effect on LD. It is influenced by population genetic forces such as drift, selection, population admixture, and gene flow (Gaut and Long 2003). The selection of populations for association mapping is therefore an important issue in terms of capturing the maximum variation in the trait of interest, while minimizing effects of population structure.

Here, we are focusing on two traits in L. perenne: flowering time or heading date (HD) and content of water-soluble carbohydrates (WSCs). Both traits are of fundamental importance for plant growth and development and affect traits of practical and economic significance in forage and turf grass breeding. In particular, HD has an impact on biomass production, persistency, and quality including WSC (Wilkins and Humphreys 2003). Both traits have a high degree of heritability, and high sugar grass varieties have been bred (Wilkins and Humphreys 2003). However, there is still a need for further improvement in this trait, and there are a number of genes that could be targeted as candidates, particularly those involved in fructan biosynthesis or breakdown and enzymes involved in sugar metabolism such as invertases (Gallagher et al. 2004; Chalmers et al. 2005). Here we use a cytosolic neutral/alkaline invertase (Gallagher and Pollock 1998), which has been mapped to a QTL on chromosome 6 for glucose and fructose content in L. perenne (Turner et al. 2006). This LpcAI gene encodes an enzyme that hydrolyses sucrose to produce glucose and fructose, the substrates for respiration and biosynthesis of primary and secondary compounds as well as regulation of gene expression by sugars (Gallagher and Pollock 1998; Gallagher et al. 2004). Expression of this neutral/alkaline invertase gene is more or less constant in response to a number of perturbations including variation in sucrose substrate concentration, light, and position in the leaf, i.e., age of tissue (Gallagher and Pollock 1998).

The genetic control of flowering in L. perenne is mainly determined by day length and temperature. Short days and low temperatures (vernalization) are required as a primary induction, followed by longer days and higher temperatures. There is, however, a large degree of genetic variability for this trait within L. perenne. Orthologous genes to some of those involved in the photoperiod-controlled flowering induction in the model species A. thaliana have been identified in rice (Yano et al. 2000) and forage grasses (Armstead et al. 2004; Armstead et al. 2005). In particular, the HD1 homolog of the CONSTANS gene in A. thaliana (Putterill et al. 1995) is located on chromosome 7 in L. perenne within a major QTL for flowering time (Jones et al. 2002; Armstead et al. 2004). Its expression is upregulated in response to long days, and it is capable of complementing a mutant CONSTANS line in A. thaliana (Martin et al. 2004). These pieces of evidence are all consistent with the idea that LpHD1 is involved in the photoperiodic control of the flowering phenotype. We have therefore used this gene to search for allelic variants associated with HD. The importance of population structure is illustrated in the association analysis described here, in which populations from throughout Europe were selected to maximize variation in HD (and to some degree also WSC) and analyzed for association of these traits with allelic variation in two candidate genes, LpHD1 and LpcAI, respectively.

MATERIALS AND METHODS

Plant material: In total, 96 genotypes from each of nine populations of L. perenne were used in this work, of which seven are natural or seminatural and two were varieties. The populations all originate from Europe. They were primarily selected to provide the maximum possible variation in HD and details of their origin are listed in Table 1. Second, wherever possible, populations within the same flowering time category, but from more than one distinct geographic origin, were represented. This was done to minimize the risk of spurious correlations with latitude. Third, populations with variation in HD are also likely to vary in WSC content, so have the potential to provide useful material for association analysis of forage quality traits. Seeds were planted in 6-in. diameter pots in potting compost in 2003, and plants left in a polytunnel to vernalize. HD was recorded in 2004 while still maintained as single genotypes in individual pots in the polytunnel. After flowering, above ground plant material was harvested, dried, and prepared for Near Infrared Reflectance Spectroscopy (NIRS) analysis of WSC, nitrogen content (N), and dry matter digestibility (DMD) as described (Lister and Dhanoa 1998). Tillers were planted as spaced plants in a field near the Institute of Grassland and Environmental Research in a fully randomized design in two replicates. The following year HD, WSC, N, and DMD were recorded or measured as in 2004. The HD data for 2005 represent the mean of two replicates.

DNA, sequencing, and SNP analysis: Extraction of DNA was performed as described previously (Skøt et al. 2005). Sequencing was carried out using an ABI 3100 genetic analyzer according to the manufacturer's instructions (Applied Biosystems, Warrington, UK). Primers for amplification of PCR fragments for sequencing within the LpHD1 and LpcAI genes are listed in Table 2. The primers were designed from sequences deposited in the EMBL/GenBank data libraries under accession numbers AM489608 and AM489692, respectively. A total of 5604 bp of the LpHD1 locus was resequenced, corresponding to base pair numbers 10720–16323 in AM489608. This included a putative peroxidase precursor gene located upstream of LpHD1, as well as the sequence between the two genes and exon 1 of LpHD1. Despite numerous primer designs and PCR reaction modifications, we were unable to obtain reliable PCR amplification for SNP discovery in the 3′ region of the LpHD1 gene, including the second exon. In the LpcAI locus, three segments were resequenced: base pair numbers 61–1263, 1927–2730, and 4635–5465, covering a total of 2796 bp (starting at base number 701 in AM489692). The sequenced segments cover ~800 bases upstream of the codon start site and exon 1, then exons 2 and 3, and finally exons 5 and 6. This resequencing strategy was used to enhance the chances of detecting SNPs in the upstream regulatory sequence and coding sequence. The PCR reactions were performed in a total volume of 50 µl containing 30 ng DNA, 1× Roche Taq buffer with Mg2+, 200 µM of each dNTP, 200 µM of each primer, and 0.05 units of Taq polymerase (Roche Diagnostics, West Sussex, UK). The PCR reactions were carried out using an ABI 9700 thermocycler (Applied Biosystems). The conditions were as follows: a 2-min 94° denaturation step followed by 40 cycles of 94° for 30 sec, annealing temperature for 30 sec, and 72° for 1 min, followed by a final 7-min extension step at 72°. The annealing temperature varied between 55° and 60° depending on the melting temperature of the specific primer pair.
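As a worked check on the cycling program described above, the following minimal Python sketch encodes the steps as data and sums their hold times. The step labels and the `total_hold_seconds` helper are illustrative assumptions, not part of the original protocol, and ramp times between temperatures are ignored, so an actual run on the thermocycler would take longer.

```python
# Illustrative sketch only: step labels and helper names are hypothetical.
# Temperatures and hold times are taken from the protocol text; ramping is ignored.

INITIAL_DENATURATION = ("denature", 94, 120)  # (label, temperature in deg C, seconds)
CYCLE_STEPS = [
    ("denature", 94, 30),
    ("anneal", None, 30),  # annealing temperature was primer-pair specific (55-60)
    ("extend", 72, 60),
]
N_CYCLES = 40
FINAL_EXTENSION = ("extend", 72, 420)


def total_hold_seconds(initial, cycle_steps, n_cycles, final):
    """Sum the programmed hold times, excluding temperature ramps."""
    per_cycle = sum(seconds for _, _, seconds in cycle_steps)
    return initial[2] + n_cycles * per_cycle + final[2]


total = total_hold_seconds(INITIAL_DENATURATION, CYCLE_STEPS, N_CYCLES, FINAL_EXTENSION)
print(f"Programmed hold time: {total} s ({total / 60:.0f} min)")
```

Keeping the program as plain data makes it easy to vary the annealing step per primer pair without touching the rest of the schedule.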

The discovery of SNPs in the LpHD1 gene was performed by sequencing a subset of two genotypes from each of the nine populations. The eight SNP loci selected for the full data set were analyzed for polymorphism using the TaqMan assay. Primers and fluorescent probes were designed using the Primer Express version 2 program (Applied Biosystems) and are listed in Table 3. The allelic discrimination assay was performed using the ABI 7500 Real Time PCR system (Applied Biosystems), with the default settings on the PCR program. The reaction mix consisted of 1× TaqMan universal buffer, 0.9 µM of each primer, 0.1 µM of each probe, and 10 ng of genomic DNA. For SNP discovery in the LpcAI gene, 19 genotypes were used, 7 of which were from the 9 populations described here. The remaining 12 were genotypes from different populations previously described (Skøt et al. 2005). LD and neutrality tests were performed on these two subsets using the program DnaSP (http://www.ub.es/dnasp) (Rozas et al. 2003). Due to the heterozygous nature of the sequence data, haplotype pairs were inferred using the PHASE version 2.0 program (Stephens et al. 2001; Stephens and Donnelly 2003), so that two sequences were entered for each individual. Fisher's exact test was used to calculate the significance of pair-wise LD, and Tajima's D test was used to estimate neutrality of the SNP polymorphisms. Fifteen of the 92 SNPs in the LpcAI gene were selected for subsequent genotyping in 450 of the 864 genotypes (50 from each population). Three factors determined which SNPs were selected: the availability of a sufficient length of monomorphic sequence to allow primer design, inclusion of amino acid changing polymorphisms wherever possible, and representation of all the inferred haplotypes. The genotyping analysis was carried out by K-Biosciences (Hoddesdon, UK).
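The pair-wise LD test mentioned above can be illustrated with a small, self-contained sketch. Given counts of the four phased haplotypes at a pair of biallelic SNPs (for example, tallied from PHASE output), it computes r² and a two-sided Fisher's exact test on the 2 × 2 table. The function names and the toy counts are hypothetical, and DnaSP's own implementation may differ in detail.

```python
from math import comb


def r_squared(n11, n12, n21, n22):
    """Squared allelic correlation from haplotype counts AB, Ab, aB, ab."""
    total = n11 + n12 + n21 + n22
    p_a = (n11 + n12) / total          # frequency of allele A at SNP 1
    p_b = (n11 + n21) / total          # frequency of allele B at SNP 2
    d = n11 / total - p_a * p_b        # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


def fisher_exact_two_sided(n11, n12, n21, n22):
    """Two-sided Fisher's exact P-value for a 2 x 2 table with fixed margins."""
    row1, row2 = n11 + n12, n21 + n22
    col1 = n11 + n21
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(n11)
    # Sum all tables that are no more probable than the observed one.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical haplotype counts for two SNPs in 10 individuals (20 haplotypes).
counts = (8, 2, 1, 9)
r2 = r_squared(*counts)
p_ld = fisher_exact_two_sided(*counts)
print(f"r^2 = {r2:.3f}, Fisher P = {p_ld:.4f}")
```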

AFLP analysis was performed as described (Skøt et al. 2005), except that an ABI 3130xl Genetic analyzer (Applied Biosystems) was used to separate the fluorescently labeled fragments, and Genemapper version 3.7 (Applied Biosystems) rather than Genotyper version 3.7 was used to analyze the data.

TABLE 1

List of L. perenne populations used in this work

Accession  Accession status  Latitude  Longitude  MASL  Flowering time
Ba9955     Semi-natural      52°56′N   3°03′W     100   Very late
Ba10113    Semi-natural      61°10′N   6°40′E     75    Intermediate
Ba10158    Semi-natural      51°41′N   9°27′W     70    Very early
Ba10278    Semi-natural      47°20′N   9°25′E     830   Very early
Ba10284    Semi-natural      47°27′N   8°51′E     720   Very early
Ba10732    Semi-natural      53°34′N   1°34′W           Intermediate
Ba10870    Variety                                      Very late
Ba11304    Variety                                      Very early
Ba12945    Semi-natural      52°24′N   6°49′W     90    Late

MASL, meters above sea level.

TABLE 2

Primer sequences used for PCR amplification of segments of the two genes LpHD1 and LpcAI

Gene   Pair  Primer name  Primer sequence (5′–3′)
LpHD1  1     HD1F         CAGAATGAAACAGGTGCTGA
             HD1R         AGGAATAGGCCAGGTTCATT
       2     HD2F         TGTTTGCTAGGTCAAGACTTGC
             HD2R         TGAAGCCACCAAACACTG
       3     HD3F         AGCAAGCAGAAAGTATCTGTAG
             HD3R         TTCCTCGGGTATTTTGATC
       4     HD4F         CTGACGGGGATAAGATATTTTC
             HD4R         GTTTTTTTGCCATTCATTGG
       5     HD5F         ACTATCTAGTGACATGGCATGG
             HD5R         TAAAGAAGCAGTCGGAATGG
       6     HD6F         CAAGCCACAAGGCCTCCTT
             HD6R         GGAGTGGCTATGACGCAGTTCT
LpcAI  1     AI1F         GCTTTTCTGTTAGCCCAATG
             AI1R         AAGGTCATCAATCCTCTGGC
       2     AI2F         GCATCCAGTTTACCCCTCAG
             AI2R         CGTACAATTTCATACTCCCCC
       3     AI3F         TTCCATCCTCTCTTCCCTTC
             AI3R         CATGGCTTGACTTTTGACAAAC
       4     AI4F         GGCAACAACCCAACAATCAC
             AI4R         GCATTCCTCCATCAAAAGCAC

Data analysis: The AFLP molecular marker data were analyzed for basic population genetics parameters including genetic diversity and population differentiation using the method of Lynch and Milligan (1994) as implemented in the AFLP-SURV 1.0 program (Vekemans 2002). Markers from individual primer pairs were analyzed separately to avoid the creation of too large data files. Since AFLP markers are dominant, we had to assume Hardy–Weinberg equilibrium. Analysis of variance (ANOVA) and linear mixed model analysis were performed using Genstat Release 8.11 (http://www.vsni.co.uk). In the latter, SNP genotypes were fitted as fixed terms, and population structure was incorporated by fitting inferred clusters as a random term. The population structure was estimated using the program STRUCTURE version 2.1 (Pritchard et al. 2000). Since the input data for that analysis consisted of dominant AFLP markers, a no-admixture model was employed, in which allele frequencies were considered independent among populations. Only markers with a band frequency ≥0.05 were used. The length of the burn-in period and the number of MCMC replications after the burn-in was 50,000 for each. The given number of populations (K) was varied between 2 and 10. This choice was based on the fact that the genotypes consisted of seven geographically distinct populations within Europe, and the remaining two populations were varieties developed at the Institute of Grassland and Environmental Research (Table 1).
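Because AFLP bands are dominant, allele frequencies have to be inferred from the proportion of band-absent individuals under Hardy–Weinberg equilibrium. The sketch below shows the simple square-root estimator together with a band-frequency filter at 0.05. It is an illustration only: AFLP-SURV implements the estimators of Lynch and Milligan (1994), which correct for bias and so differ from this naive version, and the function names and toy marker scores are hypothetical.

```python
from math import sqrt


def band_frequency(scores):
    """Fraction of individuals showing the band (1 = present, 0 = absent)."""
    return sum(scores) / len(scores)


def null_allele_frequency(scores):
    """Naive HWE estimate: band-absent plants are null homozygotes, so q = sqrt(absent fraction)."""
    return sqrt(1.0 - band_frequency(scores))


def expected_heterozygosity(scores):
    """Expected heterozygosity 2pq at a dominant marker under HWE."""
    q = null_allele_frequency(scores)
    return 2.0 * q * (1.0 - q)


def filter_markers(marker_scores, min_band_frequency=0.05):
    """Keep markers (name -> list of 0/1 scores) with band frequency at or above the threshold."""
    return {
        name: scores
        for name, scores in marker_scores.items()
        if band_frequency(scores) >= min_band_frequency
    }


# Hypothetical toy data: three markers scored in 25 individuals.
markers = {
    "m1": [1] * 16 + [0] * 9,   # band frequency 0.64
    "m2": [1] * 1 + [0] * 24,   # band frequency 0.04, removed by the filter
    "m3": [1] * 21 + [0] * 4,   # band frequency 0.84
}
kept = filter_markers(markers)
for name, scores in kept.items():
    q = null_allele_frequency(scores)
    print(name, f"q = {q:.2f}", f"He = {expected_heterozygosity(scores):.2f}")
```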

Haplotypes in the two genes were inferred using a Bayesian approach implemented in the program PHASE version 2.0 (Stephens et al. 2001; Stephens and Donnelly 2003). The default settings were used since the haplotype frequencies and the goodness-of-fit measure were consistent between runs. The inferred diplotypes obtained from running this program were used as input in the program TREESCAN version 0.9 (http://darwin.uvigo.es) (Templeton et al. 2005). This software was used to perform a tree-scanning analysis of the phenotypic data against haplotype trees constructed from the haplotypes inferred in PHASE 2.0. The list of haplotypes in the best reconstruction from the PHASE version 2.0 output file was used for the construction of the haplotype trees, employing the phylogenetic program PHYLIP version 3.6 (Felsenstein 1993) using maximum parsimony. The default setting was used, in which the first haplotype was used as the outgroup root. In the execution of the TREESCAN program the probability threshold was set at 0.05 for the corrected permutational P-value after enforcement of monotonicity. The number of permutations was 1000, and the minimum class size was set to 5.
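The permutation step can be illustrated for a single branch of a haplotype tree, which partitions the individuals into classes. The sketch below compares the observed one-way ANOVA F statistic with F statistics obtained after shuffling the phenotypes 1000 times, and skips partitions containing a class smaller than five, echoing the settings above. It is a simplified stand-in and not TREESCAN itself: the correction for testing multiple branches and the enforcement of monotonicity are omitted, and all names and data are hypothetical.

```python
import random


def f_statistic(values, labels):
    """One-way ANOVA F statistic for phenotype values grouped by class label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(values, labels, n_permutations=1000, min_class_size=5, seed=0):
    """Permutational P-value for one partition; None if a class is too small to test."""
    counts = {label: labels.count(label) for label in set(labels)}
    if len(counts) < 2 or min(counts.values()) < min_class_size:
        return None
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    observed = f_statistic(values, labels)
    shuffled = list(values)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)          # break any phenotype-class association
        if f_statistic(shuffled, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical heading dates for two diplotype classes defined by one tree branch.
heading_dates = [70, 72, 69, 71, 73, 68, 101, 104, 99, 107, 102, 100]
classes = ["A"] * 6 + ["B"] * 6
p_value = permutation_p_value(heading_dates, classes)
print("permutational P-value:", p_value)
```

Adding one to both the hit count and the number of permutations keeps the estimated P-value from ever being exactly zero.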

RESULTS

Phenotype analysis: The phenotypic data for all four traits are summarized in Table 4. The two-way ANOVA showed that there were not only significant differences between populations and years, but also population × year interaction for all four response variables (Table 5). The difference in phenotype between years, particularly for the three quality traits, can be attributed partly to the contrasting plant growth conditions (pots vs. field). In addition, the plant material harvested from the pots probably contained a larger proportion of leaf bases relative to leaf blades than the material from the field. The leaf bases contain significantly more WSC than the blades (Gallagher et al. 2004, 2007). The HD phenotype values agreed in general with previous classifications of the accessions as shown in Table 1, but also here there was significant population by year interaction (Table 5). Particularly the very early flowering populations Ba10278, Ba10284, and Ba11304 were variable between years.

TABLE 3

Primers and fluorescently labeled probes that were used in the TaqMan assay of the eight SNP polymorphisms assessed in the LpHD1 locus

SNP   Forward and reverse primers (5′–3′)  TaqMan probe
320   ACCACATCTCCTGGTCAGAGTTG  AGGCC[C/T]TCGCCCTACCTTCGATAACGCTTATTTGCA
513   ATGTCATCTCGAAGTCAC  CCCG[G/A]TGCGCCAATTAAGGATCCCACACGA
1475  ATCTCCGTCCGGACCACAT  TTCCACA[T/A]TCGGGCGAACCTGAGCCTGGAGCACTA
2118  GCCTCATATGTTCCTTTGTTAGATCA  TGAACA[C]GTGAAATTGATGGATGCATCTGTGTTAACACTT  CCATGAACA[T]GTGAAAT
2389  ATCTGGAAAGGAAAGAGACATGAGG  AGACAGCTA[C]AGTACAATGCACCATATCCTCCTTGGC  CAGACAGCTA[T]AGTACAAG
4443  GACACTCTACTATTAGTACCCTGCACTGA  ACTGCCAA[C]ATAGCCTGCATATGTGAGTGTGGAGGAA  ACTGCCAA[A]ATAG
4717  CAGGCTGTGTGATGGATGTTG  TGCCAAGC[G/A]TGGTGTATCACAAGACGCGCAGAGGTAT
5443  CAACAACAGCGTGAGTTCATCTATT  CATGCTATAATTT[G]GTTAATGGTCTTTCAAATGAAGTAGCATTAACTC  TCATGCTATAATTT[T]GTTAAT

TABLE 4

Phenotypic data for the nine L. perenne populations used in this work

            2004                        2005
Population  HD     WSC   N     DMD     HD     WSC   N     DMD
Ba9955      110.6  21.1  2.5   73.1    104.3  10.5  3.5   64.9
Ba10113     81.3   15.7  2.7   74.9    69.9   8.6   3.7   59.9
Ba10158     68.9   20.7  2.3   73.4    63.8   10.2  3.5   62.6
Ba10278     81.2   19.9  2.4   72.3    50.3   10.1  3.7   66.2
Ba10284     70.2   22.0  2.7   76.2    47.8   10.9  3.4   63.8
Ba10732     91.9   17.7  2.7   75.3    80.9   10.1  3.2   62.6
Ba10870     110.2  21.0  2.5   73.9    104.0  11.5  3.4   65.2
Ba11304     72.1   25.6  2.6   74.6    41.0   10.0  3.6   64.6
Ba12945     107.4  22.8  2.6   76.8    101.2  10.6  3.4   64.7
LSD (0.05)  1.05   2.15  0.23  1.24    1.03   1.31  0.16  1.80

LSD, least significant difference (P < 0.05); HD, days after March 1st; WSC, N, and DMD, % of dry matter. For the HD analysis N = 96 genotypes per population; for WSC, N, and DMD analyses N = 50.

TABLE 5

Two-way ANOVA of phenotypic data collected from pot and field experiments

                            Mean square   Variance component estimates (%)
Source of variation  d.f.   HD            HD
Population           8      83,604.6***   437.39 (73)
Year                 1      90,824.6***   105.59 (18)
Population × year    8      5,524.6***    56.68 (9)
Residual             1700   115.5
Total                1717

                            Mean square                             Variance component estimates (%)
Source of variation  d.f.   WSC           N          DMD            WSC        N          DMD
Population           8      266.8***      0.96***    87.3***        2.5 (4)    0.01 (2)   0.73 (1)
Year                 1      24,186.9***   200.23***  25,291.9***    54.3 (92)  0.45 (94)  56.74 (93)
Population × year    8      155.7***      1.15***    185.7***       2.7 (4)    0.02 (4)   3.44 (6)
Residual             873    20.52         0.25       15.4
Total                890

Nine populations were used, and data were collected for 2 years. For the HD data, 96 genotypes per population were analyzed (864 in total), for the quality data 50 genotypes per population (450 in total) were analyzed. ***P < 0.001.

TABLE 6

Summary of SNP and indel polymorphisms in the two loci LpcAI (cytosolic alkaline invertase) and LpHD1 (HD1)

Polymorphism type    LpcAI   LpHD1
Noncoding (indels)   62 (9)  8 (0)
Coding               21      4
Synonymous           13      2
Replacement          8       2
Total                92      12

Figure 1. Structure of the LpcAI and LpHD1 loci investigated in this work. The black rectangular boxes indicate exons, and arrows represent the SNPs investigated here. (A) Alkaline invertase gene. The upside down triangle indicates a 67-bp indel in the 5′ UTR region. M825L and D1053N indicate amino acid changing polymorphisms. (B) The HD1 gene. The arrowed rectangles indicate the direction of transcription of the peroxidase-like gene and the HD1 gene. N1475I and V4717M indicate amino acid changing polymorphisms.

LD in candidate genes: The degree of polymorphism in the LpcAI and LpHD1 genes differed greatly, as 92 polymorphisms were found in the former and only 12 in the latter (Table 6). This difference is further underlined by the fact that 2796 bp were sequenced for SNP discovery in the LpcAI locus and 5604 bp in LpHD1. Maps of the two loci are shown in Figure 1 including the SNP polymorphisms analyzed for association with phenotypes. The strategy for selecting the 15 SNPs in the LpcAI gene (Figure 1A) was described in MATERIALS AND METHODS. Figure 1B shows that the LpHD1 locus also included a putative peroxidase precursor-like gene. We had no a priori reason to believe it is involved in the control of flowering time. We included SNPs from this gene, as it might inform us on the extent of LD in the region. The overall and within-population allele frequencies at the loci used in the association analysis are shown in Table 7. All SNPs were polymorphic overall in both loci, but some were monomorphic within some populations, particularly in the LpHD1 locus. Nucleotide diversity (π) in LpcAI was determined for each of the three segments that were resequenced. In the first segment (61–1263 bp) π = 0.01138; in the second (1927–2730 bp) π = 0.00822, and in the third (4635–5465 bp) π = 0.00605. The Tajima's D values were 0.1974, 0.3593, and 1.0318, respectively. All three were nonsignificant (P > 0.10), indicating that there was no evidence to suggest a significant deviation from neutrality. However, in one localized window of the sequence (939–1038 bp) the Tajima's D value was 2.25, which was significant (P < 0.05), suggesting an excess of intermediate allele frequencies. In the LpHD1 gene a continuous segment of 5603 bp was resequenced. The nucleotide diversity was π = 0.00500, and Tajima's D = 0.8795 (P > 0.10), also indicating a trend toward excess of intermediate allele frequencies, but the effect was not statistically significant.
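For readers who want to see how such summary statistics are obtained, here is a minimal sketch of nucleotide diversity and Tajima's D computed from aligned haplotype sequences with the standard formulas for the test. It assumes gap-free sequences of equal length; the function names and the four toy sequences are hypothetical. DnaSP additionally handles gaps, missing data, sliding windows, and significance testing, none of which is attempted here.

```python
from itertools import combinations
from math import sqrt


def pairwise_differences(seqs):
    """Mean number of differing sites over all pairs of sequences (k)."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns carrying more than one allele (S)."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def nucleotide_diversity(seqs):
    """Per-site nucleotide diversity: mean pairwise differences divided by length."""
    return pairwise_differences(seqs) / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's D; returns None when it is undefined (n < 4 or no segregating sites)."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation.
    return (pairwise_differences(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignment of four haplotypes.
haplotypes = ["AAAA", "AAAT", "AATT", "ATTT"]
print("S =", segregating_sites(haplotypes))
print("pi =", round(nucleotide_diversity(haplotypes), 4))
print("D =", round(tajimas_d(haplotypes), 3))
```

A positive D, as in several of the values reported above, arises when the mean pairwise difference exceeds the value expected from the number of segregating sites, which is the pattern produced by an excess of intermediate-frequency alleles.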

The pattern of LD in the two genes is shown in Figure 2. In the LpcAI gene LD decayed to below 0.2 within 1–2 kb, although there was still significant LD between some loci at larger distances (Figure 2A). Of the 3655 pair-wise comparisons 996 were significant (P < 0.05), and of those, 177 were still significant after Bonferroni correction for multiple testing. The observed P-values were plotted against the expected P-values, expressed as −log(i/(L + 1)), where i is the rank of the ith smallest P-value, and L is the number of pair-wise comparisons. The values deviated significantly from the expected 1:1 ratio (L. Skøt, unpublished data), suggesting that population structure or other systematic forces were influencing the result (Balding 2006). Nevertheless, Figure 2A shows that there are many significant pair-wise LD values at distances >4000 bp. The small number of SNPs in the LpHD1 locus made it difficult to draw firm conclusions about decay of LD with distance. Only 9 of the 12 polymorphisms were included in this analysis, as the remaining 3 were singletons. Of the 36 pair-wise comparisons, 8 were significant before Bonferroni correction, and 2 were significant after. Within-population LD patterns in the LpHD1 locus are summarized in Figure 3. They show that the proportion of locus pairs in LD tended to be largest in Ba9955, Ba10732, Ba10870, and Ba12945, all intermediate to very late flowering populations (see Tables 1 and 4).
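The expected P-values and the Bonferroni threshold used in the comparison above can be sketched as follows. This is a minimal illustration that assumes base-10 logarithms (the base is not stated in the text); the function names are hypothetical.

```python
from math import log10


def expected_neg_log_p(n_tests):
    """Expected -log10(P) under the null for ranks i = 1..L: -log10(i / (L + 1))."""
    return [-log10(i / (n_tests + 1)) for i in range(1, n_tests + 1)]


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


L_PAIRS = 3655  # pair-wise comparisons reported for the LpcAI gene
expected = expected_neg_log_p(L_PAIRS)
threshold = bonferroni_threshold(0.05, L_PAIRS)
print(expected[0], threshold)
```

Plotting the sorted observed −log P-values against `expected` gives the diagnostic plot described in the text; points lying well above the 1:1 line indicate more small P-values than the null predicts.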

Population structure: The population structure was investigated using AFLP markers. A total of 506 markers with a band frequency ≥0.05 were produced from amplification with six selective primer pair combinations. They were analyzed with the AFLP-SURV version 1.0 software for basic population genetics parameters including gene diversity (expected heterozygosity) and F statistics (population differentiation). The results are summarized in Table 8 and show that within-population heterozygosity accounted for 85–91% of the total heterozygosity, which is consistent with many previous assessments in L. perenne. This confirms previous findings that within-population genetic diversity is generally much larger than between-population diversity in this species (Roldan-Ruiz et al. 2000; Cresswell et al. 2001; Skøt et al. 2005). The FST values indicate that ~10–15% of the total genetic variation is due to population structure. The presence of population substructure is not surprising, given the diverse geographic origins and the

TABLE 7

Overall and within-population allele frequencies of the SNP polymorphisms investigated in this work

SNP         Overall  Ba9955   Ba10113  Ba10158  Ba10278  Ba10284  Ba10732  Ba10870  Ba11304  Ba12945

LpHD1
320(C/T)    0.88     0.95     0.82     0.88     0.86     0.79     0.88     0.92     0.93     0.88
513(C/T)    0.95     0.86     0.97     0.95     0.96     0.79     1.00     1.00     1.00     1.00
1475(T/A)   0.79     0.70     0.96     0.80     0.81     0.60     0.75     0.67     0.86     0.92
2118(T/C)   0.62     0.57     0.23     0.92     0.78     0.92     0.36     0.68     0.70     0.37
2389(C/T)   0.75     0.78     0.50     0.49     0.65     0.98     1.00     0.79     0.62     0.90
4443(C/A)   0.81     0.58     0.90     0.99     0.97     1.00     0.72     0.77     0.98     0.43
4717(G/A)   0.97     1.00     0.77     1.00     1.00     1.00     1.00     1.00     1.00     1.00
5443(G/T)   0.79     0.70     0.95     0.80     0.82     0.60     0.77     0.66     0.86     0.92

LpcAI
202(G/A)    0.61     0.30     0.64     0.68     0.75     0.87     0.53     0.39     0.85     0.37
382(C/T)    0.50     0.15     0.58     0.58     0.66     0.56     0.53     0.34     0.78     0.34
492(I/D)    0.48     0.14     0.58     0.55     0.57     0.55     0.53     0.33     0.76     0.32
805(T/C)    0.87     0.97     0.80     0.89     0.65     0.78     1.00     0.98     0.79     0.97
825(A/T)    0.60     0.29     0.62     0.67     0.74     0.86     0.61     0.45     0.84     0.36
953(T/C)    0.51     0.14     0.57     0.64     0.71     0.56     0.53     0.43     0.77     0.28
1053(A/G)   0.57     0.15     0.62     0.66     0.74     0.81     0.60     0.42     0.81     0.34
1954(C/G)   0.89     0.86     0.94     0.93     0.97     0.69     0.90     0.96     0.94     0.92
1970(T/G)   0.53     0.16     0.61     0.63     0.59     0.73     0.59     0.33     0.76     0.28
2214(C/T)   0.91     0.87     0.95     0.93     0.97     0.68     0.90     0.94     0.94     0.97
2283(G/A)   0.52     0.15     0.62     0.61     0.62     0.73     0.57     0.34     0.78     0.29
2647(T/C)   0.62     0.29     0.64     0.71     0.76     0.89     0.63     0.49     0.83     0.35
4879(G/A)   0.92     0.79     0.94     0.96     0.98     1.00     0.95     0.84     0.95     0.86
5259(G/C)   0.52     0.86     0.42     0.45     0.57     0.44     0.47     0.68     0.25     0.71
5395(A/G)   0.52     0.86     0.41     0.45     0.43     0.44     0.47     0.67     0.24     0.71

The frequencies refer to the first allelic variant at each locus. I/D, insertion/deletion.


deliberate selection of accessions with the widest possible range of variation in flowering time (see Table 1).

The AFLP marker data were also used in a more detailed analysis of population structure with the STRUCTURE version 2.1 software program. Of the 506 markers, 73 were also polymorphic in an F2 mapping family described elsewhere (Turner et al. 2006). While AFLP marker distribution can be clustered (Bert et al. 1999), we found that the 73 mapped markers were distributed fairly randomly on each of the seven linkage groups of L. perenne (between 6 and 17 per linkage group). Extrapolated to all 506 markers, this corresponds to between 40 and 120 markers per linkage group. Data representing AFLP markers for each primer pair were analyzed individually, but they all gave similar results. When we included a prior assumption of nine populations, and K was varied from 2 to 10, the number of inferred clusters with the highest probability was eight or nine, depending on the primer pair, and they coincided well with the nine given. Moreover, for each genotype, there was little evidence of ancestry from more than one inferred cluster. An almost identical result was obtained without assuming a priori the presence of nine populations. When each of the nine populations was analyzed separately, with K varied from 2 to 6, the proportion of ancestry from the different clusters was approximately evenly distributed between the clusters, suggesting little or no population structure within the nine populations (see documentation for STRUCTURE software version 2). The within-population analysis was based on fewer than 506 polymorphic markers owing to absence of markers in some populations. Nevertheless, the number was still between 325 and 418, depending upon the population, which is sufficient to detect within-population structure, if it was present. Taken together, these results led us to the conclusion that the 864 genotypes were clustered in nine groups, coinciding with the nine accessions used in this work.

Association analysis: An initial association analysis was performed using one-way ANOVA without taking population structure into account. There was no significant association between any of the SNPs in the LpcAI gene and WSC, N, or DMD in 2004, but four loci associated with WSC and DMD in 2005 (SNPs 1970, 2283, 5259, and 5395). The latter trait also associated with two further loci in the 5′ untranslated region of the gene (SNPs 382 and 492) (P < 0.05). There was highly significant association between HD and three SNPs in the LpHD1 locus in both years (SNPs 2118, 2389, and 4443) (P < 0.0001). The three SNPs are located in the intergenic region between the putative peroxidase precursor gene and LpHD1. One of them (4443) is located 265 bp upstream of the translational start site of LpHD1.
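The single-SNP test described above can be sketched as a one-way ANOVA in which the genotype classes at one SNP define the groups and the phenotype is the response. The example below is a minimal, dependency-free illustration: it computes the F-statistic directly and, as a stand-in for the parametric F-test used in the analysis, obtains a P-value by permutation. The function names and the heading-date values are hypothetical.

```python
import random
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F-statistic for a list of groups of phenotype values."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p(phenotypes, genotypes, n_perm=2000, seed=1):
    """Observed F and permutation P-value (phenotypes shuffled across plants)."""
    def grouped(values):
        by_genotype = {}
        for g, v in zip(genotypes, values):
            by_genotype.setdefault(g, []).append(v)
        return list(by_genotype.values())

    observed = f_statistic(grouped(phenotypes))
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(grouped(shuffled)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: genotype at one SNP and heading date (days) for 15 plants.
genotypes = ["CC"] * 5 + ["CA"] * 5 + ["AA"] * 5
heading_days = [20, 22, 21, 19, 23, 27, 29, 28, 30, 26, 35, 36, 34, 37, 33]
f_obs, p_value = permutation_p(heading_days, genotypes)
print(f_obs, p_value)
```

Run on the pooled sample, such a test ignores population membership, which is why the associations it reports need the corrections described in the following paragraphs.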

The presence of population structure makes it likely that some of these associations are spurious. Three strategies were used to correct for this. First, a one-way ANOVA was performed on data from individual populations separately, assuming no within-population substructure. In the LpcAI locus, seven SNPs associated with WSC in Ba11304 in 2004 (Figure 4A). Six of those SNPs associated with WSC in Ba10278. Five SNPs were also significant in Ba10158 in 2005, but only SNP_382 was significant in all three populations. For DMD, 11 SNPs were significant in Ba10732 in 2004. Five of those were also significant in Ba10284, and six were significant in Ba10158 in 2005 (L. Skøt, unpublished data). A more consistent pattern emerged from the association analysis of the LpHD1 gene (Figure 4B). First, there was a large number of associations with HD in the Ba9955 population (six in 2004 and five in 2005, with a high degree of overlap). Second, the SNPs 4443 and 5443 were both significantly associated with HD in six samples. Both SNPs are located in the HD1 gene or immediately upstream, rather than in the putative peroxidase precursor gene. Furthermore, the degree of significance was particularly high for the 4443 SNP

Figure 2.—Pattern of LD in the LpcAI (alkaline invertase) (A) and the LpHD1 (HD1) (B) loci, as detected in the test set of genotypes used for resequencing. The r² value for pairwise LD is plotted against physical distance.


(P = 0.0078 for Ba9955 in 2004, and P = 0.0001 in 2005; for Ba10732 in 2005 P = 8.71 × 10⁻⁸).

The second strategy consisted of a linear mixed model analysis of the data, in which population was incorporated by including the nine inferred groups (i.e., populations) as random effects. Only one SNP (2647) in the LpcAI gene associated with WSC in 2004 (P = 0.05), but not in 2005. In the LpHD1 gene, however, SNP 4443 was significantly associated with HD in both 2004 and 2005 (P = 0.05 and P < 0.001, respectively).
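One generic way to write a model of this kind is given below. It is a sketch only: the symbols, and in particular the coding of the SNP genotype as a single covariate, are assumptions made for illustration and are not taken from the model specification used in the analysis.

```latex
% y_{ij}: phenotype (e.g., HD) of genotype i in population j
% x_{ij}: SNP genotype code (assumed here: number of copies of one allele)
% u_j:    random effect of population j (one of the nine inferred groups)
\begin{equation*}
  y_{ij} = \mu + \beta\, x_{ij} + u_j + \varepsilon_{ij},
  \qquad u_j \sim N(0, \sigma_u^2),
  \qquad \varepsilon_{ij} \sim N(0, \sigma^2)
\end{equation*}
```

Under this formulation the SNP effect is tested through the fixed effect β, while differences in mean phenotype between populations are absorbed by the random terms u_j rather than being attributed to the SNP.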

Recently, the use of haplotype trees has been advocated for the analysis of genotype/phenotype associations, as they have the potential to uncover associations with extended haplotypes, particularly if the level of LD is significant (Buntjer et al. 2005; Templeton et al. 2005). Since there was evidence of significant LD in both loci investigated here over the whole of each gene, we carried out the tree-scanning analysis as described by Templeton et al. (2005) as the third analysis. The heterozygous nature of L. perenne meant that phases

Figure 3.—Within-population LD pattern in the LpHD1 locus. The top right diagonal represents the LD expressed as r². Black squares, r² ≥ 0.5; dark gray squares, 0.2 ≤ r² ≤ 0.5; light gray squares, 0.05 ≤ r² ≤ 0.2; white squares, r² ≤ 0.05. The bottom left diagonal represents Fisher’s exact test probabilities. Black squares, P ≤ 0.0001; gray squares, 0.0001 ≤ P ≤ 0.001; white squares, 0.001 ≤ P. Gray and black squares in the bottom left diagonal are all significant (P < 0.05) after Bonferroni correction for multiple testing.

TABLE 8

Basic population genetic data from the nine populations

Primer pair  Loci  HT      HW               HB               FST

ACA_CAC      89    0.2403  0.2118 (0.0051)  0.0285 (0.0019)  0.1184 (0.0594)
ACA_CTA      113   0.2906  0.2596 (0.0081)  0.0309 (0.0015)  0.1064 (0.0575)
ACT_CAC      65    0.2298  0.2057 (0.0057)  0.0241 (0.0027)  0.1050 (0.0975)
ACT_CTA      104   0.2665  0.2410 (0.0053)  0.0255 (0.0018)  0.0958 (0.0648)
ACG_CTA      56    0.2604  0.2202 (0.0072)  0.0401 (0.0049)  0.1543 (0.1131)
ACG_CTG      79    0.2399  0.2181 (0.0058)  0.0218 (0.0000)  0.0908 (0.0140)

A total of 864 genotypes derived from AFLP marker data. The genetic diversity and population subdivision measures follow the notation of Lynch and Milligan (1994). HW is the within-population genetic diversity, HB is the between-population diversity, and HT = HW + HB. FST = HB/HT is Wright’s measure of population subdivision. Numbers in brackets are standard errors based on 1000 permutations.
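The relationships stated in the table note (HT = HW + HB and FST = HB/HT) can be checked directly against the tabulated values. The short sketch below does this for each primer pair and also computes the within-population share HW/HT; the variable names are hypothetical, and small discrepancies from the published FST values are expected because the inputs are rounded to four decimals.

```python
# Values transcribed from Table 8: primer pair -> (HT, HW, HB, reported FST)
TABLE8 = {
    "ACA_CAC": (0.2403, 0.2118, 0.0285, 0.1184),
    "ACA_CTA": (0.2906, 0.2596, 0.0309, 0.1064),
    "ACT_CAC": (0.2298, 0.2057, 0.0241, 0.1050),
    "ACT_CTA": (0.2665, 0.2410, 0.0255, 0.0958),
    "ACG_CTA": (0.2604, 0.2202, 0.0401, 0.1543),
    "ACG_CTG": (0.2399, 0.2181, 0.0218, 0.0908),
}

summary = {}
for primer, (ht, hw, hb, fst_reported) in TABLE8.items():
    fst = hb / ht            # Wright's FST recomputed from the rounded values
    within_share = hw / ht   # fraction of total diversity found within populations
    summary[primer] = (fst, within_share)
    print(f"{primer}: FST = {fst:.4f} (reported {fst_reported:.4f}), "
          f"within-population share = {within_share:.1%}")
```

The recomputed within-population shares fall in the 85–91% range quoted in the text.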


had to be inferred. This was performed using the program PHASE version 2.0 as described above and resulted in 17 haplotypes over the eight loci of the LpHD1 gene (Table 9). They were used to produce a maximum parsimony tree in the program PHYLIP. Changing the haplotype used as an outgroup root did not alter the outcome of the association analysis significantly. The information from this plus the most likely diplotypes of the 864 genotypes and the phenotype data were all used in the TREESCAN program. The result of the first round of tree scanning for the whole data set is shown in Figure 5. The significant branch points connect the same haplotypes in both of the alternative haplotype trees. For simplicity we will therefore focus on the results of the tree-scanning analysis of the first tree. Figure 5 illustrates the consistency of the results between the 2 years. The only discrepancy was the transition between haplotypes 4 and 7, which was only significant in 2004. The significant branch points all involve three polymorphisms, namely 2118, 2389, and 4443. The single exception is the branch between haplotype 2 and an intermediate haplotype, not present in the sample, joining haplotype 9 in the second haplotype tree (Figure 5). This is the 1475 SNP, which changes an asparagine to isoleucine in the first exon of the putative peroxidase precursor protein (Figure 1B). The 2118 and 4443 SNPs formed one of only two SNP pairs that were in significant LD after Bonferroni correction for multiple testing. These three polymorphisms were also highly significantly associated with HD in the single SNP ANOVA test. However, this tree-scanning analysis was performed on all 864 genotypes without consideration of population structure. We therefore repeated the analysis on each population separately. Table 10 shows that Ba9955, Ba10732, and Ba10870 had significant associations at branch points involving the 2118 and 4443 polymorphisms, but not 2389 in all three populations. In addition, Ba9955 was significant at the branch between haplotypes 10 and 14 (polymorphism 513) in 2004. The reduced number of significant branches could be caused by loss of power due to the smaller number of genotypes in each analysis, and/or by false positives in the analysis of the full data set ignoring population structure.
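The core step of tree scanning, as we understand the method of Templeton et al. (2005), is that each branch of the haplotype tree splits the haplotypes into two groups, and each diploid genotype is then recoded by how many of its two haplotypes fall in one group before a test such as ANOVA is applied. The sketch below illustrates only that recoding step. The function names and the diplotypes are hypothetical; the four branches are ones named in the text and Table 10, but they do not represent the full tree of Figure 5, and the TREESCAN program itself is not reproduced.

```python
from collections import deque


def split_on_branch(edges, cut):
    """Split tree nodes into the two sides obtained by removing the edge `cut`."""
    nodes = {n for edge in edges for n in edge}
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        if {a, b} == set(cut):
            continue  # removing this branch disconnects the tree
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Breadth-first search from one end of the removed branch.
    side_a, queue = {cut[0]}, deque([cut[0]])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in side_a:
                side_a.add(neighbour)
                queue.append(neighbour)
    return side_a, nodes - side_a


def clade_dosage(diplotypes, clade):
    """For each plant, count how many of its two haplotypes belong to `clade`."""
    return [sum(h in clade for h in pair) for pair in diplotypes]


# Subset of branches between haplotype numbers (an incomplete, illustrative tree).
EDGES = [(2, 3), (3, 8), (2, 7), (4, 7)]
side_a, side_b = split_on_branch(EDGES, (2, 3))

# Hypothetical diplotypes (pairs of haplotype numbers) for four plants.
plants = [(2, 8), (8, 8), (4, 7), (3, 2)]
print(clade_dosage(plants, side_b))
```

The resulting 0/1/2 codes play the role of genotype classes at a single biallelic locus, so the same kind of test used for individual SNPs can be applied to every branch in turn, with correction for the number of branches tested.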

In the LpcAI gene the PHASE program identified 37 haplotypes in the best reconstruction of the sample. However, the phylogenetic ambiguity of the data meant that it could be resolved in 75 possible haplotype trees. A subset of 10 randomly chosen trees was used in the TREESCAN analysis, and in every case the same branch between haplotypes 7 and 31 was significant for WSC in 2004 and N in 2005. This involves the SNP_202 polymorphism (Figure 1A). When the analysis was performed on individual populations no significant branches were found.

Figure 4.—One-way ANOVA of association of WSC (A) with SNPs in the LpcAI gene and HD (B) with SNPs in individual populations in 2004 and 2005. Phenotypic values represent the response variate, and genotypes are the independent variables. The numbers 4 and 5 in A represent years. The shading represents different levels of significance: light gray, P < 0.05; intermediate gray, P < 0.01; black, P < 0.001. NA, not analyzed owing to monomorphism at that locus.


DISCUSSION

Polymorphism in LpcAI and LpHD1: The large difference between LpcAI and LpHD1 in the number of polymorphisms (Table 6) and nucleotide diversity, particularly in the first segment of the LpcAI gene, may at first glance suggest that the LpHD1 gene is functionally more important and has been subject to stronger selective pressure than the LpcAI gene. However, the proportion of nonsynonymous SNPs is actually higher in the LpHD1 locus (16.7%) than in LpcAI (8.7%). Admittedly, this comparison is based on very different total numbers of SNPs in the two genes (Table 6). Nevertheless, the vast majority of SNPs in LpcAI are located in the 5′ upstream region. It is also interesting to note that the overall Tajima’s D value in one window (939–1038) of the LpcAI gene was significantly positive, while it was nonsignificant throughout the LpHD1 locus. The positive Tajima’s D values in both genes indicate a trend toward excess of intermediate frequency alleles. This effect can be caused by population bottlenecks, structure, or balancing selection (Biswas and Akey 2006). In view of the strong evidence for significant population structure in the plant material used here, this may be a contributing factor.

The presence of a high degree of polymorphism raises the possibility that the PCR-based resequencing of the LpcAI gene may have amplified different genes of a gene family. It is well established that temperate grasses, including L. perenne, have a number of closely related genes involved in fructan and sucrose metabolism, including invertases (Gallagher et al. 2004; Chalmers et al. 2005). We therefore developed a sequence-characterized RFLP marker on the basis of one of the SNPs, which was located in a SfoI restriction site. This mapped to the same location as the original invertase RFLP in the F2 mapping family described by Turner et al. (2006). In the case of the LpHD1 locus, the sequencing was based on a BAC clone isolated from a L. perenne library (Farrar et al. 2007) and identified on the basis of the S2539 marker primers, which span the unique putative peroxidase precursor gene adjacent to the HD1 gene (Armstead et al. 2005). The synteny with the rice HD1 locus on chromosome 6 was confirmed by further sequencing of the BAC clone beyond the peroxidase precursor gene (accession no. AM489608).

Association analysis: Although there were significant associations identified within populations, none of the analyses identified consistent associations between the polymorphisms in the LpcAI gene and WSC, N, or DMD phenotypes. A possible explanation could of course be that there is no causal link between the allelic variants in LpcAI and these traits. Although the data from different years were analyzed separately to minimize the year effect, the significance of the population × year interaction is most likely a contributory factor.

TABLE 9

Haplotypes of the LpHD1 gene

Haplotype no.  Haplotype  Inferred no. in sample

1              CCTTTCGG   342
2              CCTTCCGG   146
3              CCTTCAGG   1
4              CCTCTCGG   44
5              CCTCTCAG   44
6              CCTCTAGG   1
7              CCTCCCGG   259
8              CCTCCAGG   315
9              CCATCCGG   3
10             CCATCCGT   272
11             CCATCAGT   3
12             CCACCCGT   1
13             CTTTCCGT   1
14             CTATCCGT   87
15             TCTTTCGG   7
16             TCTTCCGG   199
17             TCATCCGT   3

Haplotypes of the LpHD1 gene and the number of times they are estimated to occur in the sample in the best reconstruction as implemented in the PHASE version 2.0 program and described in materials and methods.
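The haplotype counts in Table 9 allow allele frequencies and pairwise LD to be recomputed for the full sample. The sketch below assumes that the eight characters of each haplotype string follow the order of the LpHD1 SNPs in Table 7 (320, 513, 1475, 2118, 2389, 4443, 4717, 5443); the function names are hypothetical, and r² is computed from the inferred haplotypes rather than from the resequencing test set used for Figure 2.

```python
# Assumed order of SNP positions within each haplotype string (as in Table 7).
SITES = (320, 513, 1475, 2118, 2389, 4443, 4717, 5443)

# Haplotype -> inferred number of copies in the sample (Table 9).
HAPLOTYPES = {
    "CCTTTCGG": 342, "CCTTCCGG": 146, "CCTTCAGG": 1, "CCTCTCGG": 44,
    "CCTCTCAG": 44, "CCTCTAGG": 1, "CCTCCCGG": 259, "CCTCCAGG": 315,
    "CCATCCGG": 3, "CCATCCGT": 272, "CCATCAGT": 3, "CCACCCGT": 1,
    "CTTTCCGT": 1, "CTATCCGT": 87, "TCTTTCGG": 7, "TCTTCCGG": 199,
    "TCATCCGT": 3,
}
TOTAL = sum(HAPLOTYPES.values())


def allele_count(site, allele):
    """Number of haplotype copies carrying `allele` at `site`."""
    idx = SITES.index(site)
    return sum(n for hap, n in HAPLOTYPES.items() if hap[idx] == allele)


def r_squared(site_a, allele_a, site_b, allele_b):
    """Pairwise LD (r^2) between two biallelic sites from haplotype counts."""
    ia, ib = SITES.index(site_a), SITES.index(site_b)
    pa = allele_count(site_a, allele_a) / TOTAL
    pb = allele_count(site_b, allele_b) / TOTAL
    pab = sum(n for hap, n in HAPLOTYPES.items()
              if hap[ia] == allele_a and hap[ib] == allele_b) / TOTAL
    d = pab - pa * pb  # coefficient of linkage disequilibrium
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


print(TOTAL, allele_count(4443, "A"), 1 - allele_count(4443, "A") / TOTAL)
print(r_squared(2118, "T", 4443, "A"))
```

The total corresponds to two haplotypes for each of the 864 genotypes, and the frequency of the C allele at position 4443 agrees with the overall value in Table 7.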

Figure 5.—Haplotype tree of the SNP polymorphisms from the LpHD1 gene. The numbers connecting the haplotype nodes represent the position of the SNP polymorphism. The haplotypes are shown in Table 9. The circular and triangular symbols indicate transitions that were statistically significant in the tree-scanning analysis in 2004 and 2005, respectively. The broken lines refer to the alternative branches of the second haplotype tree that could be resolved, because of the phylogenetic ambiguity. The 0’s represent intermediate haplotypes not present in the sample. The tree was obtained from the program Treeview (Page 1996) on the basis of the treefile output from PHYLIP.


In the LpHD1 locus, both the ANOVA and tree-scanning analyses of the total set of 864 genotypes identified SNPs 2118, 2389, and 4443 as highly significant. These results could be spurious due to population structure, but the linear mixed model, within-population ANOVA, and tree-scanning analyses all identified the 4443 SNP as significantly associated with HD. The latter two methods identified both SNP_2118 and SNP_4443 as highly significant in the Ba9955 population, as well as SNP_4443 in Ba10732. The relative loss of power of the tree-scanning method compared to the within-population ANOVA of individual SNPs, due to the multiple testing issue (Templeton et al. 2005), probably explains why the ANOVA identified more significant associations (Table 10, Figure 4B). Nevertheless, the haplotype tree-based analysis has the potential to add a further dimension to the association analysis by identifying potentially interesting haplotype clusters, which single SNP analysis may miss. In this context it is interesting to note that the PHASE and the tree-scanning analyses show that, of the 320 haplotypes containing the A allele in the 4443 polymorphism, 315 were haplotype 8 (CCTCCAGG) (Table 9). In contrast, the 544 genotypes with the C allele were distributed over 13 haplotypes. This may suggest that haplotype 8 as a whole, rather than just the 4443-A allele, is associated with late flowering, illustrating the kind of result that the haplotype tree-based analysis has the potential to highlight. However, there are too few genotypes of the other 4443-A allele haplotypes to verify this. As pointed out by Templeton et al. (2005), the tree-scanning method distinguishes between the same allelic variant in different haplotypes and may thus be able to identify a significant specific haplotype with the allelic variant, while the single SNP analysis of the same allelic variant might be diluted by its presence in other nonsignificant haplotypes. This potential advantage of tree scanning disappears if the allelic variant in question is functional, since the haplotype setting in that case is unimportant. There is as yet no evidence to suggest whether the C-4443-A polymorphism is functional or simply in LD with a functional variant.

The within-population ANOVA and tree-scanning analyses show that individual populations differ in their ability to identify associations (Figure 4B, Table 10). This has potential implications for choosing which populations to use for association mapping, although there seems to be no clear pattern to guide the choice of populations for analysis. Some of the differences between populations can be attributed to small or zero minor allele frequencies (Table 7), but Figures 3 and 4B show that the populations in which most associations were found with HD (Ba9955, Ba10732, and Ba10870) were all intermediate or late flowering and also tended to have most locus pairs in significant LD. The Ba12945 population is an exception: it has no loci significantly associated with HD, although it is late flowering and a high proportion of its locus pairs are in LD, so the significance of this pattern is not clear.

Aranzana et al. (2005) discuss the risk that accounting for population structure discards genuine associations when the polymorphism itself is associated with population stratification. Although all three analysis strategies, which took account of population structure, identified the 4443 polymorphism as significant in this work, Table 7 shows that allele frequency differences exist between populations, particularly in the LpHD1 gene. The 4717 SNP was only polymorphic in the Norwegian population Ba10113, and even in that population, genotypes were either homozygous for the G allele or heterozygous (A/G). This SNP was included because it was the only nonsynonymous polymorphism in the LpHD1 coding sequence, changing a valine to methionine (amino acid number 33 in the translated sequence) within the B-box 1 motif of the zinc finger domain (Yano et al. 2000; Martin et al. 2004; Armstead et al. 2005). The functional importance of this polymorphism is not known, but its rare occurrence would suggest that it is either a very recent mutation or a functionally important one, or both. It would be interesting to obtain genotypes that were homozygous for the methionine allele and assess the impact on HD.

In view of the results of the association analysis, it seemed reasonable to assume that the 4443 polymorphism could be a potentially useful polymorphism for marker-assisted selection for HD. Uniformity of flowering date is essential for variety registration, and fixing the genes with major effect on flowering time directly through SNP allele selection will increase uniformity in this out-crossing species. Flowering time is a relatively time-consuming trait to measure accurately, as it involves

TABLE 10

Tree-scanning analysis within populations

Population  Year  Branch  F-statistic  Pvk    PSim    PMon

Ba9955      2004  10–14   7.545        0.066  0.0120  0.0330
Ba9955      2005  10–14   3.792        0.028  0.0560  0.2570
Ba9955      2004  2–3     5.139        0.082  0.0030  0.0880
Ba9955      2005  2–3     9.616        0.152  0.0000  0.0100
Ba9955      2004  3–8     5.139        0.082  0.0050  0.0880
Ba9955      2005  3–8     9.616        0.152  0.0000  0.0100
Ba10732     2004  2–3     3.943        0.058  0.0270  0.2390
Ba10732     2005  2–3     17.657       0.258  0.0000  0.0000
Ba10732     2004  3–8     3.943        0.058  0.0220  0.2390
Ba10732     2005  3–8     17.657       0.258  0.0000  0.0000
Ba10870     2004  2–7     6.505        0.055  0.0150  0.0350
Ba10870     2005  2–7     6.205        0.052  0.0080  0.0370

Branches with significant association with HD after correction for multiple testing and enforcement of monotonicity (PMon) in at least 1 year are shown. PSim is the permutational probability before correction for multiple testing. Pvk is the proportion of the trait variation explained by the partition. There were no significant associations in the second round of tree scanning.


recording days after a particular date until ear emergence of the whole breeding population at two-day intervals throughout the flowering period. The marker may be particularly useful in the turf-grass breeding program where one of the goals is to obtain elite varieties with an earlier flowering time, since this is likely to enhance seed yield as harvesting conditions will be more favorable earlier in the year. However, it is crucial to verify the usefulness of markers identified in association analyses as illustrated by Andersen et al. (2005) and Camus-Kulandaivelu et al. (2006), who undertook a second association analysis of the Dwarf8 polymorphisms and flowering time in order to validate the results of Thornsberry et al. (2001). We have therefore carried out crosses between genotypes from the Ba10732 population, which were homozygous for the “early” C allele, with either heterozygous or homozygous “late” A allele turf-grass varieties. Sixty genotypes of the turf-grass cultivar AberElf were screened for the 4443 SNP and were all found to be homozygous A/A. As well as being extremely late heading, AberElf also has low seed set. The LpHD1 gene underlies a highly significant QTL, presumably influenced by unrelated but linked gene(s), that increases seed setting two- and fourfold in two unrelated perennial ryegrass mapping families (L. Skøt, unpublished data); and the 4443 SNP is a potentially useful marker for jointly bringing forward flowering date and improving seed setting, a major component of seed yield in out-crossing crops, in a backcross breeding program. The segregating progeny will be evaluated for HD and seed setting.

In conclusion, a potential candidate SNP (4443) was identified in the LpHD1 locus, which consistently associated with HD. Its usefulness as a marker is currently being assessed by crosses with turf-grass varieties. While haplotype tree-based association mapping has been described in the model species A. thaliana (Olsen et al. 2004; Aranzana et al. 2005), this is, to our knowledge, the first use of the tree-scanning analysis in a crop species. Like the single SNP analysis, it identified the 4443 polymorphism, and it extended the association to the 2118 SNP, despite the loss of power due to correction for a larger number of multiple tests compared to single SNP analyses (Templeton et al. 2005). As stated by Buntjer et al. (2005), haplotype tree-based association analysis may have further potential to infer unobserved haplotypes in the sample and predict their phenotype. In combination with single SNP analyses it could assist in distinguishing between functional allelic variants and nonfunctional variants merely linked to a functional variant.

We thank Zewei Luo, Michael Kearsey, and Kuruvilla Abraham at the School of Biosciences, The University of Birmingham, United Kingdom for very useful discussions about this work. We are grateful to Sue Heywood for technical assistance, Kirsten Skøt for sequencing and AFLP analysis, Sue Lister for the NIRS analysis, and Mark Hirst for advice about the linear mixed model analysis. This work was funded by responsive mode grant no. 203/D18078 and a competitive strategic grant from the Biotechnology and Biological Sciences Research Council.

LITERATURE CITED

Andersen, J. R., T. Schrag, A. E. Melchinger, I. Zein and T. Lubberstedt, 2005 Validation of Dwarf8 polymorphisms associated with flowering time in elite European inbred lines of maize (Zea mays L.). Theor. Appl. Genet. 111: 206–217.

Aranzana, M. J., S. Kim, K. Zhao, E. Bakker, M. Horton et al., 2005 Genome-wide association mapping in Arabidopsis identifies previously known flowering time and pathogen resistance genes. PLoS Genet. 1: e60.

Armstead, I. P., L. B. Turner, M. Farrell, L. Skøt, P. Gomez et al., 2004 Synteny between a major heading-date QTL in perennial ryegrass (Lolium perenne L.) and the Hd3 heading-date locus in rice. Theor. Appl. Genet. 108: 822–828.

Armstead, I. P., L. Skøt, L. B. Turner, K. Skøt, I. S. Donnison et al., 2005 Identification of perennial ryegrass (Lolium perenne (L.)) and meadow fescue (Festuca pratensis (Huds.)) candidate orthologous sequences to the rice Hd1(Se1) and barley HvCO1 CONSTANS-like genes through comparative mapping and microsynteny. New Phytol. 167: 239–247.

Balding, D. J., 2006 A tutorial on statistical methods for population association studies. Nat. Rev. Genet. 7: 781–791.

Bao, J. S., H. Corke and M. Sun, 2006a Microsatellites, single nucleotide polymorphisms and a sequence tagged site in starch-synthesizing genes in relation to starch physicochemical properties in nonwaxy rice (Oryza sativa L.). Theor. Appl. Genet. 113: 1185–1196.

Bao, J. S., H. Corke and M. Sun, 2006b Nucleotide diversity in starch synthase IIa and validation of single nucleotide polymorphisms in relation to starch gelatinization temperature and other physicochemical properties in rice (Oryza sativa L.). Theor. Appl. Genet. 113: 1171–1183.

Bert, P. F., G. Charmet, P. Sourdille, M. D. Hayward and F. Balfourier, 1999 A high-density molecular map for ryegrass (Lolium perenne) using AFLP markers. Theor. Appl. Genet. 99: 445–452.

Biswas, S., and J. M. Akey, 2006 Genomic insights into positive selection. Trends Genet. 22: 437–446.

Breseghello, F., and M. E. Sorrells, 2006a Association analysis as a strategy for improvement of quantitative traits in plants. Crop Sci. 46: 1323–1330.

Breseghello, F., and M. E. Sorrells, 2006b Association mapping of kernel size and milling quality in wheat (Triticum aestivum L.) cultivars. Genetics 172: 1165–1177.

Buntjer, J. B., A. P. Sorensen and J. D. Peleman, 2005 Haplotype diversity: the link between statistical and biological association. Trends Plant Sci. 10: 466–471.

Camus-Kulandaivelu, L., J. B. Veyrieras, D. Madur, V. Combes, M. Fourmann et al., 2006 Maize adaptation to temperate climate: relationship between population structure and polymorphism in the Dwarf8 gene. Genetics 172: 2449–2463.

Chalmers, J., A. Lidgett, N. Cummings, Y. Cao, J. Forster et al., 2005 Molecular genetics of fructan metabolism in perennial ryegrass. Plant Biotechnol. J. 3: 459–474.

Cornish, M. A., M. D. Hayward and M. J. Lawrence, 1979 Self incompatibility in ryegrass. 1. Genetic control in diploid Lolium perenne L. Heredity 43: 95–106.

Cresswell, A., N. R. Sackville Hamilton, A. K. Roy and B. M. F. Viegas, 2001 Use of amplified fragment length polymorphism markers to assess genetic diversity of Lolium species from Portugal. Mol. Ecol. 10: 229–241.

Farrar, K., T. Asp, T. Lubberstedt, M. Xu, A. Thomas et al., 2007 Construction of two Lolium perenne BAC libraries and identification of BACs containing candidate genes for disease resistance and forage quality. Mol. Breed. 19: 15–23.

Felsenstein, J., 1993 PHYLIP (Phylogenetic Inference Package) version 3.5c. Department of Genetics, University of Washington, Seattle.

Flint-Garcia, S. A., J. M. Thornsberry and E. S. Buckler, 2003 Structure of linkage disequilibrium in plants. Annu. Rev. Plant Biol. 54: 357–374.


Flint-Garcia, S. A., A. C. Thuillet, J. Yu, G. Pressoir, S. M. Romero

et al., 2005 Maize association population: a high-resolution plat-form for quantitative trait locus dissection. Plant Journal 44:1054–1064.

Gallagher, J., and C. Pollock, 1998 Isolation and characterizationof a cDNA clone from Lolium temulentum L. encoding for a su-crose hydrolytic enzyme which shows alkaline/neutral invertaseactivity. J. Exp. Bot. 49: 789–795.

Gallagher, J. A., A. J. Cairns and C. J. Pollock, 2004 Cloning andcharacterization of a putative fructosyltransferase and two puta-tive invertase genes from the temperate grass Lolium temulentumL. J. Exp. Bot. 55: 557–569.

Gallagher, J. A., A. J. Cairns and L. B. Turner, 2007 Fructan in temperate forage grasses, agronomy, physiology and molecular biology, pp. 15–46 in Recent Advances in Fructooligosaccharides Research, edited by N. Benkeblia. Research Signpost, Kerala, India.

Gaut, B. S., and A. D. Long, 2003 The lowdown on linkage disequilibrium. Plant Cell 15: 1502–1506.

Gupta, P. K., S. Rustgi and P. L. Kulwal, 2005 Linkage disequilibrium and association studies in higher plants: Present status and future prospects. Plant Mol. Biol. 57: 461–485.

Humphreys, M. W., R. S. Yadav, A. J. Cairns, L. B. Turner, J. Humphreys et al., 2006 A changing climate for grassland research. New Phytol. 169: 9–26.

Ivandic, V., C. A. Hackett, E. Nevo, R. Keith, W. T. B. Thomas et al., 2002 Analysis of simple sequence repeats (SSRs) in wild barley from the Fertile Crescent: associations with ecology, geography and flowering time. Plant Mol. Biol. 48: 511–527.

Jones, E. S., N. L. Mahoney, M. D. Hayward, I. P. Armstead, J. G. Jones et al., 2002 An enhanced molecular marker based genetic map of perennial ryegrass (Lolium perenne) reveals comparative relationships with other Poaceae genomes. Genome 45: 282–295.

Kraakman, A., F. Martinez, B. Mussiraliev, F. van Eeuwijk and R. Niks, 2006 Linkage disequilibrium mapping of morphological, resistance, and other agronomically relevant traits in modern spring barley cultivars. Mol. Breed. 17: 41–58.

Lister, S. J., and M. S. Dhanoa, 1998 Comparison of calibration models for the prediction of forage quality traits using near infrared spectroscopy. J. Agric. Sci. 131: 241–242.

Lynch, M., and B. G. Milligan, 1994 Analysis of population genetic structure with RAPD markers. Mol. Ecol. 3: 91–99.

Martin, J., M. Storgaard, C. H. Andersen and K. K. Nielsen, 2004 Photoperiodic regulation of flowering in perennial ryegrass involving a CONSTANS-like homolog. Plant Mol. Biol. 56: 159–169.

Nordborg, M., and S. Tavare, 2002 Linkage disequilibrium: what history has to tell us. Trends Genet. 18: 83–90.

Nordborg, M., J. O. Borevitz, J. Bergelson, C. C. Berry, J. Chory et al., 2002 The extent of linkage disequilibrium in Arabidopsis thaliana. Nat. Genet. 30: 190–193.

Olsen, K. M., S. S. Halldorsdottir, J. R. Stinchcombe, C. Weinig, J. Schmitt et al., 2004 Linkage disequilibrium mapping of Arabidopsis CRY2 flowering time alleles. Genetics 167: 1361–1369.

Page, R. D. M., 1996 An application to display phylogenetic trees on personal computers. Comp. Appl. Biosci. 12: 357–358.

Pritchard, J. K., M. Stephens and P. Donnelly, 2000 Inference of population structure using multilocus genotype data. Genetics 155: 945–959.

Putterill, J., F. Robson, K. Lee, R. Simon and G. Coupland, 1995 The CONSTANS gene of arabidopsis promotes flowering and encodes a protein showing similarities to zinc finger transcription factors. Cell 80: 847–857.

Rafalski, A., and M. Morgante, 2004 Corn and humans: recombination and linkage disequilibrium in two genomes of similar size. Trends Genet. 20: 103–111.

Roldan-Ruiz, I., J. Dendauw, E. V. Bockstaele, A. Depicker and M. D. Loose, 2000 AFLP markers reveal high polymorphic rates in ryegrasses (Lolium spp.). Mol. Breed. 6: 125–134.

Rozas, J., J. C. Sanchez-DelBarrio, X. Messeguer and R. Rozas, 2003 DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19: 2496–2497.

Skøt, L., M. O. Humphreys, I. Armstead, S. Heywood, K. P. Skøt et al., 2005 An association mapping approach to identify flowering time genes in natural populations of Lolium perenne (L.). Mol. Breed. 15: 233–245.

Stephens, M., and P. Donnelly, 2003 A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am. J. Hum. Genet. 73: 1162–1169.

Stephens, M., N. J. Smith and P. Donnelly, 2001 A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68: 978–989.

Szalma, S. J., E. S. Buckler, M. E. Snook and M. D. McMullen, 2005 Association analysis of candidate genes for maysin and chlorogenic acid accumulation in maize silks. Theor. Appl. Genet. 110: 1324–1333.

Templeton, A. R., T. Maxwell, D. Posada, J. H. Stengard, E. Boerwinkle et al., 2005 Tree scanning: A method for using haplotype trees in phenotype/genotype association studies. Genetics 169: 441–453.

Thornsberry, J. M., M. M. Goodman, J. Doebley, S. Kresovich, D. Nielsen et al., 2001 Dwarf8 polymorphisms associate with variation in flowering time. Nat. Genet. 28: 286–289.

Turner, L. B., A. J. Cairns, I. P. Armstead, J. Ashton, K. Skøt et al., 2006 Dissecting the regulation of fructan metabolism in perennial ryegrass (Lolium perenne) with quantitative trait locus mapping. New Phytol. 169: 45–58.

Vekemans, X., 2002 AFLP-SURV version 1.0. Laboratoire de Genetique et Ecologie Vegetale, Universite Libre de Bruxelles, Brussels, Belgium.

Wilkins, P. W., and M. O. Humphreys, 2003 Progress in breeding perennial forage grasses for temperate agriculture. J. Agric. Sci. 140: 129–150.

Wilson, L. M., S. R. Whitt, A. M. Ibanez, T. R. Rocheford, M. M. Goodman et al., 2004 Dissection of maize kernel composition and starch production by candidate gene association. Plant Cell 16: 2719–2733.

Yano, M., Y. Katayose, M. Ashikari, U. Yamanouchi, L. Monna et al., 2000 Hd1, a major photoperiod sensitivity quantitative trait locus in rice, is closely related to the Arabidopsis flowering time gene CONSTANS. Plant Cell 12: 2473–2484.

Yu, J., and E. S. Buckler, 2006 Genetic association mapping and genome organization of maize. Curr. Opin. Biotechnol. 17: 155–160.

Yu, J., G. Pressoir, W. H. Briggs, I. Vroh Bi, M. Yamasaki et al., 2006 A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38: 203–208.

Communicating editor: R. W. Doerge