association mapping of biomass yield and stem composition ... · association mapping of biomass...

12
THE PLANT GENOME MARCH 2011 VOL . 4, NO. 1 ORIGINAL RESEARCH Association Mapping of Biomass Yield and Stem Composition in a Tetraploid Alfalfa Breeding Population Xuehui Li 1 , Yanling Wei 1 , Kenneth J. Moore, Réal Michaud, Donald R. Viands, Julie L. Hansen, Ananta Acharya, and E. Charles Brummer* Abstract Alfalfa (Medicago sativa L.), an important forage crop that is also a potential biofuel crop, has advantages of high yield, high lignocellulose concentration in stems, and has low input costs. In this study, we investigated population structure and linkage disequilibrium (LD) patterns in a tetraploid alfalfa breeding population using genome-wide simple sequence repeat (SSR) markers and identified markers related to yield and cell wall composition by association mapping. No obvious population structure was found in our alfalfa breeding population, which could be due to the relatively narrow genetic base of the founders and/or due to two generations of random mating. We found significant LD (p < 0.001) between 61.5% of SSR marker pairs separated by less than 1 Mbp. The observed large extent of LD could be explained by the effect of bottlenecking and selection or the high mutation rates of SSR markers. Total marker heterozygosity was positively related to biomass yield in each of five environments, but no relationship was noted for stem composition traits. Of a total of 312 nonrare (frequency >10%) alleles across the 71 SSR markers, 15 showed strong association (p < 0.005) with yield in at least one of five environments, and most of the 15 alleles were identified in multiple environments. Only one allele showed strong association with acid detergent fiber (ADF) and one allele with acid detergent lignin (ADL). Alleles associated with traits could be directly applied in a breeding program using marker-assisted selection. However, based on our estimated LD level, we would need about 1000 markers to explore the whole alfalfa genome for association between markers and traits. A LFALFA (Medicago sativa L.) is one of the most impor- tant forage crops in the world due to its high yield and high nutritive value. In 2008, alfalfa was the third widely planted crop aſter maize (Zea mays L.) and soy- bean [ Glycine max (L.) Merr.] in the United States, with about 8.5 million ha for hay production (USDA; available at http://www.nass.usda.gov/Statistics_by_Subject/index. php?sector=CROPS [verified 6 Jan. 2011]), with dry hay production of about 63.7 million t and a total value of US$10.8 billion. Alfalfa is a potential biofuel crop, with stems used as energy feedstock and leaves as protein meal for livestock feed (Jung et al., 1994b; Samac et al., 2006). e advantages of alfalfa used as a biofuel crop compared to other crops include its perennial nature, which decreases soil erosion, its nitrogen fixation ability, which reduces input cost, its high yield, and its high lig- nocellulose concentration in stems (Dien et al., 2006). Yield is critical for all uses of alfalfa. Unfortunately, improvement in yield of alfalfa has stagnated in recent years (Lamb et al., 2006; Riday and Brummer, 2002). Alfalfa yield has increased about 0.5% per year since 1919, much lower than maize (5.3% per year) (USDA; available at http://www.nass.usda.gov/Statistics_by_ Published in The Plant Genome 4. Published 27 Jan. 2011. doi: 10.3835/plantgenome2010.09.0022 © Crop Science Society of America 5585 Guilford Rd., Madison, WI 53711 USA An open-access publication All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher. X. Li, Y. Wei, and E.C. Brummer, Forage Improvement Division, The Samuel Roberts Noble Foundation, 2510 Sam Noble Pkwy., Ardmore, OK 73401; K.J. Moore, Dep. of Agronomy, Iowa State Univ., Ames, IA 50011; R. Michaud, Agriculture and Agri-Food Canada, Ste. Foy, QC, Canada G1V 2J3; D.R. Viands and J. L. Hansen, Dep. of Plant Breeding and Genetics, Cornell Univ., Ithaca, NY 14853. A. Acharya, Institute of Plant Breeding, Genetics, and Genomics, Univ. of Georgia, Athens, GA 30602. 1 Xuehui Li and Yanling Wei contributed equally to this work. Received 16 Sept. 2010. *Corresponding author ([email protected]). Abbreviations: ADF, acid detergent fiber; ADL, acid detergent lignin; BAC, bacterial artificial chromosome; BIC, Bayesian information criterion; LD, linkage disequilibrium; MAS, marker-assisted selection; NDF, neutral detergent fiber; NIRS, near-infrared reflectance spectroscopy; PCR, polymerase chain reaction; QTL, quantitative trait loci; SNP, single nucleotide polymorphism; SSR, simple sequence repeat.

Upload: others

Post on 24-May-2020

22 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Association Mapping of Biomass Yield and Stem Composition ... · Association Mapping of Biomass Yield and Stem Composition in a Tetraploid Alfalfa Breeding Population ... hypothesis

the plant genome march 2011 vol. 4, no. 1

original research

Association Mapping of Biomass Yield and Stem Composition in a Tetraploid Alfalfa Breeding Population

Xuehui Li1, Yanling Wei1, Kenneth J. Moore, Réal Michaud, Donald R. Viands, Julie L. Hansen, Ananta Acharya, and E. Charles Brummer*

AbstractAlfalfa (Medicago sativa L.), an important forage crop that is also a potential biofuel crop, has advantages of high yield, high lignocellulose concentration in stems, and has low input costs. In this study, we investigated population structure and linkage disequilibrium (LD) patterns in a tetraploid alfalfa breeding population using genome-wide simple sequence repeat (SSR) markers and identified markers related to yield and cell wall composition by association mapping. No obvious population structure was found in our alfalfa breeding population, which could be due to the relatively narrow genetic base of the founders and/or due to two generations of random mating. We found significant LD (p < 0.001) between 61.5% of SSR marker pairs separated by less than 1 Mbp. The observed large extent of LD could be explained by the effect of bottlenecking and selection or the high mutation rates of SSR markers. Total marker heterozygosity was positively related to biomass yield in each of five environments, but no relationship was noted for stem composition traits. Of a total of 312 nonrare (frequency >10%) alleles across the 71 SSR markers, 15 showed strong association (p < 0.005) with yield in at least one of five environments, and most of the 15 alleles were identified in multiple environments. Only one allele showed strong association with acid detergent fiber (ADF) and one allele with acid detergent lignin (ADL). Alleles associated with traits could be directly applied in a breeding program using marker-assisted selection. However, based on our estimated LD level, we would need about 1000 markers to explore the whole alfalfa genome for association between markers and traits.

Alfalfa (Medicago sativa L.) is one of the most impor-tant forage crops in the world due to its high yield

and high nutritive value. In 2008, alfalfa was the third widely planted crop after maize (Zea mays L.) and soy-bean [Glycine max (L.) Merr.] in the United States, with about 8.5 million ha for hay production (USDA; available at http://www.nass.usda.gov/Statistics_by_Subject/index.php?sector=CROPS [verified 6 Jan. 2011]), with dry hay production of about 63.7 million t and a total value of US$10.8 billion. Alfalfa is a potential biofuel crop, with stems used as energy feedstock and leaves as protein meal for livestock feed (Jung et al., 1994b; Samac et al., 2006). The advantages of alfalfa used as a biofuel crop compared to other crops include its perennial nature, which decreases soil erosion, its nitrogen fixation ability, which reduces input cost, its high yield, and its high lig-nocellulose concentration in stems (Dien et al., 2006).

Yield is critical for all uses of alfalfa. Unfortunately, improvement in yield of alfalfa has stagnated in recent years (Lamb et al., 2006; Riday and Brummer, 2002). Alfalfa yield has increased about 0.5% per year since 1919, much lower than maize (5.3% per year) (USDA; available at http://www.nass.usda.gov/Statistics_by_

Published in The Plant Genome 4. Published 27 Jan. 2011. doi: 10.3835/plantgenome2010.09.0022 © Crop Science Society of America 5585 Guilford Rd., Madison, WI 53711 USA An open-access publication

All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.

X. Li, Y. Wei, and E.C. Brummer, Forage Improvement Division, The Samuel Roberts Noble Foundation, 2510 Sam Noble Pkwy., Ardmore, OK 73401; K.J. Moore, Dep. of Agronomy, Iowa State Univ., Ames, IA 50011; R. Michaud, Agriculture and Agri-Food Canada, Ste. Foy, QC, Canada G1V 2J3; D.R. Viands and J. L. Hansen, Dep. of Plant Breeding and Genetics, Cornell Univ., Ithaca, NY 14853. A. Acharya, Institute of Plant Breeding, Genetics, and Genomics, Univ. of Georgia, Athens, GA 30602. 1Xuehui Li and Yanling Wei contributed equally to this work. Received 16 Sept. 2010. *Corresponding author ([email protected]).

Abbreviations: ADF, acid detergent fiber; ADL, acid detergent lignin; BAC, bacterial artificial chromosome; BIC, Bayesian information criterion; LD, linkage disequilibrium; MAS, marker-assisted selection; NDF, neutral detergent fiber; NIRS, near-infrared reflectance spectroscopy; PCR, polymerase chain reaction; QTL, quantitative trait loci; SNP, single nucleotide polymorphism; SSR, simple sequence repeat.

Page 2: Association Mapping of Biomass Yield and Stem Composition ... · Association Mapping of Biomass Yield and Stem Composition in a Tetraploid Alfalfa Breeding Population ... hypothesis

the plant genome march 2011 vol. 4, no. 1

Subject/index.php?sector=CROPS [verified 6 Jan. 2011]). Marker-assisted selection (MAS) could enhance the efficiency of selection for yield, enhancing genetic gain. The prerequisite for MAS is identifying markers strongly linked to quantitative trait loci (QTL) controlling the trait. Although genetic maps have been constructed and QTL mapped in alfalfa (Brummer, 2004; Robins et al., 2007b), the mapping populations were all derived from crosses of two individual genotypes. A single biparental cross does not represent the diversity of QTL within an entire breeding population (Flint-Garcia et al., 2003), and these populations typically have limited recombination, limiting the resolution of QTL mapping.

Association mapping, in contrast, looks for relation-ships between markers and traits in unstructured breeding populations or wild germplasm collections (Breseghello and Sorrells, 2006; Ducrocq et al., 2008; Kraakman et al., 2004; Stich et al., 2008; Yu et al., 2006). Association map-ping relies on linkage disequilibrium (LD) resulting from physical linkage between markers and trait loci, similar to family-based mapping approaches. However, because breeding or natural populations encompass greater genetic diversity and represent historic recombination over many generations, more QTL can be identified and then local-ized to more precise intervals compared to typical bipa-rental populations. However, unlike a population such as an F2 or recombinant inbred line population, the observed LD in a natural or breeding population could be caused by physical linkage as well as nonlinkage factors, such as genetic drift, selection, and population admixture (Flint-Garcia et al., 2003; Jannink and Walsh, 2002; Mackay and Powell, 2007). Thus, LD could be observed between loci that are not linked, resulting in spurious marker–trait associations. The number of false positive associations can be decreased by assessing and accounting for the popula-tion substructure and kinship relatedness among indi-viduals in the population when evaluating marker–trait relationships (Stich et al., 2008; Yu et al., 2006). Population structure and kinship relatedness in the founder popula-tions could be diminished in advanced generation popula-tions as a result of random mating (Mackay and Powell, 2007), a situation which could easily apply to outcrossing plant species such as alfalfa.

Cell wall composition can be bred concurrently with yield to optimize biomass for forage or for biofuel (Hamelinck et al., 2005; McKendry, 2002). In alfalfa, selection for altered cell wall constituents has been con-ducted successfully for cellulose and/or lignin (Coors et al., 1986; Kephart et al., 1990), digestibility (Jung et al., 1994a), and digestion rate (Goplen et al., 1993). Previ-ous breeding programs aimed to improve digestibility by decreasing lignin concentration and/or altered lignin composition. Depending on the biofuel platform used, other modifications may be necessary for alfalfa. Because alfalfa germplasm varies considerably for cell wall con-centration (Jung et al., 1997), altering composition in various ways for different end uses is possible.

In this experiment, our objective was to test the hypothesis that LD in a tetraploid alfalfa breeding population would extend sufficient distances to enable marker–trait associations to be detected with relatively few markers. Given sufficient LD, we hypothesized we could detect QTL for biomass yield and cell wall composition as measured using neutral detergent fiber (NDF), acid deter-gent fiber (ADF), and acid detergent lignin (ADL).

Materials and MethodsMapping PopulationThe tetraploid population we used in this study was formed by a strain cross between three commercial cultivars, 5454 developed by Pioneer Hibred, Inc. (John-ston, IA), in the midwestern United States, Oneida VR developed by Cornell University (Ithaca, NY) (Viands et al., 1990), and AC Viva developed by Ag and Agri-Food Canada, Ste. Foy, QC. The cultivars had fall dormancy ratings (Teuber et al., 1998) of 4 for 5454 and 3 for Oneida VR and AC Viva. The pedigrees of the cultivars are unknown, but they could be assumed to be from the same general semidormant germplasm group. The initial strain cross included about 100 individuals from each cultivar. The resulting population was then ran-dom mated in the greenhouse for two generations using approximately 100 individuals in each generation, after which 190 individuals were randomly selected, geno-typed, and clonally propagated for field evaluation.

Phenotypic EvaluationExperimental DesignField experiments were planted at the Agronomy and Agricultural Engineering Research Farm west of Ames, IA, in April 2004; at the Snyder 5 east field adjacent to the Game Farm Road Weather Station in Ithaca, NY, in June 2004; and at the Harlaka experimental site in Lévis, QC, in May 2005. The experimental design at each loca-tion was a randomized completed block design with three replications, with each plot in each replication containing three clones of each genotype. Clones were spaced 15 cm apart within plots; plots were 75 cm apart within and between rows.

YieldNo yield data were collected in the year of establishment. Yield data were collected in 2005 at Iowa and New York and at all locations in 2006 by harvesting and weighing the fresh mass from the entire plot. Three harvests were taken each year at New York in 2005 (NY05) and 2006 (NY06), four harvests at Iowa in 2005 (IA05), and two harvests in 2006 at Iowa (IA06) and Québec (Que06). For each har-vest in a given year and location, samples were randomly taken during the harvest period, weighted wet, dried for 5 d at 60°C in a forced-air dryer, and then weighed dry. The average dry matter content of the samples was used to compute dry matter biomass yield of each plot. The yearly yield was estimated as the sum of all harvests at each

Page 3: Association Mapping of Biomass Yield and Stem Composition ... · Association Mapping of Biomass Yield and Stem Composition in a Tetraploid Alfalfa Breeding Population ... hypothesis

li et al.: association mapping in tetraploid alfalfa

location in a given year. We considered each year by loca-tion combination as a separate environment.

Cell Wall CompositionSamples from the first (June) harvest in 2006 at Iowa were used for analyses of stem cell wall composition. We sampled a single time point from one location for cell wall composi-tion analysis because genotype × environment and geno-type × harvest interactions have been rarely observed for alfalfa composition traits (Sheaffer et al., 1998, 2000). After whole plant samples were dried at 60°C for 4 d, stems were separated from leaves and then ground to a fine powder. Near-infrared reflectance spectroscopy (NIRS) (Windham et al., 1989) was used to analyze composition. Duplicate reflectance (log 1/R) measurements between 400 and 2500 nm at 2-nm intervals were taken using a scanning mono-chromator (Model 6500; NIRSystems, Silverspring, MD). A subset of 50 samples was selected for calibration on the basis of spectral characteristics. An ANKOM200 Fiber Analyzer (ANKOM Technology Corp., Macedon, NY) was used to determine NDF and ADF for the calibration set (Vogel et al., 1999). Ash and ADL were determined for the calibra-tion set as described in Van Soest et al. (1991) by placing the ANKOM bags containing the residual of the ADF proce-dure in a 3 L DaisyII incubator jar (ANKOM Technology Corp.) and covering them with 72% H2SO4. The samples were rotated in the incubator for 3 h, subsequently washed in hot water for 15 min followed by acetone for 10 min, dried in a 100°C oven overnight, and weighed after cooling to room temperature. Finally the entire sample bag with its remaining material was ashed at 525°C for 4 h, and the ash was weighed. Ash weights were calculated after account-ing for the sample bag material. Acid detergent lignin was adjusted for ash. The standard error of calibration, R2, and standard error of cross validation for the trait values pre-dicted by NIRS are as follows: NDF = 0.34, 0.98, and 0.86; ADF = 0.22, 0.99, and 0.73; ADL = 0.15, 0.97, and 0.24.

GenotypingWe used simple sequence repeat (SSR) markers for geno-typing using previously described markers (Julier et al., 2003; Robins et al., 2007b; Sledge et al., 2005) and addi-tional markers designed from ESTs by ourselves and the Samuel Roberts Noble Foundation (M.J. Monteros, personal communication, 2009). Primers for all markers (Supplemental Table S1) were synthesized by Integrated DNA Technologies (Coralville, IA; http://www.idtdna.com/Home/Home.aspx [verified 6 Jan. 2011]) with the addition of 18 nucleotides of the M13 universal primer sequence (Schuelke, 2000) onto the 5′ end of the forward primer. The M13 universal primer sequence was syn-thesized and labeled with a blue (6-FAM), green (HEX), or yellow (NED) fluorescent tag by Applied Biosystems (Foster City, CA; http://www.appliedbiosystems.com [verified 30 Dec. 2010]). Polymerase chain reaction (PCR) recipes and ingredients were exactly the same as Sledge et al. (2005) and PCR programming was either the same as Julier et al. (2003) (4 min at 94°C, followed by 35 cycles

with 30 sec at 94°C, 1 min at 55°C, and 1 min at 72°C, plus a final elongation step of 7 min at 72°C [program A in Supplemental Table S1]) or as follows: 95°C for 2 min, followed by 30 cycles with 30 sec at 95°C, 45 sec at 60°C, and 45 sec at 72°C, plus 10 cycles with 30 sec at 95°C, 45 sec at 53°C, 45 sec at 72°C, and finalized with an elonga-tion step of 7 min at 72°C (program B in Supplemental Table S1). Each SSR marker was amplified by PCR inde-pendently, and we pooled PCR products of 4 to 6 reactions for genotyping on an automated ABI3730 sequencer at the University of Georgia (Athens, GA) DNA Sequencing Facility. The data files from the analyzer were analyzed by Genemarker software (SoftGenetics, State College, PA; www.softgenetics.com [verified 30 Dec. 2010]). Each allele of each marker was scored as “1” for presence and “0” for absence on each individual in the population.

Data AnalysisPhenotypeTotal yearly biomass yield was measured in five environ-ments: IA05, IA06, NY05, NY06, and Que06. The yearly yield for all environments was fit to a generalized linear model:

yijk = µ + lj + rk(j) + gi + glij + eijk, [1]

in which yijk is the yield for the ith genotype in the kth replication of the jth environment, µ is the grand mean, lj is the effect of the jth environment, rk(j) is the effect of kth replication at the jth environment, gi is the genetic effect of the ith genotype, glij is the interaction effect of ith genotype and jth environment, and eijk is the residual. All factors were considered as random, and the ANOVA was performed using PROC GLM (SAS Institute, 2004). The variance components of residual (σ2), genotype × envi-ronment interaction (σ2

gl), and genotype (σ2g) were esti-

mated and used to compute broad-sense heritability as:

h2 = σ2g/(σ

2/jk + σ2gl/j + σ2

g).

Because significant genotype × environment interaction was observed in model [1], the yearly yield at each environ-ment was fit separately to a simple generalized linear model:

yik = µ + rk + gi + eik, [2]

in which yik is the yield for the ith genotype in the kth replication, µ is the grand mean of that environment, rk is the effect of the kth replication, gi is the genetic effect of the ith genotype, and eik is the residual. The genotype effect gi was considered as fixed. The least square mean of yearly yield for each of 190 genotypes, Mi, was estimated in each environment using PROC GLM (SAS Institute, 2004) and was used for association analysis.

Because the cell wall composition was measured only in one environment, the data were fit to the simple model [2]. The least square mean of each genotype was estimated using PROC GLM (SAS Institute, 2004) and used for association analysis. To estimate heritability, the genotype effect gi was considered as random, and the variance com-ponents of residual (σ2) and genotype (σ2

g) were estimated. The broad sense heritability was calculated as:

Page 4: Association Mapping of Biomass Yield and Stem Composition ... · Association Mapping of Biomass Yield and Stem Composition in a Tetraploid Alfalfa Breeding Population ... hypothesis

the plant genome march 2011 vol. 4, no. 1

h2 = σ2g/(σ

2/k + σ2g).

Marker PolymorphismThe number of alleles produced by each of the 71 SSR mark-ers was counted, and frequency of each allele was computed across the entire population. Alleles with a frequency less than 10% (19 of 190 individuals) were considered to be rare alleles and were excluded from further analysis.

Linkage Disequilibrium AnalysisBecause of complex genetic segregation in autotetraploid alfalfa, the genotypic LD between pairs of SSR markers was estimated using Fisher’s exact test. The test was based on the observed pairwise genotype frequency. We could not determine allele copy number, so the genotypic classes were actually phenotypes. For example, the marker phenotype A1A2 could include the following genotypes: A1A1A1A2, A1A1A2A2, and A1A2A2A2. Genotypic LD was evaluated between pairs of SSR markers. To estimate physical loca-tions of SSR markers, primer sequences were used to search Medicago truncatula Gaertn. pseudomolecules produced from the M. truncatula euchromatic genome sequence, build 3.0 with BLAST at the website (http://medicago.org/genome/cvit_blast.php [verified 30 Dec. 2010]). If both for-ward and reverse primers hit the same bacterial artificial chromosome (BAC) location with a predicted amplicon size similar to our observed fragment size, we used the position of the primer with the smaller number on the alignment with the BAC sequence as the physical location of the cor-responding marker (Supplemental Table S1).

Population Structure and KinshipBased on the LD between pairs of SSR markers, we picked 56 independent markers for population structure and kinship estimation. Population structure was calculated by STRUCTURE 2.2 (Falush et al., 2007; Pritchard et al., 2000). The number of subpopulations (z) was set from 1 to 20, and each subpopulation number (z) was evaluated for 20 repetitions using the admixture model and correlated allele frequencies. The iteration number for the Markov Chain Monte Carlo algorithm was set as 100,000, follow-ing a burn-in period of 100,000 iterations. Twenty more repetitions were performed for the z from 1 to 10 with the same parameters setting above. The log likelihood value and an ad hoc quantity (ΔK) based on the second order change of log likelihood (Evanno et al., 2005) were cal-culated to estimate the number of subpopulations. After determining the number of subpopulations, the first z – 1 columns of the population structure Q matrix was used as the population structure (S) matrix and taken account into the linear mixed models for association analysis between markers and traits. The kinship coefficient matrix (A) was calculated by SPAGeDi (Hardy and Vekemans, 2002). Kin-ship coefficients lower than zero were set to zero.

Association MappingBecause interactions between genotype and environ-ment were observed for yield, we estimated least square

means for yield in each environment using linear mixed models (Table 1) to evaluate associations. Model E1, the simplest model to explain yield variation with marker alleles, does not include either population structure (Q) or kinship (K) among the genotypes. Model E2 includes the kinship matrix, Model E3 includes the population structure matrix, and Model E4 includes both population structure and kinship matrices. For all models, the allele substitution effect and population structure were consid-ered to be fixed effects; the genetic effect was considered random. The variance for the random effects for models E2 and E4 were assumed to be Var(ğ) = 2Aσ2

ğ and Var(ĕ) = Iσ2

ĕ, in which A is a 190 × 190 matrix of estimated kin-ship coefficients that defined the degree of genetic cova-riance between pairs of genotypes and I is a 190 × 190 matrix in which the off-diagonal elements were zero and the diagonal elements were one. The variances for the random effects for models E1 and E3 were assumed to be Var(ğ) = Iσ2

ğ and Var(ĕ) = Iσ2ĕ, respectively. Cell wall

composition (NDF, ADF, and ADL) was measured at one environment, so the model for association is same as that for yield at each environment. Alleles associated with each trait were identified at p < 0.005.

We programmed a macro in SAS to perform associa-tion analysis between alleles from the 71 SSR makers and each trait for all four mixed model equations with PROC MIXED (SAS Institute, 2004). The autotetraploid, nonin-bred alfalfa genotypes evaluated in this experiment could have more than two alleles at one SSR marker in a given individual. The dosage of any specific allele in the popula-tion could range from 0 to 4 for any one individual and could not be clearly determined. Thus, proper statistical analysis is difficult to perform and precisely identifying the alleles associated with target traits is problematic. In this study, we fitted trait data to a mixed model for each marker, which included all alleles for that marker (Paje-rowska-Mukhtar et al., 2009; Stich and Melchinger, 2009). This enabled us to test the effect of each allele of a given marker while removing the effects of other alleles from the same marker, providing a better association analysis compared to considering each allele as a separate locus. We evaluated the Bayesian information criterion (BIC) to compare the goodness of fit of the four different models.

ResultsPhenotypic EvaluationsA significant genotype × environment interaction was observed for total yearly yield (Table 2). As a conse-quence, we considered the five location–year combi-nations as separate environments in the subsequent analyses. The 190 genotypes differed for all traits (p < 0.001; Table 2). The broad-sense heritabilities were high for all traits: 86% for yield, 77% for NDF, 75% for ADF, and 74% for ADL (Table 2). All traits were normally dis-tributed (Fig. 1). The range of least square means among 190 genotypes was great for all traits, spanning more than an order of magnitude for yield.

Page 5: Association Mapping of Biomass Yield and Stem Composition ... · Association Mapping of Biomass Yield and Stem Composition in a Tetraploid Alfalfa Breeding Population ... hypothesis

li et al.: association mapping in tetraploid alfalfa

Heterozygosity and its Correlation with Phenotypes

The average allele number per genotype across the 71 SSR markers (heterozygosity) varied from 1.68 to 2.48, with an average of 2.10. The correlation between

heterozygosity and yield was positive for all five environ-ments, ranging from r @ 0.14 to 0.20. However, no cor-relation was observed between heterozygosity and NDF, ADF, and ADL (Fig. 2).

Simple Sequence Repeat Genotyping and Marker PolymorphismIn total 470 alleles were scored across 71 SSR markers. Allele number per marker varied from 2 to 16, with 6.62

Table 1. Descriptions of the four models used to assess marker-trait associations, with information about whether they controlled for kinship among individuals in the population or for substructure within the overall population.

Model code Model† KinshipPopulation structure

E1 Mi = µ + aXi + ği + ĕi No NoE2 Mi = µ + aXi + ği + ĕi Yes NoE3 Mi = µ + aXi + S iu uu

zv

=

−∑ 1

1+ ği + ĕi No Yes

E4 Mi = µ + aXi + S iu uu

zv

=

−∑ 1

1+ ği + ĕi Yes Yes

†Mi is the estimated trait value of ith genotype, µ is an intercept term, a are the effects of allele substitution; Xi is a matrix with presence (1) or absence (0) of the corresponding alleles in ith genotype; vu is the effect of the uth column of the population structure matrix S; the matrix S includes the first z – 1 columns of population structure Q matrix; ği is the residual genetic effect of the ith genotype that result from not further identified background quantitative trait loci (QTL); and ĕi is the residual. The allele substitution effect and population structure were fixed, the genetic effect was random.

Table 2. The mean squares and heritabilities for biomass yield and cell wall composition in an alfalfa breeding population.

Components Yield NDF† ADF ADL––––––––––––––-Mean squares––––––––––––––

Genotype 7293987*** 11.41*** 5.42*** 0.90***Genotype × environment 884823*** – – –Error 309495 2.74 1.42 0.24

––––––––––––––––––h2‡––––––––––––––––––Heritability 0.86 0.77 0.75 0.74***F -test for mean square was significant at p < 0.001.†NDF, neutral detergent fiber; ADF, acid detergent fiber; ADL, acid detergent lignin.‡h2, heritability.

Figure 1. Histograms of trait values for an alfalfa breeding population evaluated for biomass yield in five environments and for stem cell wall composition. IA, Iowa; NY, New York; Que, Québec; 05, 2005; 06, 2006; ADL, acid detergent lignin; ADF, acid detergent fiber; NDF, neutral detergent fiber.

Page 6: Association Mapping of Biomass Yield and Stem Composition ... · Association Mapping of Biomass Yield and Stem Composition in a Tetraploid Alfalfa Breeding Population ... hypothesis

the plant genome march 2011 vol. 4, no. 1

alleles per marker locus on average. Rare alleles present in fewer than 10% of individuals represented 33.6% of all alleles. Considering only the remaining 312 alleles, indi-vidual SSR markers produced between 2 and 8 alleles per marker, with an average of 4.39.

Linkage DisequilibriumThe frequency of marker pairs for which LD was observed using Fisher’s exact test decreased as the physical distance

increased, as expected (Fig. 3). At a probability level of less than 0.001, 61.5% of marker pairs separated by less than 1 Mbp were in LD, with very few marker pairs showing LD for larger physical distances (Fig. 3).

Population Structure and KinshipBased on 56 independent SSR markers, most (96.3%) pair-wise kinship estimates between genotypes were smaller than 0.05, 3.0% were between 0.05 and 0.1, and only 0.7%

Figure 2. Correlation between simple sequence repeat (SSR) marker heterozygosity and biomass yield in five environments and stem cell wall composition in one environment for an alfalfa breeding population. IA, Iowa; NY, New York; Que, Québec; 05, 2005; 06, 2006; ADL, acid detergent lignin; ADF, acid detergent fiber; NDF, neutral detergent fiber.

Page 7: Association Mapping of Biomass Yield and Stem Composition ... · Association Mapping of Biomass Yield and Stem Composition in a Tetraploid Alfalfa Breeding Population ... hypothesis

li et al.: association mapping in tetraploid alfalfa

were between 0.1 and 0.35. We examined the possibility that the overall population of 190 individuals was struc-tured into subpopulations (z) using STRUCTURE. As z increased, the log likelihood value of each number of subpopulations being the true value increased smoothly (Fig. 4a), which suggested no population structure. To fur-ther investigate population subdivision, we calculated ΔK (Evanno et al., 2005), which indicated that the population could be partitioned into two subpopulations (Fig. 4b).

Association MappingWe performed association analysis between each of the 312 nonrare alleles and each phenotypic trait. Based on the goodness of fit among the four mixed model equa-tions, model E4 was the best for all traits, given that it had the smallest BIC value (Table 3). In contrast, model E1 was generally the worst, suggesting that accounting for both population structure and kinship was desirable, even though both were relatively small effects.

More alleles were associated (p < 0.005) with trait phe-notypes in model E1 than in model E4 for all traits, which indicated that some alleles identified in model E1 could have been associated with traits due to population struc-ture and/or genotypic relatedness rather than due to true genetic variation in the trait (Table 3). Model E2 showed greater improvement in terms of BIC and the number of significant alleles relative to model E1 than did model E3 for most traits, suggesting that kinship was more effective to control than population substructure in this population (Table 3). Model E4 showed slight improvement for some traits compared to model E2 in terms of BIC and the num-ber of significant alleles (Table 3).

Based on model E4, 15 alleles derived from 13 mark-ers showed strong association (p < 0.005) with yield in at least one of five environments (Fig. 5). Twelve markers could be located in the genome based on their physi-cal location on the M. truncatula genome and/or their position on genetic linkage maps of diploid alfalfa (X.

Figure 3. The frequency of significant tests for genotypic linkage disequilibrium (LD) between pairs of simple sequence repeat (SSR) markers separated by different physical distances across a range of p -values for an alfalfa breeding population.

Figure 4. Population structure analysis of an alfalfa breeding population. The top panel shows the average log–likelihood value and the bottom panel shows the ad hoc quantity, ΔK, for differing numbers of subpopulations (z) within the population.

Page 8: Association Mapping of Biomass Yield and Stem Composition ... · Association Mapping of Biomass Yield and Stem Composition in a Tetraploid Alfalfa Breeding Population ... hypothesis

the plant genome march 2011 vol. 4, no. 1

Li and E. C. Brummer, unpublished data, 2010) (Fig. 5). Most alleles showed associations across multiple envi-ronments (Fig. 5). Allele aw688546_301 showed strong association with yield in four environments (2 yr each in IA and NY) and a weak association (p < 0.05) in the fifth environment (Que06). Another allele from the same marker, aw688546_304, showed strong association with yield of NY05 (Fig. 5). Two alleles from marker bg281 (205 and 206) both showed association with yield in some New York and Québec environments (Fig. 5). Allele bg455405_159 showed strong association at New York, moderate association (p < 0.01) at Québec, and weak association at Iowa (Fig. 5). Alleles aw694047_213 and

aw690665_232 showed strong association with yield both years in Iowa and a weak association in Québec (Fig. 5).

Based on model E4, no allele was associated with NDF at p < 0.005 (Fig. 5). Only one allele, al375136_169, showed strong association with ADF (Fig. 5). This allele was also associated with NDF at p < 0.05 (Fig. 5). For both traits, the allele al375136_169 had a positive effect, meaning that the presence of the allele increased NDF and ADF. Only one allele, be318471_266, showed associa-tion with ADL at p < 0.005 (Fig. 5). Interestingly, marker be318471 showed strong association with both yield and ADL. Individuals carrying allele be318471_266 tended to have lower ADL concentrations, while individuals

Table 3. The Bayesian information criterion (BIC) for four models assessing marker-trait associations. In parentheses are the numbers of alleles that showed associations at p < 0.005 for each test.

ModelYield† Cell wall composition‡

IA05 IA06 NY05 NY06 Que06 NDF ADF ADLE1 2753.6 (9) 2764.5 (10) 2720.1 (8) 2749.2 (6) 2873.8 (11) 757.8 (1) 628.1 (2) 324.6 (3)E2 2741.6 (5) 2758.5 (6) 2707.6 (4) 2674.4 (5) 2807.7 (7) 748.5 (0) 619.8 (1) 299.6 (1)E3 2741.8 (8) 2751.7 (8) 2708.3 (7) 2736.5 (7) 2861.0 (13) 756.6 (1) 629.5 (2) 325.8 (3)E4 2717.7 (4) 2743.3 (5) 2688.9 (4) 2666.1 (5) 2794.2 (5) 744.3 (0) 622.6 (1) 299.3 (1)†IA, Iowa; NY, New York; Que, Québec; 05, 2005; 06, 2006.‡NDF, neutral detergent fiber; ADF, acid detergent fiber; ADL, acid detergent lignin.

Figure 5. Simple sequence repeat (SSR) marker alleles associated with biomass yield in five environments or with stem cell wall compo-sition (neutral detergent fiber [NDF], acid detergent fiber [ADF], and acid detergent lignin [ADL]) evaluated in one environment for an alfalfa breeding population. IA, Iowa; NY, New York; Que, Québec; 05, 2005; 06, 2006.

Page 9: Association Mapping of Biomass Yield and Stem Composition ... · Association Mapping of Biomass Yield and Stem Composition in a Tetraploid Alfalfa Breeding Population ... hypothesis

li et al.: association mapping in tetraploid alfalfa

carrying a different allele, be318471_257, tended to have higher yield in Iowa.

In general, individual markers explained 2 to 6% of the variation in yield in this population when evaluated in multiple regression models (results not shown). This information is of limited value given the relatively small population size and the likely overestimate of the effects as well as the unknown linkage distances between marker and trait. The results suggest, however, that biomass yield is likely controlled by many genes of small effect.

DiscussionHeterozygosity and Its Correlations with PhenotypesThe average allele numbers per genotype across the 71 SSR markers is 2.10 in our population, which is similar to the heterozygosity we previously found in three diverse tetraploid cultivated alfalfa genotypes, which had 1.92 to 2.15 alleles per marker per genotype (Li et al., 2009). In this study, we found that yield had small but positive cor-relations with heterozygosity in all five environments, but NDF, ADF, and ADL were not correlated with marker heterozygosity. Previously, a correlation between yield and heterozygosity, estimated by restriction fragment length polymorphism (RFLP) markers, was also found in an experimental tetraploid alfalfa population (Kidwell et al., 1994). Heterosis is often observed for biomass yield in alfalfa (e.g., Riday et al., 2002; Riday and Brummer, 2005) but not for composition (Riday et al., 2002). The correla-tion between yield and heterozygosity supports the idea that complementary favorable alleles in repulsion phase at linked loci contribute to yield and heterosis for yield (Bingham et al., 1994; Li and Brummer, 2009).

Linkage DisequilibriumLinkage disequilibrium in plants has been shown to extend from hundreds of base pairs to hundreds of kilo-base pairs, depending on the species and population type analyzed (Alm et al., 2003; Hyten et al., 2007; Ingvars-son, 2005; Liu and Burke, 2006; Mather et al., 2007; Morrell et al., 2005; Nordborg et al., 2002; Remington et al., 2001; Simko et al., 2006). Biological factors such as recombination rate, genetic drift, selection, mutation, and population admixture all affect LD (Flint-Garcia et al., 2003). Because autogamous species have a low effec-tive recombination rate, they typically have more LD than allogamous species (Nordborg, 2000).

We found LD (p < 0.001) between 61.5% of SSR marker pairs separated by less than 1 Mbp. In this population, the extent of LD was potentially affected by selection and genetic drift. The synthetic tetraploid alfalfa population used in this study was derived from 300 individuals of three cultivars (100 individuals from each cultivar). Each of the three cultivars resulted from selection and a restricted genetic base that could have resulted in genetic drift from a broadly-based panmictic population. Further popula-tion restriction and unintentional selection leading to the

development of the population under evaluation could have further increased the extent of LD observed in this study. Additionally, cultivated populations often have more LD than wild populations, as shown in sunflower (Helianthus annuus L.) (Liu and Burke, 2006) and barley (Hordeum vul-gare L.) (Caldwell et al., 2006), which could account for the relatively long extent of LD in this population.

In maize, relatively high levels of genome-wide LD were seen using SSR markers, even though LD rapidly declines between single nucleotide polymorphisms (SNPs) within genes. This result may be explained by a higher mutation rate in SSRs than at SNP locations (Remington et al., 2001). Thus, another possible expla-nation for the length of LD in our population may be because of high mutation rates of SSR markers. Fur-ther, LD decay is expected to increase as the number of founders increase, as was seen with a synthetic ryegrass (Lolium perenne L.) population (Auzanneau et al., 2007). Further, the effects of bottlenecking and selection on LD are diminished in tetraploid alfalfa due to the presence of four homologous copies per chromosome per plant. These factors all suggest that the levels of LD we observed may not reflect the true extent of LD that might be seen based on DNA sequence.

The extent of LD will dictate whether a sufficient number of markers can be evaluated to scan the entire genome or if association analyses need to be restricted to a selected set of candidate genes. Rapidly declining LD will necessitate either a very large number of markers to ade-quately cover the whole genome or the use of SNP targeted to candidate genes, the latter of which will only assay a small part of the genome, albeit at loci that may be dispro-portionately associated with the trait. One estimate of LD in a CONSTANS-LIKE gene in alfalfa suggested that LD decayed within the length of the gene (Herrmann et al., 2010). That result is based on sequencing of 400 individu-als derived from 10 different cultivars, perhaps a broader genetic pool than that used in our experiment.

Our estimate of genome size is biased. We know that the 1C genome size of alfalfa is about 0.85 pg (Blondon et al., 1994) or about 830 Mbp, and the 1C genome size of M. truncatula cv. Jemalong is 0.575 pg (Blondon et al., 1994) or about 560 Mbp. Thus, the alfalfa genome is about 150% the size of M. truncatula. We computed physical distances between markers based on the M. truncatula reference genome, build 3.0. The genetic maps of alfalfa and M. truncatula are highly colinear (Choi et al., 2004), so we can assume that distances between the markers in the alfalfa genome are roughly proportional to those in M. truncatula. If we further assume that LD extends up to 1 Mbp, then we will need ~1000 markers to fully saturate the genome for association mapping. For routine use in breeding programs—for instance, as applied to genome-wide selection (Heffner et al., 2009)—this number of markers may not be cost effective. If our estimate of LD is overly optimistic, then we have little hope of using genome-wide markers in breeding popula-tions like this one in the foreseeable future. How well this

Page 10: Association Mapping of Biomass Yield and Stem Composition ... · Association Mapping of Biomass Yield and Stem Composition in a Tetraploid Alfalfa Breeding Population ... hypothesis

the plant genome march 2011 vol. 4, no. 1

population reflects other commercial breeding popula-tions is not known, although the methods used to create this population are typical of those used by the alfalfa breeding community (e.g., Rumbaugh et al., 1988).

Population StructureOur population does not show strong evidence of sub-population structure because tests for subpopulation numbers (z) resulted in smoothly increasing log likeli-hood values as z increased. Although the ad hoc quantity (ΔK) suggested that subpopulation number might be two, adding a population structure effect to our model had little improvement in terms of the goodness of fit and number of alleles associated with traits. Thus, we can conclude that there was no clear population structure in our breeding population. Clear population structure was not present in collections of elite alfalfa, barley, and potato (Solanum tuberosum L.) breeding germplasm (Flajoulot et al., 2005; Kraakman et al., 2004; Malosetti et al., 2007), possibly because of the relatively narrow genetic base in those elite breeding germplasm (Malosetti et al., 2007). In contrast, clear population structure is generally observed in broad-based germplasm collec-tions (Comadran et al., 2009; Remington et al., 2001). We observed very clear substructure in a broad-based collection of wild diploid alfalfa germplasm (Sakiroglu et al., 2010). In this study, our population was derived from the random mating of a set of founders originating from three elite tetraploid cultivars, all of which were adapted to the northern part of the alfalfa growing area in the United States. This relatively narrow genetic background of cultivated germplasm combined with two generations of random mating apparently removed any substructure within the population. A similar lack of substructure was identified among a collection of seven alfalfa cultivars from France (Flajoulot et al., 2005).

Association MappingThe main challenge of association mapping is to sepa-rate associations between markers and traits that are real, resulting from physical linkage of genes (which is what we want), from those that are ephemeral as a result of population structure and/or kinship relatedness. To address this problem, we used the QK mixed model for association analysis to control for population structure and kinship (Yu et al., 2006). The QK method effectively eliminated spurious associations caused by population structure and kinship in previous experiments (Stich and Melchinger, 2009; Stich et al., 2008; Yu et al., 2006; Zhao et al., 2007). In this study, the model E4, which included both population structure and kinship, had better fit compared to other three models. Consistently, some alleles associated with traits that were identified in model E1 (no population structure and kinship), model E2 (with kinship only), or model E3 (with population structure only) did not show significant association in model E4, suggesting that some positive marker–trait associations were caused by population structure and/or kinship. In

terms of goodness of fit and reducing false positive asso-ciations, model E2 showed great improvement relative to model E1, but model E3 did not show a similar improve-ment over E1. This indicated that kinship was the key source that caused spurious association rather than population structure in this population.

Yield data were collected at three locations and in 2 yr for a total of five environments. Several alleles were asso-ciated with yield across multiple environments based on model E4. Many alleles showed genotype × environment interaction because they were identified in one (or a few) location but not all. Several markers also showed association with multiple traits. In the case of marker be318471, one allele was positively associated with yield and another allele negatively associated with ADL concentration in the stem. This result suggests that different alleles could be selected for the same marker locus when performing selection for multiple traits. One particularly encouraging result was that marker aw688546 previously associated with yield in a bipa-rental mapping population (Robins et al., 2007a) was also associated with yield in this experiment.

ConclusionsThis alfalfa breeding population shows no clear popula-tion substructure, a relatively long extent of LD based on SSR markers, and abundant variation for yield and cell wall composition. Thus, a genome scan approach to identify markers linked to traits should be feasible in this population. Because this is a breeding alfalfa population, alleles associated with traits are already in the breeding program, and their frequency could be easily increased using marker-assisted selection. However, based on our estimated LD level, we would need about 1000 markers to explore the whole alfalfa genome for association between markers and traits. Of the 71 SSR markers we evaluated, however, several associations with yield and cell wall com-position were identified. To fully identify markers strongly linked to QTL and efficiently improve the target traits by marker-assisted selection, more markers are needed.

Supplemental Information AvailableSupplemental material is available free of charge at

http://www.crops.org/publications/tpg.

AcknowledgmentsWe thank Mark Smith for field plot assistance and Trish Patrick for com-positional analysis assistance.This research was part of the NE1010 Regional Hatch Project “Breed-ing and Genetics of Forage Crops to Improve Productivity, Quality, and Industrial Uses,” and was supported by USDA-DOE Plant Feedstock Genomics for Bioenergy award 2006-35300-17224 to ECB. Xuehui Li and Yanling Wei contributed equally to this work.

ReferencesAlm, V., C. Fang, C.S. Busso, K.M. Devos, K. Vollan, Z. Grieg, and O.A.

Rognli. 2003. A linkage map of meadow fescue (Festuca pratensis Huds.) and comparative mapping with other Poaceae species. Theor. Appl. Genet. 108:25–40.

Auzanneau, J., C. Huyghe, B. Julier, and P. Barre. 2007. Linkage disequi-librium in synthetic varieties of perennial ryegrass. Theor. Appl. Genet. 115:837–847.

Page 11: Association Mapping of Biomass Yield and Stem Composition ... · Association Mapping of Biomass Yield and Stem Composition in a Tetraploid Alfalfa Breeding Population ... hypothesis

li et al.: association mapping in tetraploid alfalfa

Bingham, E.T., R.W. Groose, D.R. Woodfield, and K.K. Kidwell. 1994. Complementary gene interaction in alfalfa is greater in autotetra-ploids than diploid. Crop Sci. 34:823–829.

Blondon, F., D. Marie, S. Brown, and A. Kondorosi. 1994. Genome size and base composition in Medicago sativa and M. truncatula species. Genome 37:264–270.

Breseghello, F., and M.E. Sorrells. 2006. Association mapping of kernel size and milling quality in wheat (Triticum aestivum L.) cultivars. Genetics 172:1165–1177.

Brummer, E.C. 2004. Genomics research in alfalfa, Medicago sativa L. In R.F. Wilson et al. (ed.) Legume crop genomics. AOCS Press, Cham-paign, IL.

Caldwell, K.S., J. Russell, P. Langridge, and W. Powell. 2006. Extreme population-dependent linkage disequilibrium detected in an inbreeding plant species, Hordeum vulgare. Genetics 172:557–567.

Choi, H.K., D. Kim, T. Uhm, E. Limpens, H. Lim, J.H. Mun, P. Kalo, R.V. Penmetsa, A. Seres, O. Kulikova, B.A. Roe, T. Bisseling, G.B. Kiss, and D.R. Cook. 2004. A sequence-based genetic map of Medicago truncatula and comparison of marker colinearity with M. sativa. Genetics 166:1463–1502.

Comadran, J., W.T. Thomas, F.A. van Eeuwijk, S. Ceccarelli, S. Grando, A.M. Stanca, N. Pecchioni, T. Akar, A. Al-Yassin, A. Benbelkacem, H. Ouabbou, J. Bort, I. Romagosa, C.A. Hackett, and J.R. Russell. 2009. Patterns of genetic diversity and linkage disequilibrium in a highly structured Hordeum vulgare association-mapping population for the Mediterranean basin. Theor. Appl. Genet. 119:175–187.

Coors, J.G., C.C. Lowe, and R.P. Murphy. 1986. Selection for improved nutritional quality of alfalfa forage. Crop Sci. 26:843–848.

Dien, B.S., H.J.G. Jung, K.P. Vogel, M.D. Casler, J.F.S. Lamb, L. Iten, R.B. Mitchell, and G. Sarath. 2006. Chemical composition and response to dilute-acid pretreatment and enzymatic saccharification of alfalfa, reed canarygrass, and switchgrass. Biomass Bioenergy 30:880–891.

Ducrocq, S., D. Madur, J.B. Veyrieras, L. Camus-Kulandaivelu, M. Kloi-ber-Maitz, T. Presterl, M. Ouzunova, D. Manicacci, and A. Charcos-set. 2008. Key impact of Vgt1 on flowering time adaptation in maize: Evidence from association mapping and ecogeographical informa-tion. Genetics 178:2433–2437.

Evanno, G., S. Regnaut, and J. Goudet. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: A simula-tion study. Mol. Ecol. 14:2611–2620.

Falush, D., M. Stephens, and J.K. Pritchard. 2007. Inference of population structure using multilocus genotype data: Dominant markers and null alleles. Mol. Ecol. Notes 7:574–578.

Flajoulot, S., J. Ronfort, P. Baudouin, P. Barre, T. Huguet, C. Huyghe, and B. Julier. 2005. Genetic diversity among alfalfa (Medicago sativa) cultivars coming from a breeding program, using SSR markers. Theor. Appl. Genet. 111:1420–1429.

Flint-Garcia, S.A., J.M. Thornsberry, and E.S. Buckler. 2003. Structure of linkage disequilibrium in plants. Annu. Rev. Plant Biol. 54:357–374.

Goplen, B.P., R.E. Howarth, and G.L. Lees. 1993. Selection of alfalfa for a lower initial rate of digestion and corresponding changes in epider-mal and mesophyll cell-wall thickness. Can. J. Plant Sci. 73:111–122.

Hamelinck, C.N., G. van Hooijdonk, and A.P.C. Faaij. 2005. Ethanol from lignocellulosic biomass: Techno-economic performance in short-, middle- and long-term. Biomass Bioenergy 28:384–410.

Hardy, O.J., and X. Vekemans. 2002. SPAGEDi: A versatile computer pro-gram to analyse spatial genetic structure at the individual or popu-lation levels. Mol. Ecol. Notes 2:618–620.

Heffner, E.L., M.E. Sorrells, and J.L. Jannink. 2009. Genomic selection for crop improvement. Crop Sci. 49:1–12.

Herrmann, D., P. Barre, S. Santoni, and B. Julier. 2010. Association of a CONSTANS-LIKE gene to flowering and height in autotetraploid alfalfa. Theor. Appl. Genet. 121:865–876.

Hyten, D.L., I.Y. Choi, Q. Song, R.C. Shoemaker, R.L. Nelson, J.M. Costa, J.E. Specht, and P.B. Cregan. 2007. Highly variable patterns of linkage disequilibrium in multiple soybean populations. Genetics 175:1937–1944.

Ingvarsson, P.K. 2005. Nucleotide polymorphism and linkage disequi-librium within and among natural populations of European aspen (Populus tremula L., Salicaceae). Genetics 169:945–953.

Jannink, J.L., and B. Walsh. 2002. Association mapping in plant popula-tions. p. 59–68. In M.S. Kang (ed.) Quantitative genetics, genomics and plant breeding. CAB Int., New York, NY.

Julier, B., S. Flajoulot, P. Barre, G. Cardinet, S. Santoni, T. Huguet, and C. Huyghe. 2003. Construction of two genetic linkage maps in cul-tivated tetraploid alfalfa (Medicago sativa) using microsatellite and AFLP markers. BMC Plant Biol. 3:9.

Jung, H.G., M.M. DeLong, E.A. Oelke, N.P. Martin, and D.K. Barnes. 1994b. Alfalfa hay as a biomass energy crop for electricity genera-tion. p. 50. In Proc. North American Alfalfa Improvement Conf., 34th, Guelph, ON, Canada. 10–14 July 1994. North American Alfalfa Improvement Conf., Beltsville, MD.

Jung, H.G., C.C. Sheaffer, D.K. Barnes, and J.L. Halgerson. 1997. For-age quality variation in the US alfalfa core collection. Crop Sci. 37:1361–1366.

Jung, H.G., R.R. Smith, and C.S. Endres. 1994a. Cell-wall composition and degradability of stem tissue from lucerne divergently selected for lignin and in-vitro dry-matter disappearance. Grass Forage Sci. 49:295–304.

Kephart, K.D., D.R. Buxton, and R.R. Hill. 1990. Digestibility and cell-wall components of alfalfa following selection for divergent herbage lignin concentration. Crop Sci. 30:207–212.

Kidwell, K.K., E.T. Bingham, D.R. Woodfield, and T.C. Osborn. 1994. Relationships among genetic-distance, forage yield and heterozy-gosity in isogenic diploid and tetraploid alfalfa populations. Theor. Appl. Genet. 89:323–328.

Kraakman, A.T., R.E. Niks, P.M. Van den Berg, P. Stam, and F.A. Van Eeuwijk. 2004. Linkage disequilibrium mapping of yield and yield stability in modern spring barley cultivars. Genetics 168:435–446.

Lamb, J.F.S., C.C. Sheaffer, L.H. Rhodes, R.M. Sulc, D.J. Undersander, and E.C. Brummer. 2006. Five decades of alfalfa cultivar improvement: Impact on forage yield, persistence, and nutritive value. Crop Sci. 46:902–909.

Li, X., Y. Wei, D. Nettleton, and E.C. Brummer. 2009. Comparative gene expression profiles between heterotic and non-heterotic hybrids of tetraploid Medicago sativa. BMC Plant Biol. 9:107.

Li, X.H., and E.C. Brummer. 2009. Inbreeding depression for fertility and biomass in advanced generations of inter- and intrasubspecific hybrids of tetraploid alfalfa. Crop Sci. 49:13–19.

Liu, A., and J.M. Burke. 2006. Patterns of nucleotide diversity in wild and cultivated sunflower. Genetics 173:321–330.

Mackay, I., and W. Powell. 2007. Methods for linkage disequilibrium mapping in crops. Trends Plant Sci. 12:57–63.

Malosetti, M., C.G. van der Linden, B. Vosman, and F.A. van Eeuwijk. 2007. A mixed-model approach to association mapping using pedi-gree information with an illustration of resistance to Phytophthora infestans in potato. Genetics 175:879–889.

Mather, K.A., A.L. Caicedo, N.R. Polato, K.M. Olsen, S. McCouch, and M.D. Purugganan. 2007. The extent of linkage disequilibrium in rice (Oryza sativa L.). Genetics 177:2223–2232.

McKendry, P. 2002. Energy production from biomass (Part 2): Conversion technologies. Bioresour. Technol. 83:47–54.

Morrell, P.L., D.M. Toleno, K.E. Lundy, and M.T. Clegg. 2005. Low levels of linkage disequilibrium in wild barley (Hordeum vulgare ssp. spon-taneum) despite high rates of self-fertilization. Proc. Natl. Acad. Sci. USA 102:2442–2447.

Nordborg, M. 2000. Linkage disequilibrium, gene trees and selfing: An ancestral recombination graph with partial self-fertilization. Genet-ics 154:923–929.

Nordborg, M., J.O. Borevitz, J. Bergelson, C.C. Berry, J. Chory, J. Hagen-blad, M. Kreitman, J.N. Maloof, T. Noyes, P.J. Oefner, E.A. Stahl, and D. Weigel. 2002. The extent of linkage disequilibrium in Arabi-dopsis thaliana. Nat. Genet. 30:190–193.

Pajerowska-Mukhtar, K., B. Stich, U. Achenbach, A. Ballvora, J. Lubeck, J. Strahwald, E. Tacke, H.R. Hofferbert, E. Ilarionova, D. Bellin, B. Walkemeier, R. Basekow, B. Kersten, and C. Gebhardt. 2009. Single nucleotide polymorphisms in the allene oxide synthase 2 gene are

Page 12: Association Mapping of Biomass Yield and Stem Composition ... · Association Mapping of Biomass Yield and Stem Composition in a Tetraploid Alfalfa Breeding Population ... hypothesis

the plant genome march 2011 vol. 4, no. 1

associated with field resistance to late blight in populations of tetra-ploid potato cultivars. Genetics 181:1115–1127.

Pritchard, J.K., M. Stephens, and P. Donnelly. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945–959.

Remington, D.L., J.M. Thornsberry, Y. Matsuoka, L.M. Wilson, S.R. Whitt, J. Doebley, S. Kresovich, M.M. Goodman, and E.S. Buckler. 2001. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc. Natl. Acad. Sci. USA 98:11479–11484.

Riday, H., and E.C. Brummer. 2002. Forage yield heterosis in alfalfa. Crop Sci. 42:716–723.

Riday, H., and E.C. Brummer. 2005. Heterosis in a broad range of alfalfa germplasm. Crop Sci. 45:8–17.

Riday, H., E.C. Brummer, and K.J. Moore. 2002. Heterosis of forage qual-ity in alfalfa. Crop Sci. 42:1088–1093.

Robins, J.G., G.R. Bauchan, and E.C. Brummer. 2007a. Genetic mapping forage yield, plant height, and regrowth at multiple harvests in tetra-ploid alfalfa (Medicago sativa L.). Crop Sci. 47:11–18.

Robins, J.G., D. Luth, I.A. Campbell, G.R. Bauchan, C.L. He, D.R. Viands, J.L. Hansen, and E.C. Brummer. 2007b. Genetic mapping of biomass production in tetraploid alfalfa. Crop Sci. 47:1–10.

Rumbaugh, M.D., J.L. Caddel, and D.E. Rowe. 1988. Breeding and quan-titative genetics. p. 777–808. In A.A. Hanson et al., (ed.). Alfalfa and alfalfa improvement. ASA, CSSA, and SSSA, Madison, WI.

Sakiroglu, M., J.J. Doyle, and E. Charles Brummer. 2010. Inferring popu-lation structure and genetic diversity of broad range of wild diploid alfalfa (Medicago sativa L.) accessions using SSR markers. Theor. Appl. Genet. 121:403–415.

Samac, D.A., H.J.G. Jung, and J.F.S. Lamb. 2006. Development of alfalfa (Medicago sativa L.) as a feedstock for production of ethanol and other bioproducts. p. 79. In S. Minteer (ed.) Alcoholic fuels. CRC Press, Boca Raton, FL.

SAS Institute. 2004. SAS 9.1.3 Help and documentation. SAS Institute, Cary, NC.

Schuelke, M. 2000. An economic method for the fluorescent labeling of PCR fragments. Nat. Biotechnol. 18:233–234.

Sheaffer, C.C., D. Cash, N.J. Ehlke, J.C. Henning, J.G. Jewett, K.D. John-son, M.A. Peterson, M. Smith, J.L. Hansen, and D.R. Viands. 1998. Entry × environment interactions for alfalfa forage quality. Agron. J. 90:774–780.

Sheaffer, C.C., N.P. Martin, J.F.S. Lamb, G.R. Cuomo, J.G. Jewett, and S.R. Quering. 2000. Leaf and stem properties of alfalfa entries. Agron. J. 92:733–739.

Simko, I., K.G. Haynes, and R.W. Jones. 2006. Assessment of linkage dis-equilibrium in potato genome with single nucleotide polymorphism markers. Genetics 173:2237–2245.

Sledge, M.K., I.M. Ray, and G. Jiang. 2005. An expressed sequence tag SSR map of tetraploid alfalfa (Medicago sativa L.). Theor. Appl. Genet. 111:980–992.

Stich, B., and A.E. Melchinger. 2009. Comparison of mixed-model approaches for association mapping in rapeseed, potato, sugar beet, maize, and Arabidopsis. BMC Genomics 10:94.

Stich, B., J. Mohring, H.P. Piepho, M. Heckenberger, E.S. Buckler, and A.E. Melchinger. 2008. Comparison of mixed-model approaches for association mapping. Genetics 178:1745–1754.

Teuber, L.R., K.L. Taggard, L.K. Gibbs, M.H. McCaslin, M.A. Peterson, and D.K. Barnes. 1998. Fall Dormancy. In Standard tests to charac-terize alfalfa cultivars, 3rd ed. (Amended 1998). Available at http://www.naaic.org/stdtests/Dormancy2.html (verified 6 Jan. 2011). North American Alfalfa Improvement Conference, Beltsville, MD.

Van Soest, P.J., J.B. Robertson, and B.A. Lewis. 1991. Symposium: Carbo-hydrate methodology, metabolism, and nutritional implications in dairy cattle. J. Dairy Sci. 74:3583–3597.

Viands, D.R., C.C. Lowe, R.P. Murphy, and R.L. Millar. 1990. Registration of ‘Oneida VR’ alfalfa. Crop Sci. 30:955.

Vogel, K.P., J.F. Pedersen, S.D. Masterson, and J.J. Toy. 1999. Evaluation of a filter bag system for NDF, ADF, and IVDMD forage analysis. Crop Sci. 39:276–279.

Windham, W.R., D.R. Mertens, and F.E. Barton, II. 1989. Protocol for NIRS calibration: Sample selection and equation development and validation. p. 96–103. In G.C. Marten et al. (ed.) Near infrared reflectance spectroscopy (NIRS): Analysis of forage quality. USDA Agric. Handbook 643. U.S. Government Printing Office, Washing-ton, DC.

Yu, J., G. Pressoir, W.H. Briggs, I. Vroh Bi, M. Yamasaki, J.F. Doebley, M.D. McMullen, B.S. Gaut, D.M. Nielsen, J.B. Holland, S. Kresovich, and E.S. Buckler. 2006. A unified mixed-model method for associa-tion mapping that accounts for multiple levels of relatedness. Nat. Genet. 38:203–208.

Zhao, K., M.J. Aranzana, S. Kim, C. Lister, C. Shindo, C. Tang, C. Tooma-jian, H. Zheng, C. Dean, P. Marjoram, and M. Nordborg. 2007. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 3:e4.