disentangling the roles of history and local selection in ... · norway spruce [picea abies (l.)...
TRANSCRIPT
INVESTIGATION
Disentangling the Roles of History and LocalSelection in Shaping Clinal Variation of AlleleFrequencies and Gene Expression in Norway
Spruce (Picea abies)Jun Chen,* Thomas Källman,* Xiaofei Ma,* Niclas Gyllenstrand,† Giusi Zaina,‡ Michele Morgante,‡
Jean Bousquet,§ Andrew Eckert,** Jill Wegrzyn,†† David Neale,†† Ulf Lagercrantz,* and Martin Lascoux*,‡‡,1
*Department of Ecology and Genetics, Evolutionary Biology Center, Uppsala University, 752 36 Uppsala, Sweden, †Department ofPlant Biology and Forest Genetics, Swedish University of Agricultural Sciences, SE-750 07 Uppsala, Sweden, ‡Dipartimento diScienze Agrarie e Ambientali, Universita di Udine, 33100 Udine, Italy, §Canada Research Chair in Forest and Environmental
Genomics, Institute for Systems and Integrative Biology, Université Laval, Quebec City, Quebec, Canada G1V 0A6, **Departmentof Biology, Virginia Commonwealth University, Richmond, Virginia 23284-2012, ††Department of Plant Sciences, University of
California, Davis, California 95616, and ‡‡Laboratory of Evolutionary Genomics, Chinese Academy of Sciences-Max-Planck-Gesellschaft Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
ABSTRACT Understanding the genetic basis of local adaptation is challenging due to the subtle balance among conflictingevolutionary forces that are involved in its establishment and maintenance. One system with which to tease apart these difficulties isclines in adaptive characters. Here we analyzed genetic and phenotypic variation in bud set, a highly heritable and adaptive trait,among 18 populations of Norway spruce (Picea abies), arrayed along a latitudinal gradient ranging from 47�N to 68�N. We confirmedthat variation in bud set is strongly clinal, using a subset of five populations. Genotypes for 137 single-nucleotide polymorphisms (SNPs)chosen from 18 candidate genes putatively affecting bud set and 308 control SNPs chosen from 264 random genes were analyzed forpatterns of genetic structure and correlation to environment. Population genetic structure was low (FST ¼ 0.05), but latitudinal patternswere apparent among Scandinavian populations. Hence, part of the observed clinal variation should be attributable to populationdemography. Conditional on patterns of genetic structure, there was enrichment of SNPs within candidate genes for correlations withlatitude. Twenty-nine SNPs were also outliers with respect to FST. The enrichment for clinal variation at SNPs within candidate genes(i.e., SNPs in PaGI, PaPhyP, PaPhyN, PaPRR7, and PaFTL2) indicated that local selection in the 18 populations, and/or selection in theancestral populations from which they were recently derived, shaped the observed cline. Validation of these genes using expressionstudies also revealed that PaFTL2 expression is significantly associated with latitude, thereby confirming the central role played by thisgene in the control of phenology in plants.
LOCAL adaptation is a key process in the evolution ofspecies. Understanding how local adaptation is estab-
lished and maintained, however, is especially difficult asits establishment is contingent upon historical conditionsand its maintenance depends on the balance among conflict-ing evolutionary forces (e.g., Yeaman and Otto 2011). It is
a particularly challenging task in forest trees, because theyhave long generation times and therefore cannot be easilymanipulated experimentally. For instance, transfer experi-ments are theoretically possible but practically difficult toimplement. On the other hand, the analysis of the stronglatitudinal clines displayed by forest trees for potentiallyadaptive traits such as bud set (Dormling 1973; Savolainenet al. 2007; Aitken et al. 2008) can provide crucial informa-tion on the forces involved in local adaptation and, in par-ticular, on the relative parts played by demography andselection in the establishment of the cline. Furthermore, phe-nology in general, and flowering time and bud set in partic-ular, have been extensively studied and strong candidate
Copyright © 2012 by the Genetics Society of Americadoi: 10.1534/genetics.112.140749Manuscript received March 29, 2012; accepted for publication April 20, 2012Supporting information is available online at http://www.genetics.org/content/suppl/2012/04/27/genetics.112.140749.DC1.All data are archived in Dryad with provisional doi:10.5061/dryad.82201.1Corresponding author: Department of Ecology and Genetics, Evolutionary BiologyCentre, Uppsala University, 752 36 Uppsala, Sweden E-mail: [email protected]
Genetics, Vol. 191, 865–881 July 2012 865
genes are available, many of which belong to the photope-riodic pathway including the circadian clock (Gyllenstrandet al. 2007; Albani and Coupland 2010; Bergelson and Roux2010; Fornara et al. 2010; Holliday et al. 2010; Hsu et al.2011). Recent candidate gene studies on the genetic basis ofclinal variation in phenology in European aspen (Ma et al.2010) and Sitka spruce (Holliday et al. 2010; Lobo 2011)illustrate well the promises of this approach.
In the present study we focus on a conifer species,Norway spruce [Picea abies (L.) Kar.] and more specificallyon clinal variation in the northwest part of its natural range.The clines in phenological traits observed in this species andin this part of its range are intriguing for several reasons.First, as in many other forest tree species (Jaramillo-Correaet al. 2001; Savolainen et al. 2007; Alberto et al. 2011;Kremer et al. 2010; Prunier et al. 2011), Norway sprucepopulations are strongly differentiated for bud set [QST ¼0.74 (R. Liesch, unpublished data)], a trait with high heri-tability [average h2 ¼ 0.63 (varying from 0.33 to 0.78 across15 populations) (R. Liesch, unpublished data)]. This stronglevel of phenotypic differentiation for bud set contrasts withextensive gene flow and limited genetic differentiationamong populations at neutral genetic markers. Geneticstructure within and among Norway spruce populations islargely accounted for by two major geographically basedgroups (FST � 0.10 between groups; FST � 0.05 withingroups), the Nordic–Baltic group and the Alpine group(Heuertz et al. 2006). Second, tree populations in North-western Europe are young, as most of the region was cov-ered by ice during the Last Glacial Maximum (LGM)(�18,000 YBP). The combined use of palynological data,macrofossils, and maps of genetic diversity suggests thatNorway spruce recolonized Scandinavia after the LGM frompopulations located in central Russia and possibly also fromcryptic refugia in the northern part of European Russia(Binney et al. 2009; Väliranta et al. 2011). The spread west-ward followed two main recolonization routes, a northernroute over Finland and into northern Scandinavia anda southern route going through the Baltic states (Tollefsrudet al. 2009). Norway spruce reached eastern Finland �6500years ago and east-central Sweden �2700 years ago, afterwhich it took 100–500 years to reach the present populationsize and to replace the existing Pinus–Betula–Alnus forests(Giesecke and Bennett 2004; Seppä et al. 2009). This historyis reflected in a complex population genetic structure acrossthe extant range of this species despite extensive gene flow(Heuertz et al. 2006).
The recolonization of Scandinavia from different LGMrefugia is not specific to Norway spruce and has beenobserved in various plant and animal species (e.g., Jaarolaet al. 1999; De Carvalho et al. 2010). Because Scandinaviawas recolonized very recently and from different refugia, thepresence of steep clines in phenological traits raises impor-tant questions. Do the clines reflect local adaptation onde novo mutations occurring only after the recolonization,i.e., under strong local selection over a very short period of
time? Or do they reflect the deployment in space of poly-morphisms already segregating in the ancestral populations,i.e., through soft sweeps (Pritchard et al. 2010; Savolainenet al. 2011)? A correct interpretation of the clines, thus,requires that population history and the ensuing populationgenetic structure (e.g., pattern of isolation-by-distance) aretaken into account. This is because differences in allele fre-quencies among populations (e.g., between southern andnorthern Swedish spruce populations) could be due to ex-pansion from different glacial refugia where selection, orsimply random genetic drift, led to the fixation of differentalleles. This, of course, would not necessarily imply thatlocal selection favoring different alleles in different areasdoes not occur, but it would nonetheless imply that it mightnot be the only explanation for the observed clinal variation.Finally, clinal variation in phenology could also partly reflectphenotypic plasticity, an aspect that was not investigated here,but that is important in responses by tree populations to cli-mate (see Bresson et al. 2011).
In the present study, after validating the presence of clinalvariation in bud set in a subsample of five populations, weexamined single-nucleotide polymorphism (SNP) frequenciesat candidate genes for timing of bud set in 18 Norway sprucepopulations for correlations with latitude (Supporting Infor-mation, Figure S1). Various approaches, capturing differentfacets of the data, were used to detect and interpret the clinalvariation at candidate gene SNPs. We then tested for clinalvariation in gene expression in a subset of the genes for whichclinal variation was detected for SNP allele frequencies. Thisstudy offers a clear illustration of the interplay betweendemographic history and selection in shaping genetic variationand of the intrinsic difficulty in telling the two apart (Li et al.2012).
Materials and Methods
Sampling and measurement of bud set
Seeds of 303 Norway spruce trees were collected from 18populations in four countries: Germany (1), Russia (1),Finland (7), and Sweden (9) at latitudes ranging from 47�Nto 68�N (see Table 1 and Figure 1). To ascertain the pres-ence of clinal variation in bud set, seeds were randomlysampled from 5 populations [SE-58.1 (Saleby), SE-61.6(Istevallen), SE-62.7 (Högmansbod), SE-64.1 (Jämtland),and SE-66.4 (Jock)] with latitudes ranging from 58�N to66�N. Populations from northern Finland, which were usedto analyze clinal variation in allele frequency, were not usedhere and were replaced by population SE-66.4 since theirseeds had low germination rates. The number of seeds perpopulation varied between 36 and 48 and the number ofmaternal trees from 13 to 20. The seeds were germinatedand the seedlings were then cultured in a growth chamberunder continuous light (250 mmol m22 sec21 light and 400–700 nm) and a temperature of 20� for 8 weeks. Thereafterthe seedlings were grown under increasing night length.Each photoperiod lasted 1 week starting with 1 week of
866 J. Chen et al.
continuous light (LL), then 1 week with 22 hr light/2 hrdark, and so on. The dark period was extended by 1.5 hreach week until a photoperiod of 14.5 hr light/9.5 hr darkwas reached. The latter lasted for 2 weeks (see Figure S2 fora detailed schedule).
Bud set was defined as two categories: 0, no bud at all;and 1, when the bud started to appear until bud set wascomplete. Bud set was measured at the end of everyphotoperiodic treatment from week 5 (17.5 hr light/6.5 hrdark) to week 8 (14.5 hr light/9.5 hr dark). The number ofindividuals that had set bud was counted in each population.
DNA extraction and SNP genotyping
Genomic DNA was extracted from megagametophytes usingthe DNeasy Plant Mini Kit (QIAGEN, Germantown, MD) andthen amplified using the Illustra GenomiPhi V2 DNAAmplification Kit (GE Healthcare Life Sciences, Piscataway,NJ). A total of 445 SNPs representing 282 genes weregenotyped using an Illumina GoldenGate 768 SNP array.The SNPs used in this study come from three differentsources. The first set of 142 SNPs, including 137 candidateSNPs, comes from our own resequencing efforts in a discov-ery panel of 48 Norway spruce alleles and comprises mostlySNPs from 18 putative candidate genes for bud set. Thecandidate genes were chosen among genes of the photope-riodic control pathway in Arabidopsis thaliana (Fornara et al.2010) and based on information provided by experimentson bud set (e.g., Gyllenstrand et al. 2007) or on specific genefamilies (Karlgren et al. 2011) in Norway spruce. They wereputatively involved in the photoperiodic pathway [PaCOL1,PaCOL2, PaCRY, PaPHYN, PaPHYO, PaPHYP, and PaFTL2and its promoter (formerly called FT4 in Gyllenstrandet al. 2007)], the circadian clock (PaCCA1 and its promoter,
PaPRR1, PaPRR3, PaPRR7, and PaGI), or shoot apical devel-opment (PaMFTL1, PaHB3, PaKN1, PaKN2, and PaKN4) ofNorway spruce (see Table S1 for details). Multiple haplo-types were sequenced for each gene with an average num-ber of 42 haplotypes per gene, except for three PaGIfragments for which only 8 haplotypes were sequenced.The second source of 83 SNPs is the Comparative Re-Sequencing in Pinaceae (CRSP) project (http://dendrome.ucdavis.edu/crsp/). Sequences used for SNP discovery camefrom 800 primer pairs originally designed for loblolly pine(Pinus taeda L.). This list of primer sets was constructedwithout knowledge of putative gene function and was basedsolely on the set of 800 that maximized the production ofhigh-quality sequence in single validation samples from eachof the seven species composing the CRSP project. A singletree from Germany was used as the validation sample forNorway spruce. The SNP discovery panel consisted of 12megagametophytes sampled in Poland, Latvia, Romania,Belarus, Ukraine, Germany, and Norway, which are fairlyrepresentative of the whole range. Finally, the remaining220 SNPs come from the Arborea project (http://www.arborea.ulaval.ca/) as follows. A total of 1964 SNPs from1485 genes initially identified and successfully genotyped inP. glauca were tested in 12 (diploid) individuals of Norwayspruce from 12 populations in central Europe, includingLatvia, Poland, and Germany. These 1485 genes were tran-scription factors or candidate genes related to growth, woodformation, and cell wall and lignin synthesis, which werededuced from expression studies in P. glauca or annotationsin Arabidopsis homologs. Testing was done by submittingthe panel directly to two Illumina GoldenGate SNP genotyp-ing chips (Beaulieu et al. 2011; Pelgas et al. 2011), resultingin the discovery of 384 valid SNPs from 340 genes (transfer
Table 1 Populations used in this study
GroupPopulation
codePopulation
name Country No. individuals Latitude (�N) Longitude (�E) Altitude (m)
Alpine domain GE-47.0 Ruhpolding Germany 30 47.00 12.23 1492SE-58.3 Saleby Sweden 15 58.37 13.15 70
Baltic–Nordic domainRussia RU-53.3 Bryansk Russia 15 53.30 34.30 179Central Fennoscandia FI-61.5 StKu19 Finland 19 61.57 29.22 105
SE-61.6 Istevallen Sweden 13 61.62 16.48 350SE-61.8 Tjarnelund Sweden 16 61.83 16.28 285FI-62.0 StKu7 Finland 16 62.07 24.48 105SE-62.6 Strangsund Sweden 21 62.63 15.05 415SE-62.7 Hogmansbod Sweden 15 62.78 15.23 350FI-63.0 StKu5 Finland 15 63.07 30.28 170SE-63.4 Mittberget Sweden 14 63.48 17.67 300SE-63.7 Hallen Sweden 13 63.74 15.50 375SE-64.1 Jämtland Sweden 17 64.17 19.36 355SE-65.3 Lillpite Sweden 18 65.31 18.70 242SE-66.4a Jock Sweden 48 66.41 22.44 145
Northern Finland FI-66.4 StKu2 Finland 12 66.40 26.88 175FI-67.0 Kolari Finland 27 67.04 26.36 264FI-67.7 StKu19 Finland 17 67.72 26.05 280FI-68.0 Muonio Finland 10 68.00 24.15 342
aSE-66.4 was used only in the expression study.
Clinal Variation in Norway Spruce 867
rate of 20%). Alleles with a frequency ,10% were removedfrom all SNP discovery panels. In summary, SNPs that weregenotyped successfully were grouped into two categories:137 candidate SNPs originating from 18 genes putativelyinvolved in the photoperiodic pathway and 308 controlSNPs originating from 264 genes not related to these func-tions. Importantly, the available information on the involve-ment of these candidate genes in the control of bud setis variable and consists of information in model species(A. thaliana and poplars) and physiological studies in Nor-way spruce. For example, while the level of expression ofPaFTL2 correlates with the timing of bud set in Norwayspruce (Gyllenstrand et al. 2007), no such direct evidenceexists for other candidate genes. It is also worth noting thatsince many of the statistical tests used in this study arebased on testing whether the number of candidate geneSNPs is overrepresented among the statistically significantSNPs, enlarging the candidate gene category with lesswell-characterized genes will make these tests more con-servative as enrichment will be less likely.
Linkage disequilibrium
Clinal variation of allele frequencies at SNPs unrelated tothe targets of natural selection can be generated throughlinkage. Linkage disequilibrium (LD) among all pairs of
SNPs, therefore, was estimated using the program Arlequinver. 3.5 (Excoffier and Lischer 2010). Significant LD wasidentified at the 5% level using chi-square tests with a Bon-ferroni correction for multiple comparisons and was basedon r2 (Hill and Robertson 1966). Analysis was conducted foreach population separately.
Genetic structure
To investigate population genetic structure, we used theBayesian algorithm that is implemented in the programSTRUCTURE ver. 2.3.3 (Pritchard et al. 2000; Hubisz et al.2009) to delineate clusters of individuals on the basisof their multilocus genotypes. All 18 populations and 264unlinked control loci were used in this analysis. Linkagedisequilibrium among SNPs within genes was limited asthe number of SNPs per gene was small (see below). Anadmixture model with correlated allele frequencies waschosen and the analysis was performed for a number ofclusters varying from K ¼ 1 to K ¼ 22. For each value ofK we performed 20 independent runs. Each run comprised10,000 iterations of burn-in and 10,000 MCMC steps,as those values were found to be sufficient in pilot runs[longer burn-in (100,000) and MCMC (1,000,000) didnot significantly change the results]. To determine theoptimal K value, we used both the mode of posterior
Figure 1 Geographic range of Norway spruce (Picea abies) and location of the population samples used in this study. The area shaded in green is thedistribution of Norway spruce (modified from EUFORGEN 2009, www.euforgen.org) and the dots are the geographical locations of the populationsused in this study.
868 J. Chen et al.
probabilities LnPr(X | K) (Pritchard et al. 2000) and DK(Evanno et al. 2005).
Mantel tests were performed to test for isolation-by-distance among all sampled populations. A Mantel test ofthe correlation between a matrix of pairwise FST betweenpopulations and the matrix of geographic distances was per-formed using the vegan library in R (Oksanen et al. 2011; RDevelopment Core Team 2011). The ratio (FST/(1 2 FST))and the logarithm of geographic distance were used in thistest as suggested by Rousset (1997). Finally, we calculatedhierarchical F-statistics using the hierfstat library in R(Goudet 2005).
Analysis of clinal variation in allele frequencies
To detect candidate genes potentially involved with localadaptation, we treated each SNP independently and useddifferent statistics to estimate the association between pop-ulation allele frequencies and latitude. For all statistics, weexamined the enrichment of SNPs located in candidategenes by calculating the ratio of candidate to control SNPsamong those that were significant at 1% or 5% for the teststatistics under consideration. We then used bootstrapresampling to assess the significance of the observed excess.Using this method should minimize the number of falsepositives, contingent upon the control SNPs largely reflect-ing neutral processes. By comparing the results obtainedwith the different methods, we obtained a set of candidateSNPs with a geographic pattern clearly distinct from thepattern at control SNPs.
Linear regression on latitude: If geographic variation isassociated to an environmental gradient covarying withlatitude, we expect the allele frequencies to be clinal alonglatitudinal gradients. Ancestral and derived alleles at eachSNP were determined using the southernmost population(RU-53.3, Bryansk) and nine individuals randomly sampledfrom four Brewer spruce (P. breweriana S. Watson) popula-tions as outgroups. In the few cases of conflict, we chose theSNP in Brewer spruce as the ancestral state. To measure theassociation of allele frequencies with population latitude,allele frequencies were first transformed using a square-rootarcsine function (Berry and Kreitman 1993) and thenregressed on population latitude. On the basis of the resultsof the population structure analysis, three different data setswere considered when analyzing clines. We first consideredthe complete data set. We then excluded the three south-ernmost populations. The GE-47.0 (Ruhpolding) and SE-58.3 (Saleby) populations belonged to a different clusterthan other populations (see below). These two populationswill create a pattern similar to the latitudinal cline, and wetherefore excluded them from this analysis. The RU-53.3(Bryansk) population is located much farther south fromthe rest of the populations and to avoid possible false pos-itives, we excluded it. Finally, we excluded the four north-ernmost populations (FI-66.4, FI-67.0, FI-67.7, and FI-68.0)
and considered a set including only populations from centralFennoscandia. Since the second data set seems the mostappropriate for the analysis of clinal variation, it was ana-lyzed in more detail than the two others.
Since the coefficient of determination, R2, is a measure ofthe proportion of the total variance in frequencies that isexplained by latitude, we used it as a statistic for “clinality”(Berry and Kreitman 1993). The significance of the R2 valuefor each candidate SNP was obtained empirically by com-parison to the distribution of R2 values from the controlSNPs and enrichment of significant SNPs among the candi-date SNPs was tested for as described previously.
Regression Monte Carlo simulation: To investigate to whatextent a significant clinal variation observed at one nucle-otide site could be due to LD with another SNP, we used theMonte Carlo simulation approach developed by Berry andKreitman (1993) (see also Verrelli and Eanes 2001). Eachsite is in turn considered as a “governing” locus and its effecton other loci assessed by Monte Carlo simulation. Detailsabout this method are given in File S1.
Bayesian generalized linear mixed model (Bayenv) anal-ysis: Another approach to detect clinal variation for SNP allelefrequencies is to account for population history concomitantlywith analysis of environmental correlations, using a Bayesiangeneralized linear mixed model as implemented in theprogram Bayenv (Coop et al. 2010). This approach has re-cently been used successfully in humans (e.g., Hancock et al.2011) and in loblolly pine (Eckert et al. 2010a) and moredetails on its implementation can be found in Hancock et al.(2008, 2010), Coop et al. (2010), and File S1. Briefly,environmental and geographic (e.g., latitude) factors areincorporated as fixed effects, while an estimated variance–covariance matrix of allele frequencies takes into accountrandom effects due to shared population history. A nullmodel of association of SNP frequency with population his-tory is compared to an alternative model that also includesa linear effect of environmental and geographic variables. Asa measure of the support for the correlation between SNPfrequency and the environmental variables, a Bayes factor(BF) is calculated as the ratio of the posterior probabilitiesunder the alternative and the null models. A BF of 3 isconsidered to reflect “substantial” evidence for selectionand a BF.10 indicates a “strong” support (Kass and Raftery1995). However, because the null model is a pure driftmodel, i.e., an unlikely model in most cases, the value ofthe BF can be biased upward, leading to an excess of falsepositives. To overcome this problem, we applied the methodto all control SNPs to obtain an empirical distribution of BFsfrom these SNPs (Coop et al. 2010; Hancock et al. 2010). Wescanned all 445 SNPs and picked the outliers according to thisempirical distribution. The whole procedure was repeated eighttimes using eight different draws of the variance–covariancematrix describing population history. The results were averagedacross the eight runs.
Clinal Variation in Norway Spruce 869
Fst outliers
Since the seminal study of Lewontin and Krakauer (1973),Wright’s fixation index, Fst, has been extensively used todetect recent selection from population genetics data (e.g.,Beaumont and Nichols 1996; Beaumont and Balding 2004;Foll and Gaggiotti 2008). Loci under local selection areexpected to show a larger Fst value than neutral loci, whileloci under balancing selection or purifying selection areexpected to show a lower Fst value. The available outliertests using Fst can be classified into two groups: tests basedon an island model that assumes a common and uniquemigration pool (Beaumont and Nichols 1996; Beaumontand Balding 2004; Foll and Gaggiotti 2008) and tests basedon a hierarchical model that assumes that the populationsamples can be assigned to groups and that the migrationrates are different among demes within groups and betweengroups [model implemented in Arlequin ver. 3.5 (Excoffierand Lischer 2010)]. If the populations are hierarchicallystructured, ignoring it can lead to large number of falsepositives (see Excoffier et al. 2009; Narum and Hess2011). Because an island model seems a reasonable assump-tion in the present case (see the results of STRUCTURE andhierarchical FST analyses), we chose to analyze the data withBayeScan (Foll and Gaggiotti 2008; Fischer et al. 2011),which has proved to be one of the most reliable methodsto detect outliers from the distribution of locus-specific Fst(Pérez-Figueroa et al. 2010; Narum and Hess 2011). Theprogram assumes that allele frequencies follow a multino-mial Dirichlet distribution that arises under a wide range ofdemographic models. In this Bayesian approach, Fst has twocomponents, a population-specific component and a locus-specific component. The alternative model for a given locusis retained when the locus-specific component is necessaryto explain the observed pattern of diversity. To account forthe fact that we have different numbers of candidate andcontrol SNPs, we used a prior odds ratio of 2, an approxi-mate value of the ratio of control to candidate SNPs. Weperformed 20 pilot runs with 5000-length and 50,000-stepMCMC with additional 50,000 steps of burn-in. Bayes fac-tors as well as posterior probabilities were calculated to in-dicate how more likely the model with selection is comparedto the neutral model. Outliers were identified at 5% and 1%significant levels of posterior probability corrected by thefalse discovery rate method (Benjamini and Hochberg1995) as implemented in BayeScan.
Candidate gene expression
As a validation of outliers for environmental correlation withlatitude and/or FST, the expression levels of PaGI, PaPRR7,PaFTL2, and PaCCA1 were assessed in the aforementionedsample of five Swedish populations. These genes wereretained because they either were among the most signifi-cant in the Bayenv and BayeScan analyses (PaGI, PaPRR7,and PaFTL2) or had strong physiological evidence as beinginvolved with bud set [PaCCA1 (U. Lagercrantz, unpub-
lished data)]. Needles were sampled every fourth hour dur-ing the last 48 hr of each photoperiodic interval. Twelveseedlings were randomly chosen in each population andneedles were collected and pooled. RNA was extracted onlyfrom needles sampled during the last 24 hr and on photo-periodic treatments from week 1 to week 6. RNA wasextracted according to manufacturer’s recommendations,using the STRN250 Spectrum Plant Total RNA kit (Sigma-Aldrich). cDNAs were synthesized from 0.5 mg total RNA,using Superscript III reverse transcriptase and random hex-amer primers. Real-time PCR was performed according toGyllenstrand et al. (2007) or on a MyiQ Real-Time PCR De-tection System (Bio-Rad, Hercules, CA) with the followingthermal conditions: 95� for 7 min followed by 40 cycles of95� for 10 sec and 60� for 30 sec (see Table S2 for a detailedprotocol and primer sequences). Each reaction was per-formed in duplicates containing 12 ml DyNAmo Flash SYBRGreen (DyNAmo Flash SYBR Green qPCR kit; Finnzymes,Espoo, Finland), 0.5 mM of each primer, and 4.75 ml cDNA(diluted 1:100) in a total volume of 23.75 ml. Samples wererandomized in each step mentioned above, and a-tubulinwas used as a reference gene. Each sample was amplifiedtwice with sample positions reversed in the plates. To com-pare expression levels, the relative expression level of pop-ulation x under photoperiod i at time-point t is definedas R(xi)(t) ¼ DCTxi(t) 2 DCTSN, where DCT is defined as(CTREFERENCE 2 CTTARGET) and DCTSN is the average DCTvalue under continuous-light treatment of population SE-58.3 (Saleby). To compare the amplitude between differentphotoperiod treatments at time-point t, we calculated thesquared distance from the mean value within each treat-ment for every sample as
DD2ðxiÞ ðtÞ ¼ ðRðxiÞðtÞ2RmeanÞ2;
where R(xi)(t) is the relative expression level and Rmean is themean relative expression level within that photoperiodtreatment. For further analysis, we also repeated the wholeprocedure on the first 24-hr samples under 17.5 hr light/6.5hr dark and 16 hr light/8 hr dark photoperiod treatments forPaFTL2 and PaPRR7. Nonlinear mixed models with har-monic terms were applied to represent the rhythmic expres-sion patterns, using the nlme library in R (Pinheiro et al.2011). The models are described in detail in Albert andHunsberger (2005) and Yang and Su (2010).
Results
Bud set and clinal variation
Bud set was measured after each week of photoperiodictreatment and reflects induction some 2 weeks earlier. Aclear clinal pattern was observed, even when the southern-most population (SE-58.3) is excluded (Figure 2). Under a17.5 hr light/6.5 hr dark photoperiod, no individual set budin the two southernmost populations (SE-58.3 and SE-61.6),�10–20% of individuals set bud in the two intermediate
870 J. Chen et al.
populations (SE-62.7 and SE-64.1), and about half of indi-viduals set bud in the northernmost population SE-66.4. Ifthe critical night length for bud set is defined as the timewhen 50% of plants have set bud, we observe a clinal in-crease of the critical night length as the latitude decreases.
Linkage disequilibrium
We computed a standard estimate of LD (i.e., r2) across allindividual alleles pooled within each population. Only 560of 94,673 pairwise comparisons were significant using a chi-square test with Bonferroni correction (P-value ¼ 5.28E-07),with a total of 157 comparisons with r2 values .0.3. Theaverage level of r2 detected in Norway spruce candidategenes is �0.2 (Figure S3) and is similar to the value ob-served in other spruce species (Beaulieu et al. 2011). Mostof these were intragenic and only a few were among genes.Most of the linkage disequilibrium was found in candidategenes, mainly because we had higher SNP density in thesecompared to control genes, but LD remained limited theretoo.
Genetic structure
Population genetic structure was not pronounced. Theoptimal number of clusters calculated with the DK method(Evanno et al. 2005) is K ¼ 2 (Figure 3A and Figure S4A)and the estimated a . 1.5, indicating that most individualsare admixed. In line with previous studies (Lagercrantz andRyman 1990; Heuertz et al. 2006; Tollefsrud et al. 2008,2009), the main two clusters consist of populations fromthe Baltic–Nordic domain (Russian, Finnish, and northernSwedish populations) and of populations completely orpartly derived from the Alpine domain (GE-47.0 and SE-58.3). The Bryansk (RU-53.3) population that is located incentral Russia (i.e., in one of the putative LGM refugia fromwhich recolonization of northwest Europe started) showeda surprisingly similar clustering pattern to the populations ofcentral Fennoscandia. This was also reflected by a lower FSTvalue between RU-53.3 and the central Fennoscandia clus-ter than with others (Table S3). If we choose the mode ofposterior probability distribution of the data (Pritchard et al.2000) as a criterion, the optimal number of clusters was K ¼3 (Figure 3B and Figure S4B). The new cluster was domi-nated by the four populations from northern Finland thatare located at latitudes higher than 66�N.
There was no pattern of isolation-by-distance among allpopulations (Mantel statistic r ¼ 0.17, P-value ¼ 0.143) oramong populations when the GE-47.0 (Ruhpolding) and SE-58.3 (Saleby) populations were discarded (Mantel statisticr = 20.25, P-value ¼ 0.982). Strong hierarchical structurewas not apparent either. The lowest hierarchical level (level1) was between populations. A second level (level 2)grouped populations into five clusters (GE-47.0, SE-58.3,RU-53.3, central Fennoscandia, and northern Finland) andthe highest level (level 3) considered the Alpine domain andthe Baltic–Nordic domain. The low hierarchical F-statisticvalues between these three levels (Flevel1/total ¼ 0.017,
Fleve2/level1 ¼ 0.035, Flevel3/level2 ¼ 0.05) show that hierarchi-cal genetic structure is negligible and thus applying an islandmodel in the FST outlier analysis should be reasonable.
Analysis of clinal variation in allele frequencies
Using all of the data, including populations from the Alpineand Baltic–Nordic domains, a total of 54 candidate SNPs, of137 (39.4%), and 88 control SNPs, of 308 (28.6%), havea regression slope significantly different from zero at the 5%significance level. To reduce the number of positive SNPssimply due to population structure, we removed the threesouthernmost populations (GE-47.0, RU-53.3, and SE-58.3)and focused on populations located at latitude 61�N andabove. A total of 92 of 445 SNPs, 34 (24.8%) of which werecandidates and 58 (18.8%) of which were controls, showeda significant linear regression with latitude at the 5% signif-icance level (Table 2 and Figure 4). Finally, when we re-moved the four northernmost populations and kept onlypopulations from central Fennoscandia (latitudes rangingfrom 61�N to 65�N), there were only 28 significant SNPsincluding 11 from candidate genes.
The adjusted R2 of regression of SNP frequency on lati-tude was estimated to represent how much of the variancein frequency could be explained by latitude and test forenrichment of the candidate SNPs among significant SNPsat the 5% significance level. In all three data sets there wasno enrichment in candidate SNPs among significant ones.
Observing that there are 92 SNPs in the data set withoutthe three most southern populations whose allele frequen-cies are significantly correlated with latitude and thepresence of LD among candidate SNPs, we tested whetherclinality at one site affects that of flanking SNPs. A MonteCarlo simulation approach was used for this purpose and
Figure 2 Percentage of individuals setting bud in five populations underdifferent photoperiodic treatments. “9.5 darkness 1” and “9.5 darkness2” are week 7 and week 8 within the same photoperiodic treatment (seeMaterials and Methods).
Clinal Variation in Norway Spruce 871
only the 34 significant candidate SNPs were investigated asthey showed higher levels of LD (Figure S5). NonsignificantSNPs can be also affected by nearby variants under selectionbut they were ignored since they do not exhibit clinal vari-ation and were unlikely to explain clinal variation at othersegregating sites. Two broad patterns were apparent. Onepattern involved clinality being driven by LD within andamong genes, such as that observed at PaMFTL1 SNPs. Incontrast, most of the effect of clinality for SNPs in the pro-moter of PaFTL2 came from within the promoter itself, as wasalready suggested by the pattern of LD. The analysis alsoshowed a group of SNPs whose effect could be explainedby a small number of other SNPs such as PaMFTL1_3658,PaFTL2pr_1951, PaPHYP_RIII274, PaGI_F2_9_987, andPaPRR7_F3_104. These SNPs had a higher level of clinalitythan others and usually showed stronger signals in the twoother tests (see results below and Table 2).
Bayesian generalized linear mixed model(Bayenv) analysis
The average BF estimated from the candidate SNPs wassignificantly higher than the one estimated from controlSNPs (average BFs ¼ 3.26 and 1.48, respectively; Wilcoxontwo-sample test P-value ¼ 0.0083). More importantly, therewas a significant enrichment of candidate SNPs in the uppertail of the empirical distribution of BFs. The ratio at the 10%tail (BF $ 2.7) was 0.87 (27/31, candidate to control SNPs)and increased to 1.12 (18/16) in the 5% tail (BF $ 4.6) and
�1.75 (7/4) in the 1% tail (BF $ 10.4) (bootstrap ,1% forall tails, see Table 2). To remove the possible effect of con-trol loci that were potentially under selection, we then ex-cluded 14 control SNPs that were outliers in more than oneof the tests [linear regression, Bayenv, and FST outliers (seebelow)] and reconstructed the null model. This caused anincrease of BF estimates in the upper 25% tail but the can-didate outliers and the enrichment ratio remained the same(0.9, 1.06, and 2 at 10%, 5%, and 1% tails, respectively).
Fst outliers
The average value of FST across SNPs was 0.05. Candidate SNPshad significantly higher FST values than control SNPs (meanFst = 0.056 and 0.047, respectively; Wilcoxon rank sum testP-value ¼ 0.00026). BayeScan identified 29 outliers at the 5%significance level [corrected false discovery rate (FDR) test], 16of which were from candidate loci. At the 1% significance level,11 candidate SNPs and 6 control SNPs were identified as out-liers (Figure 5). Fisher’s exact test indicated a significant enrich-ment of candidate SNPs in these tails (P-values ¼ 0.0064 and0.0054, respectively). At these significance levels, all outliershad high FST values and therefore were putatively under posi-tive selection. The exception was PaKN2b_1157 and two othercontrol SNPs that had low Fst values (�0.020).
Summary of the analysis of SNP variation
We combined the results from all of the analyses above(Figure S6) to obtain a short list of loci with evidence of
Figure 3 Clustering analysis conducted in STRUCTURE. (A) K ¼ 2; (B) K ¼ 3.
872 J. Chen et al.
clinal variation and/or selection (Table 2). Most of the SNPsdisplaying strong signals in all our analyses came from fivegenes: PaMFT1, PaFTL2, PaGI, PaPRR7, and PaPHYP. Oneparticularly interesting SNP was PaFTL2pr_1951, which rep-resents a recent mutation that does not exist in Brewerspruce or in the RU-53.3 (Bryansk) population, both ofwhich served as outgroups, or in populations from the Al-pine domain [GE-47.0 (Ruhpolding) and SE-58.3 (Saleby)].The clinal pattern in allele frequency at PaFTL2pr_1951 can-not be explained by any other SNP, but PaFTL2pr_1951affects the clinal variation at 14 other significant candidateSNPs. Previous studies on expression of FT-Like genes inNorway spruce have shown that the expression level ofPaFTL2 correlates with bud set. Expression is induced by
night lengths exceeding the population-specific critical nightlength for bud set for trees from Romanian and Arctic pop-ulations, respectively (Gyllenstrand et al. 2007). The single-nucleotide mutation in the PaFTL2 promoter might affectthe divergent expression patterns in genotypes with varyingcritical night lengths. Other strong outlier SNPs were lo-cated in putative circadian clock genes (PaPRR7 and PaGI)that are likely regulators of PaFTL2 expression and bud set(A. Karlgren, N. Gyllenstrand, and U. Lagercrantz, unpub-lished data). PaPRR7_F2_534 is a nonsynonymous substitu-tion causing an amino acid replacement from serine tothreonine (TCT to ACT in nucleotides) and was found tobe associated with two other SNPs, PaPRR7_F1_771 andPaPRR7_F3_104 (in intron and 39-UTR regions, respectively).
Table 2 Summary of the SNPs detected by the different approaches
BayeScan Annotation
SNP Clinality: R2MC sampling: no.
SNPs affecting the SNP Affected by Log10(P.O.) FST Bayenv BF AA change Nucleotide
PaPHYN_RIII185 0.256 25 7 20.360 0.043 0.710PaPHYN_RIII88 0.510 0 12 20.427 0.051 4.276PaPHYO_RIII510 0.213 33 5 20.353 0.054 1.938PaPHYP_RIII274 0.404 0 14 2.743 0.139** 25.175**PaPHYP_RIII76 0.408 1 10 20.409 0.045 1.879PaMFTL1_1050 0.316 23 9 2.467 0.129** 5.489*PaMFTL1_2091 0.383 3 11 2.013 0.127** 11.692**PaMFTL1_2136 — — — 0.126 0.072 5.813*PaMFTL1_2215 0.285 31 9 20.492 0.048 1.263PaMFTL1_3441 0.359 18 7 1.788 0.122** 12.817**PaMFTL1_3658 0.555 0 12 0.333 0.080 6.414*PaMFTL1_802 — — — 1.469 0.127* 6.540*PaFTL2pr_1560 0.419 6 9 20.417 0.050 0.617PaFTL2pr_1824 0.493 0 15 20.347 0.053 3.300PaFTL2pr_1951 0.641 0 14 1.540 0.121** 18.346**PaFTL2pr_2173 0.459 6 13 0.850 0.101* 5.254 *PaFTL2pr_2509 0.441 2 13 1.915 0.133** 16.667**PaFTL2pr_2694 0.387 7 13 1.159 0.108* 5.752*PaFTL2pr_2790 0.396 11 12 1.284 0.112* 4.963*PaGI_F2_9_1470 0.543 0 7 0.596 0.090 2.752 His to Tyr CAT to TATPaGI_F2_9_987 0.479 1 8 1000 0.170** 117.407** His to Tyr CAT to TATPaGI_F6_8_1111 0.244 1 6 20.061 0.036 0.627PaHB3_385 0.349 0 8 20.116 0.036 0.582PaKN2a_253 0.240 33 6 20.411 0.043 0.557PaKN2b_1157 — — — 0.702 0.020* 0.592PaKN3a_211 0.377 0 12 20.521 0.045 0.873PaKN4b_242 0.251 33 8 0.415 0.088 47.522**PaCCA1_1302 0.288 22 7 20.381 0.043 0.533PaPRR1_1992 — — — 20.529 0.047 4.884*PaPRR1_2990 — — — 20.503 0.048 4.871*PaPRR1_3301 — — — 20.511 0.048 4.721*PaPRR3_F1_2570 0.380 0 7 20.319 0.055 3.244PaPRR3_F1_2978 — — — 1.722 0.123** 2.568PaPRR3_F2_331 0.309 25 8 20.504 0.048 4.213PaPRR3_F2_481 0.354 1 8 20.369 0.055 3.009PaPRR7_F1_1505 0.455 0 11 20.237 0.060 2.965PaPRR7_F1_771 0.683 2 11 2.467 0.126** 3.842PaPRR7_F2_417 0.230 33 7 20.374 0.053 1.768PaPRR7_F2_534 0.673 2 11 1000 0.138** 4.458 Ser to Thr TCT to ACTPaPRR7_F3_104 0.689 2 11 1000 0.161** 9.906*PaZTL_793 0.352 7 7 20.343 0.041 0.677
See the main text for details on the different approaches. *P, 0.05; **P, 0.01. Log10(P.O.) (P.O., posterior odds): 0.5–1, substantial evidence; 1–1.5, strong evidence; 1.5–2, very strong; .2, decisive evidence. Bayes factors: 3.2–10, substantial evidence; 10–100, strong evidence; .100, decisive evidence.
Clinal Variation in Norway Spruce 873
PaGI_F2_9_987 and PaGI_F2_9_1470 are linked and bothcause an amino acid replacement from histidine to tyrosine(CAT to TAT in nucleotides). Using results from the NCBIBLASTP 2.2.25+ tool (Altschul et al. 1997), histidine islikely the conserved state, as all 64 hits from 47 species withhigh similarity (e-value ,6e-95) had a histidine in thisposition.
Candidate gene expression
For a subset of candidate genes, we compared the relativeexpression levels among photoperiodic treatments using thepairwise Wilcoxon rank sum test implemented in R (R De-velopment Core Team 2011). The mean expression level ofPaFTL2 across all populations increased continuously as thenight length increased from near 0 in constant light to 2.7 atnight length of 3.5 hr and to 6.0 at night length of 8 hr. Inthe two southernmost populations (SE-58.3 and SE-61.6),PaFTL2 was significantly upregulated at night length of 6.5hr compared to the expression level under constant light(P-value ,0.01, FDR corrected). In the two northernmostpopulations (SE-64.1 and SE-66.4), this change happened
for a night length of 5 hr (P-values ,0.01 and ,0.05, re-spectively). This change might happen even earlier in SE-66.4 as the P-values are 0.052 for night lengths of 2 hr and3.5 hr. In population SE-62.7, there was a significant in-crease in expression of PaFTL2 already at 2 hr night length,which is earlier than expected on the basis of previousresults. Except for SE-58.3, all populations showed a secondincrease in expression level when the night length reached8 hr compared to the first increase in expression level(P-value,0.05 for SE-62.7 and,0.001 for the other three).We therefore compared the relative expression level ofPaFTL2 among populations under a night length of 6.5 hror 8 hr. There is a significant difference (P-value ,0.001)in PaFTL2 expression between SE-58.3 (3.76) and otherpopulations (5.14 on average) for a night length of 6.5 hr(Figure 6). Furthermore, when the night length is 8 hr, themean expression levels of the five populations are signifi-cantly differentiated (SE-58.3, 4.11; SE-61.6, 6.22; SE-62.7,6.20; SE-64.1, 6.70; and SE-66.4, 6.73; P-value ,0.05) andlinearly correlated with latitude (P-value ¼ 0.048, adjustedR2 ¼ 0.75, Figure 7).
Figure 4 Examples of candidateSNPs that show a significant re-gression of the transformed allelefrequency on latitude.
874 J. Chen et al.
Unlike PaFTL2, the expression levels of PaCCA1, PaGI,and PaPRR7 were similar across photoperiod treatments,and differences were not statistically significant. These genesare presumably located in the circadian clock (PaCCA1, PaGI,and PaPRR7) or related to it (PaFTL2) and their expressionlevel is expected to show a diurnal expression rhythm. Tomeasure the amplitude of their rhythm, we calculated thesquared distance from the mean level of relative expressionin each photoperiod treatment. There was an increase of am-plitude in the 16 hr light/8 hr dark photoperiod in all fivepopulations in PaPRR7 (P-value ,0.05) compared to that inconstant light. Furthermore, we represented their cycletrends using harmonic regression with one or two periods.The predicted curves of PaPRR7 and PaGI showed nice di-urnal rhythms with a periodicity of �24 hr (see Figure 6and Table S4 for all estimated parameters), while neitherPaFLT2 nor PaCCA1 fit the autoregressive model well andshowed no clear diurnal rhythms.
Discussion
In this study, we have used different approaches to identifySNPs in candidate genes that are potentially associated tothe clinal variation observed for bud set in Norway spruce.The combined approaches identified a promising set ofSNPs, in particular in the promoter of the PaFTL2 gene. Thelatter was furthermore accompanied by clinal variation inexpression level. Below we discuss some caveats of the dif-ferent approaches, how they relate to each other, and howtheir joint use can shed light on the genetic control of budset and local adaptation. In particular, we discuss the effectof population histories on the interpretation of clinal varia-tion. Finally, we briefly outline the implications of the pres-ent results for the design of future association-mappingstudies for traits showing strong clinal variation and givesome suggestions for future extensions of the current study.
Ascertainment bias
The SNPs analyzed here were discovered using threedifferent discovery panels. All candidate SNPs originated
from a large discovery panel (�48 chromosomes), while thebulk of the control SNPs come from smaller discovery panels(12 chromosomes in the CRSP resequencing panel and 24chromosomes in the Arborea genotyping panel). Albrecht-sen et al. (2010) reported that the size and the geographicdistribution of the discovery panel could lead to a bias in thesite-frequency spectrum. They also showed that as long asthe discovery panel size is not too small (,4 chromosomes),estimates of Fst are rather robust to ascertainment bias sincethe bias has a similar effect on the total- and among-populationvariances. In our data, we observed a deficit of low-frequencyalleles in the control loci (mean allele frequencies ¼ 0.33and 0.24 for control and candidate SNPs, respectively,P-value ,2.2e-16 in two-samples t-test). As expected, thecontrol SNPs have significantly higher heterozygosity thancandidate SNPs (mean heterozygosities ¼ 0.29 and 0.25 forcontrol and candidate SNPs, respectively, P-value ¼ 0.0238).FST values among populations are, however, of the sameorder of magnitude as estimates obtained through analysisof sequence data (mean Fst’s ¼ 0.056 and 0.047, for candi-date and control, respectively, and 0.040 between northernand southern Sweden in Heuertz et al. 2006) although itshould be pointed out that the two studies have many SNPsin common.
The control SNPs were also used to build a covariancematrix between populations for the null model used byBayenv. The covariance matrix can be viewed as a model-based, population-specific estimate of Fst (Nicholson et al.2002; Coop et al. 2010). Hence, in both the Bayenv and theBayeScan analyses, the results should not have been af-fected too strongly by ascertainment bias, and selectionshould indeed be the major cause of the substantial differ-ence in Fst estimates between control and candidate SNPs.This is supported by the analysis of the site-frequency spec-trum for control SNPs and candidate SNPs in the total data(Figure S7A) and for the significant SNPs detected in thethree main analyses: clinal variation (Figure S7B), Bayenvanalysis (Figure S7C), and the BayeScan analysis (FigureS7D). The site-frequency spectrum for each of those setsof SNPs did not show any specific structure, and there was
Figure 5 Analysis of FST outliersusing BayeScan (Foll and Gag-giotti 2008). Estimates of FST forcandidate SNPs (black circles) andbackground SNPs (white circles)are plotted against the logarithmof the posterior odds.
Clinal Variation in Norway Spruce 875
no clear overrepresentation of the candidate SNPs in thelowest allele-frequency classes.
Population history, selection from standing variation,and local selection
A major issue when trying to identify SNPs associated tolocal adaptation is to account for demographic processesthat can leave the same signature (Väsemegi 2006; DeCarvalho et al. 2010; Savolainen et al. 2011). Norwayspruce, as well as many other plant and animal species,recolonized northern Europe after the LGM along at leasttwo main recolonization routes (Giesecke and Bennett2004; Tollefsrud et al. 2009). Thus, the current northernEuropean spruce forests were established during a rathershort evolutionary period and from different sources. Thisis indeed what our data suggest. There are three main ge-netic clusters, but with the exception of the population fromGermany and the four populations from northern Finland,no populations can be assigned entirely to any of these threeclusters. In the case of population SE-58.3, the presence ofa group of individuals exhibiting a close similarity to indi-viduals from the German population likely reflects the use ofseeds imported from Germany and Belorussia in afforesta-tion over the last century. Apart from Germany and the fourpopulations from northern Finland, the pattern in popula-tions ranging from 60�Ν to 65�N suggests a modest contri-
bution from the Alpine domain (the blue cluster in Figure3), as well as more substantial contributions from two east-ern LGM refugia (green and red clusters in Figure 3). One ofthese two refugia, the one associated with populations fromnorthern Finland, was probably located at a higher latitudethan the other, a possibility suggested by recent paleogeo-graphic studies (Väliranta et al. 2011). The green clustercould also reflect the survival of trees in Scandinavia or closeto it during the LGM (Öberg and Kullman 2011), but if thosescattered refugia had contributed to present-day popula-tions, the similarity and the genetic closeness of the Russianpopulation (RU-53.3) to populations from central Fenno-scandia (Figure 3B and Table S3) might be difficult to ex-plain since it would run counter to the general direction ofrecolonization inferred from both genetic (Lagercrantz andRyman 1990) and paleoecological studies (Giesecke andBennett 2004).
Independently of the exact geographical origin of thesource populations, the spread of populations together withthe existence of diverse origins could lead to serial foundereffects and secondary contact, which in turn may result inneutral polymorphisms showing clines in allele frequencies(Novembre and Di Rienzo 2009). Two recent studies haveinvestigated the possibility that secondary contacts and ad-mixture could play a role in local adaptation in Scandinaviantree species. In Scots pine (P. sylvestris), Savolainen et al.
Figure 6 Mean level per population of the relative RNA expression of four putative photoperiodic genes under different photoperiodic treatments.Autoregressive models were used to obtain the curves, except for PaCCA1 for which the fit was poor and different points were just joined by a line.(A–D) At the bottom, white rectangles represent light periods and black rectangles represent dark periods. (A) PaPRR7; (B) PaGI; (C) PaFTL2; (D) PaCCA1.
876 J. Chen et al.
(2011) detected no admixture zone and speculated thatadaptation occurred during the colonization process fromstanding variation. De Carvalho et al. (2010) did detectadmixture in European aspen (Populus tremula) from cen-tral Sweden and argued that admixture facilitated adapta-tion from standing variation. Both studies differed from oursin major ways. The inference of population structure inP. sylvestris was based on only three populations and a lim-ited number of markers. Therefore, it may not have hadthe power to detect a putative admixture zone. The studyof De Carvalho et al. (2010) was based on seven popula-tions, only one of them located in Scandinavia and theothers being far from it. Admixture in European aspen wasinferred in this single Swedish population located at 63�Ν,possibly reinforcing the impression of selection from stand-ing variation.
In Norway spruce, while we did not observe any evidenceof isolation-by-distance (Mantel test, P-value.0.95), we diddetect evidence of secondary contacts and admixture. To tellapart selection from demography we used a three-step ap-proach. First, we used simple regression analysis of allelefrequencies on latitude for differing subsets of the data.The number of SNPs showing clinal variation decreasedsharply when populations from northern Finland were notconsidered, suggesting that part of the clinal variationreflected population structure. Since we did not correct fordemography in these regression analyses, no enrichment incandidate genes was observed in the tails when the adjustedR2 statistics were considered. This simply means that de-mography is, on the average, affecting all SNPs equallyacross the genome, irrespective of the their status (candi-date vs. noncandidate), and that demography needs to beaccounted for if one wants to identify SNPs also affected by
selection. Second, since population structure is not ac-counted for in the regression analysis, we used a Bayesiangeneralized linear mixed model (Bayenv) on all populations(Hancock et al. 2008, 2010, 2011; Coop et al. 2010). Thisled us to discard many SNPs that had high R2 in the standardcline analyses, suggesting that the strong clinal variation atthose SNPs likely reflects population structure. However,since we also observed an enrichment of candidate SNPswhen population structure was accounted for, it wouldtherefore seem that the data also reflect selection, possiblyfrom standing variation, rather than from new mutations.The latter is supported by the fact that most of the allelesthat show significant SNPs are already present in the Rus-sian population, which is close to the putative ancestral re-fugia. One of the most interesting SNPs, however, isPaFTL2pr_1951, which was absent in Brewer spruce in theRussian population and in the German population andtherefore appears to be a new mutation. Third, we testedfor FST outliers to assess whether the enrichment in candi-date SNPs that we detected with Bayenv was due to selec-tion in past environments or since the populations colonizedScandinavia. In contrast to results in European aspen (Maet al. 2010), we did detect significant SNPs when we used anFST-outlier approach, suggesting the presence of someamount of local selection. Furthermore, there was a goodcorrespondence between the Bayenv and the FST-outlierapproaches, with 12 SNPs picked by the two methods and4 and 6 identified uniquely by Bayenv and the FST-outlierapproach, respectively. Ma et al. (2010) argued that exten-sive long-distance gene flow in wind-pollinated forest trees(e.g., Robledo-Arnuncio 2011) will reduce population ge-netic structure and will also overwhelm selection and limitour ability to discover selection signal by erasing peaks ofgenetic differentiation (Ma et al. 2010). In our study, localselection seems to have been strong enough to generateextreme Fst values at certain loci so that they emerge asoutliers in the analysis and, furthermore, we also observea significant enrichment of candidate SNPs among the sig-nificant ones. Prunier et al. (2011) obtained similar resultsin black spruce.
Differentiation at quantitative trait loci
Some authors (Latta 1998; Le Corre and Kremer 2003;Kremer and Le Corre 2011) have argued that low valuesof FST will be observed even at loci underlying quantitativetraits and that this reflects a decoupling between the poly-morphism of a quantitative trait and the polymorphism of itsunderlying quantitative trait loci. For instance, in the modelof Kremer and Le Corre (2011), natural selection first actson beneficial allelic associations at different loci in differentpopulations and only significantly affects allelic frequenciesat individual loci at later time points. The high FST valuesobtained for significant outlier SNPs (the mean FST of out-liers is 0.13 if the only value that was in the lower tail of thedistribution is removed, Table 2) are also in contrast toresults presented by Kremer and Le Corre (2011), where
Figure 7 Correlation of the mean level of expression of PaFTL2 undera night length of 8 hr with latitude (P-value = 0.048, adjusted R2 = 0.75).
Clinal Variation in Norway Spruce 877
candidate SNPs and control SNPs exhibited similar values ofFST. We believe that this is simply due to the fact that mostof the candidate SNPs used in Kremer and Le Corre (2011)may actually not be associated to the studied quantitativetraits. Nonetheless, as in Namroud et al. (2008), Kremer andLe Corre (2011), and Prunier et al. (2011), FST values atcandidate gene SNPs were still much smaller than QST
values.
Expression patterns of candidate genes
Most of our putative candidate genes were chosen as re-presentatives of the photoperiodic pathway and include pu-tative photoreceptors, circadian clock genes, and genesacting downstream of these genes. Even though the functionof most of these candidate genes in Norway spruce is stillpoorly characterized, many of the significant SNPs werelocated in genes that are logical candidates on the basis ofcurrent knowledge. One such example is the set of signif-icant SNPs located in the PaPHYP gene. Physiological experi-ments have shown that genotypes from high latitudes do notrespond to exposure to red light during an artificial longnight and thus set buds, in contrast to genotypes from moresouthern latitudes (Clapham et al. 1998; Gyllenstrand et al.2007). On the basis of sequence homology, PaPHYP corre-sponds to PhyB in angiosperms, which is the receptor that ispredominantly responsible for responsiveness to red light(Schneider-Poetsch et al. 1998), and clinal variation at PhyPSNPs was observed in Scots pine (Garcia-Gil et al. 2003).Significant SNPs were also identified in putative circadianclock genes, in particular in PaGI and PaPRR7. Our unpub-lished data support that these genes are indeed part of theNorway spruce circadian clock and furthermore show diver-gent expression patterns in genotypes from extreme northernand southern latitudes, in particular, divergent responses tonight breaks (A. Karlgren, N. Gyllenstrand, and U. Lagercrantz,unpublished data). These results also agree well with studiesin Sitka spruce and poplar where this category of genes wasfound to exhibit clinal variation and be associated with bud set(Holliday et al. 2010; Ma et al. 2010; Hall et al. 2011; Hsuet al. 2011).
If enhanced genetic structure in these candidate genes istruly caused by selection, we might expect a divergence ingene expression levels among alleles. As the circadian clockis composed of multiple integrated feedback loops, muta-tions in both regulatory and coding parts could affect thetranscriptional patterns of multiple clock genes. When ana-lyzing gene expression we used a bulk-sampling strategy, witheach population being represented by 12 randomly chosenindividuals. Thus, the expression level should reflect theeffect of each allele weighted by its frequency. This couldundermine the estimation of the true expression divergencebetween the two alleles, as the effect of a slight allelefrequency shift caused by weak selection may not bedetected. In this study, the expression levels of PaPRR7,PaCCA1, and PaGI show very subtle differentiation alongthe latitudinal gradient. However, this marginal difference
might still cause significant expression changes in down-stream genes such as PaFTL2 and divergent bud set andgrowth cessation along the latitude in phenotype.
Our previous studies have shown that PaFTL2 expressionis controlled by photoperiod and is correlated to bud setunder various experimental conditions. Additionally, PaFTL2expression differs between genotypes from extreme north-ern and southern latitudes (Gyllenstrand et al. 2007). In thepresent study, we found a weak but significant latitudinalcline in the expression of PaFTL2 correlated to bud set ina more limited part of the latitudinal range. In light of thesedata, the identification of several significant SNPs in thepromoter of PaFTL2 is very encouraging, especially as someare located in or close to putatively important binding sites.Analyses using “Plant promoter analysis navigator” [Plant-PAN (Chang et al. 2008)] and “RNAhybrid” (Krüger andRehmsmeier 2006) indicate that the PaFTL2pr_1951 SNPis located in the 39 end of a putative miRNA-target bindingsite of a member of the miR169 and/or miR399 families,which are important transcriptional regulators in plants(reviewed by Bartel 2004), and is located next to the bind-ing site of an “E2F Consensus Element” transcript factor thatmight cause an affinity change.
Implications
One of the main aims of the statistical methods developedfor association studies is to decrease the number of falsepositives due to population structure. This, however, couldlead to false negatives if population history and differencesin allele frequency at causal SNPs covary as may be the casein the present study (Atwell et al. 2010; Eckert et al. 2010b).As pointed out by Atwell et al. (2010, p. 630), “plants fromnorthern Sweden will tend to share cold-adaptive alleles atmany causal loci as a result of selection, and marker allelesgenome-wide as a result of demographic history.” Atwellet al. (2010) were referring to A. thaliana but our resultsand those of De Carvalho et al. (2010) suggest that this mayalso fit well to Norway spruce and European aspen. In bothspecies, the strong latitudinal clines observed at many SNPsmay reflect the presence of different glacial refugia as wellas different selection pressures.
In the present study, we have found evidence of clinalvariation at SNPs in candidate genes, using populationsfrom a limited part of the Norway spruce natural range. Wealso observed that PaFTL2 expression varied along the lat-itudinal cline. Demonstrating that the identified SNPs aretruly causal for bud set, however, will remain a dauntingtask in a nonmodel species like Norway spruce. Ralph andCoop (2010) showed that parallel adaptation can occurwithin the same species, something that is of major interestin species like Norway spruce that have a fairly large globaleffective population size and a distribution that may allowfor several clines to be established. Independent clines havebeen extensively used to study patterns of adaptation inDrosophila, and this seems to be a fruitful approach fordetecting loci underlying local adaptation (e.g., Oakeshott
878 J. Chen et al.
et al. 1982; Turner et al. 2010). To validate the SNPs andloci identified in the present study, it would be highly in-teresting to test whether clinal variation at the loci identifiedhere is also observed in other regions of the same specieswith a different history or in other species.
Acknowledgments
We thank Kerstin Jeppsson for help in the laboratory; DavidClapham for help with plant management; and V. L.Semerikov, Y. N. Isakov, and the staff at the Swedish ForestResearch Institute (Skogforsk) and at the Finnish ForestResearch Institute (Metsäntutkimuslaitos) for seed samples.We also thank G. Daoust (Canadian Forest Service) for sup-plying Arborea’s Norway spruce SNP discovery trees. Thisresearch was supported by the European Community’s SixthFramework Programme, under the Network of ExcellenceEvoltree; by the Seventh Framework Programme (FP7/2007–2013), under grant agreement 211868 (Project Novel-tree); and by the Swedish Research Council (Forskningsrådetför miljö, areella näringar och samhällsbyggande) grant dnr217-2007-433 (to M.L.). Part of the gene expression experi-ment was supported by the Nilsson–Ehle foundation.
Literature Cited
Aitken, S. N., S. Yeaman, J. A. Holliday, T. Wang, and S. Curtis-McLane, 2008 Adaptation, migration or extirpation: climatechange outcomes for tree populations. Evol. Appl. 1: 95–111.
Albani, M. C., and G. Coupland, 2010 Comparative analysis offlowering in annual and perennial plants. Curr. Top. Dev. Biol.91: 323–348.
Albert, P. S., and S. Hunsberger, 2005 On analyzing circadianrhythms data using nonlinear mixed models with harmonicterms. Biometrics 61: 1115–1120.
Alberto, F., L. Bouffier, J. Louvet, J. Lamy, S. Delzon et al.,2011 Adaptive responses for seed and leaf phenology in nat-ural populations of sessile oak along an altitudinal gradient. J.Evol. Biol. 24: 1442–1454.
Albrechtsen, A., F. C. Nielsen, and R. Nielsen, 2010 Ascer-tainment biases in SNP chips affect measures of population di-vergence. Mol. Biol. Evol. 27: 2534–2547.
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhanget al., 1997 Gapped BLAST and PSI-BLAST: a new generationof protein database search programs. Nucleic Acids Res. 25:3389–3402.
Atwell, S., Y. S. Huang, B. J. Vilhjálmsson, G. Willems, M. Hortonet al., 2010 Genome-wide association study of 107 phenotypesin Arabidopsis thaliana inbred lines. Nature 465: 627–631.
Bartel, D. P., 2004 MicroRNAs: genomics, biogenesis, mechanism,and function. Cell 116: 281–297.
Beaulieu, J., T. Doerksen, B. Boyle, S. Clément, M. Deslaurierset al., 2011 Association genetics of wood physical traits inthe conifer white spruce and relationships with gene expression.Genetics 188: 197–214.
Beaumont, M. A., and D. J. Balding, 2004 Identifying adaptivegenetic divergence among populations from genome scans.Mol. Ecol. 13: 969–980.
Beaumont, M. A., and R. A. Nichols, 1996 Evaluating loci for usein the genetic analysis of population structure. Proc. Biol. Sci.263: 1619–1626.
Benjamini, Y., and Y. Hochberg, 1995 Controlling the false dis-covery rate: a practical and powerful approach to multiple test-ing. J. R. Stat. Soc. B 57: 289–300.
Bergelson, J., and F. Roux, 2010 Towards identifying genes un-derlying ecologically relevant traits in Arabidopsis thaliana. Nat.Rev. Genet. 11: 867–879.
Berry, A., and M. Kreitman, 1993 Molecular analysis of an allo-zyme cline: Alcohol-Dehydrogenase in Drosophila melanogasteron the East coast of North-America. Genetics 134: 869–893.
Binney, H. A., K. J. Willis, M. E. Edwards, S. A. Bhagwat, P. M.Anderson et al., 2009 The distribution of late-Quaternarywoody taxa in northern Eurasia: evidence from a new macro-fossil database. Quat. Sci. Rev. 28: 2445–2464.
Bresson, C. C., Y. Vitasse, A. Kremer, and S. Delzon, 2011 To whatextent is altitudinal variation of functional traits driven by ge-netic adaptation in European oak and beech? Tree Physiol. 31:1164–1174.
Chang, W. C., T. Y. Lee, H. D. Huang, H. Y. Huang, and R. L. Pan,2008 PlantPAN: Plant Promoter Analysis Navigator, for identi-fying combinatorial cis-regulatory elements with distance con-straint in plant gene group. BMC Genomics 9: 561.
Clapham, D., I. Dormling, I. Ekberg, G. Eriksson, M. Qamaruddinet al., 1998 Latitudinal cline of requirement for far-red lightfor the photoperiodic control of bud set and extension growth inPicea abies (Norway spruce). Physiol. Plant 102: 71–78.
Coop, G., D. Witonsky, A. Di Rienzo, and J. K. Pritchard,2010 Using environmental correlations to identify loci under-lying local adaptation. Genetics 185: 1411–1423.
De Carvalho, D., P. K. Ingvarsson, J. Joseph, L. Suter, C. Sedivyet al., 2010 Admixture facilitates adaptation from standingvariation in the European aspen (Populus tremula L.), a wide-spread forest tree. Mol. Ecol. 19: 1638–1650.
Dormling, I., 1973 Photoperiodic control of growth and growthcessation in Norway spruce seedlings. IUFRO Division 2, Work-ing Party 2.01.4, Growth Processes, Symposium on Dormancy inTrees. Kornik, Poland, September 5–9, pp. 1–16.
Eckert, A. J., A. D. Bower, S. C. Gonzalez-Martinez, J. L. Wegrzyn,G. Coop et al., 2010a Back to nature: ecological genomics ofloblolly pine (Pinus taeda, Pinaceae). Mol. Ecol. 19: 3789–3805.
Eckert, A. J., J. van Heerwaarden, J. L. Wegrzyn, C. D. Nelson, J.Ross-Ibarra et al., 2010b Patterns of population structure andenvironmental associations to aridity across the range of loblollypine (Pinus taeda L., Pinaceae). Genetics 185: 969–982.
Evanno, G., S. Regnaut, and J. Goudet, 2005 Detecting the num-ber of clusters of individuals using the software STRUCTURE:a simulation study. Mol. Ecol. 14: 2611–2620.
Excoffier, L., T. Hofer, and M. Foll, 2009 Detecting loci underselection in a hierarchically structured population. Heredity103: 285–298.
Excoffier, L., and H. E. L. Lischer, 2010 Arlequin suite ver. 3.5:a new series of programs to perform population genetics anal-yses under Linux and Windows. Mol. Ecol. Res. 10: 564–567.
Fischer, M. C., M. Foll, L. Excoffier, and G. Heckel, 2011 EnhancedAFLP genome scans detect local adaptation in high-altitude pop-ulations of a small rodent (Microtus arvalis). Mol. Ecol. 20:1450–1462.
Foll, M., and O. Gaggiotti, 2008 A genome-scan method to iden-tify selected loci appropriate for both dominant and codominantmarkers: a Bayesian perspective. Genetics 180: 977–993.
Fornara, F., A. de Montaigu, and G. Coupland, 2010 SnapShot:control of flowering in Arabidopsis. Cell 141: 550e2.
Garcia-Gil, M. R., M. Mikkonen, and O. Savolainen, 2003 Nu-cleotide diversity at two phytochrome loci along a latitudinalcline in Pinus sylvestris. Mol. Ecol. 12: 1195–1206.
Giesecke, T., and K. D. Bennett, 2004 The Holocene spread ofPicea abies (L.) Karst in Fennosandia and adjacent areas. J. Bio-geogr. 31: 1523–1548.
Clinal Variation in Norway Spruce 879
Goudet, J., 2005 HIERFSTAT, a package for R to compute and testhierarchical F -statistics. Mol. Ecol. Notes 2: 184–186.
Gyllenstrand, N., D. Clapham, T. Källman, and U. Lagercrantz,2007 A Norway spruce FLOWERING LOCUS T homolog is im-plicated in control of growth rhythm in conifers. Plant Physiol.144: 248–257.
Hall, D., X.-F. Ma, and P. K. Ingvarsson, 2011 Adaptive evolutionof the Populus tremula photoperiod pathway. Mol. Ecol. 20:1463–1474.
Hancock, A. M., D. B. Witonsky, A. S. Gordon, G. Eshel, J. K. Pritchardet al., 2008 Adaptations to climate in candidate genes for com-mon metabolic disorders. PLoS Genet. 4: e32.
Hancock, A. M., G. Alkorta-Aranburu, D. B. Witonsky, and A. DiRienzo, 2010 Adaptations to new environments in humans:the role of subtle allele frequency shifts. Philos. Trans. R. Soc.Lond. B 365: 2459–2468.
Hancock, A. M., D. B. Witonsky, G. Alkorta-Aranburu, C. M. Beall,A. Gebremedhin et al., 2011 Adaptations to climate-mediatedselective pressures in humans. PLoS Genet. 7: e1001375.
Heuertz, M., E. De Paoli, T. Källman, H. Larsson, I. Jurman et al.,2006 Multilocus patterns of nucleotide diversity, linkage dis-equilibrium and demographic history of Norway spruce [Piceaabies (L.) Karst]. Genetics 174: 2095–2105.
Hill, W. G., and A. Robertson, 1966 The effect of linkage on limitsto artificial selection. Genet. Res. 8: 269–294.
Holliday, J. A., K. Ritland, and S. N. Aitken, 2010 Widespread,ecologically relevant genetic markers developed from associa-tion mapping of climate-related traits in Sitka spruce (Piceasitchensis). New Phytol. 188: 501–514.
Hsu, C.-Y., J. P. Adams, H. Kim, K. No, C. Ma et al.,2011 FLOWERING LOCUS T duplication coordinates repro-ductive and vegetative growth in perennial poplar. Proc. Natl.Acad. Sci. USA 108: 10756–10761.
Hubisz, M. J., D. Falush, M. Stephens, and J. K. Pritchard,2009 Inferring weak population structure with the assistanceof sample group information. Mol. Ecol. Res. 9: 1322–1332.
Ingvarsson, P. K., M. V. Garcia, V. Luquez, D. Hall, and S. Jansson,2008 Nucleotide polymorphism and phenotypic associationswithin and around the phytochrome B2 locus in European aspen(Populus tremula, Salicaceae). Genetics 178: 2217–2226.
Jaarola, M., H. Tegelström, and K. Fredga, 1999 Colonizationhistory in Fennoscandian rodents. Biol. J. Linn. Soc. Lond. 68:113–127.
Jaramillo-Correa, J. P., J. Beaulieu, and J. Bousquet, 2001 Con-trasting evolutionary forces driving population structure atexpressed sequence tag polymorphisms, allozymes, and quanti-tative traits in white spruce. Mol. Ecol. 10: 2729–2740.
Karlgren, A., N. Gyllenstrand, T. Källman, J. F. Sundström, D.Moore et al., 2011 Evolution of the PEBP gene family in plants:functional diversification in seed plant evolution. Plant Physiol.156: 1967–1977.
Kass, R. E., and A. E. Raftery, 1995 Bayes factors. J. Am. Stat.Assoc. 90: 773–795.
Kremer, A., and V. Le Corre, 2011 Decoupling of differentiationbetween traits and their underlying genes in response to diver-gent selection. Heredity 108: 375–385.
Kremer, A., V. Le Corre, R. J. Petit, and A. Ducousso, 2010 Historicaland contemporary dynamics of adaptive differentiation in Europeanoaks, pp. 101–122 in Molecular Approaches in Natural ResourceConservation, edited by A. DeWoody, J. Bickham, C. Michler, K.Nichols, G. Rhodes, and K. Woeste. Cambridge University Press,Cambridge; London; New York.
Krüger, J., and M. Rehmsmeier, 2006 RNAhybrid: microRNA targetprediction easy, fast and flexible. Nucleic Acids Res. 34: 451–454.
Lagercrantz, U., and N. Ryman, 1990 Genetic structure of NorwaySpruce (Picea abies): concordance of morphological and allozy-mic variation. Evolution 44: 38–53.
Latta, R. G., 1998 Differentiation of allelic frequencies at quanti-tative trait loci affecting locally adaptive traits. Am. Nat. 151:283–292.
Le Corre, V., and A. Kremer, 2003 Genetic variability at neutralmarkers, quantitative trait loci and trait in a subdivided popu-lation under selection. Genetics 164: 1205–1219.
Lewontin, R. C., and J. Krakauer, 1973 Distribution of gene fre-quency as a test of the theory of the selective neutrality of poly-morphisms. Genetics 74: 175–195.
Li, J., H. Li, M. Jakobsson, S. Li, P. Sjödin et al., 2012 Joint esti-mation of demography and selection: Where do we stand andwhere could we go? Mol. Ecol. 21: 28–44.
Lobo, N. L., 2011 Clinal variation at putatively adaptive polymor-phisms in mature populations of Sitka spruce (Picea sitchensis(Bong.) Carr.). M.Sc. Thesis, University of British Columbia,Vancouver, BC, Canada.
Ma, X. F., D. Hall, K. St. Onge, S. Jansson, and P. K. Ingvarsson,2010 Genetic differentiation, clinal variation and phenotypicassociations with growth cessation across the Populus tremulaphotoperiodic pathway. Genetics 186: 1033–1044.
Namroud, M.-C., J. Beaulieu, N. Juge, J. Laroche, and J. Bousquet,2008 Scanning the genome for gene SNPs involved in adaptivepopulation differentiation in white spruce. Mol. Ecol. 17: 3599–3613.
Narum, S. R., and J. E. Hess, 2011 Comparison of FST outlier testsfor SNP loci under selection. Mol. Ecol. Res. 11: 184–194.
Nicholson, G., A. Smith, F. Jónsson, O. Gústafsson, K. Stefánssonet al., 2002 Assessing population differentiation and isolationfrom single-nucleotide polymorphism data. J. R. Stat. Soc. B 64:695–715.
Novembre, J., and A. Di Rienzo, 2009 Spatial patterns of variationdue to natural selection in humans. Nat. Rev. Genet. 10: 745–755.
Oakeshott, J. G., J. B. Gibson, P. R. Anderson, W. R. Knibb, D. G.Anderson et al., 1982 Alcohol-dehydrogenase and glycerol-3-Phosphate dehydrogenase clines in Drosophila melanogaster ondifferent continents. Evolution 36: 86–96.
Öberg, L., and L. Kullman, 2011 Ancient subalpine clonal spruces(Picea abies): source of postglacial vegetation history in theSwedish Scandes. Arctic 64: 183–196.
Oksanen, J., F.G. Blanchet, R. Kindt, P. Legendre, R. B. O’Hara,et al., 2011 Vegan: Community Ecology Package. R packageversion 1.17-10. http://vegan.r-forge.r-project.org.
Pelgas, B., J. Bousquet, P. G. Meirmans, K. Ritland, and N. Isabel,2011 QTL mapping in white spruce: gene maps and genomicregions underlying adaptive traits across pedigrees, years andenvironments. BMC Genomics 12: 145.
Pérez-Figueroa, A., M. J. Garcia-Pereira, M. Saura, E. Rolan-Alvarez,and A. Caballero, 2010 Comparing three different methods todetect selective loci using dominant markers. J. Evol. Biol. 23:2267–2276.
Pinheiro, J., D. Bates, S. DebRoy, D. Sarkar, and the R DevelopmentCore Team 2011 nlme: Linear and Nonlinear Mixed EffectsModels. R package version 3.1-102. http://cran.r-project.org/web/packages/nlme/index.html.
Pritchard, J. K., M. Stephens, and P. Donnelly, 2000 Inference ofpopulation structure using multilocus genotype data. Genetics155: 945–959.
Pritchard, J. K., J. K. Pickrell, and G. Coop, 2010 The genetics ofhuman adaptation: hard sweeps, soft sweeps, and polygenicadaptation. Curr. Biol. 20: R208–R215.
Prunier, J., J. Laroche, J. Beaulieu, and J. Bousquet, 2011 Scann-ing the genome for gene SNPs related to climate adaptation andestimating selection at the molecular level in boreal blackspruce. Mol. Ecol. 20: 1702–1716.
R Development Core Team, 2011 R: A Language and Environmentfor Statistical Computing. R Foundation for Statistical Comput-ing, Vienna. Available at http://www.R-project.org/.
880 J. Chen et al.
Ralph, P., and G. Coop, 2010 Parallel adaptation: One or manywaves of advance of an advantageous allele? Genetics 186:647–668.
Robledo-Arnuncio, J. J., 2011 Wind pollination over mesoscaledistances: an investigation with Scots pine. New Phytol. 190:222–233.
Rousset, F., 1997 Genetic differentiation and estimation of geneflow from F-statistics under isolation by distance. Genetics 145:1219–1228.
Savolainen, O., T. Pyhäjärvi, and T. Knürr, 2007 Gene flow and localadaptation in trees. Annu. Rev. Ecol. Evol. Syst. 38: 595–619.
Savolainen, O., S. T. Kujala, C. Sokol, T. Pyhäjärvi, K. Avia et al.,2011 Adaptive potential of northernmost tree populations toclimate change, with emphasis on Scots pine (Pinus sylvestrisL.). J. Hered. 102: 526–536.
Schneider-Poetsch, H. A. W., . Kolukisaogu, D. H. Clapham, J.Hughes, and T. Lamparter, 1998 Non-angiosperm phyto-chromes and the evolution of vascular plants. Physiol. Plant.102: 612–622.
Seppä, H., T. Alenius, R. H. W. Bradshaw, T. Giesecke, M. Heikkiläet al., 2009 Invasion of Norway spruce (Picea abies) and the riseof the boreal ecosystem in Fennoscandia. J. Ecol. 97: 629–640.
Tollefsrud, M. M., R. Kissling, F. Gugerli, Ø. Johnsen, T. Skrøppaet al., 2008 Genetic consequences of glacial survival and post-glacial colonization in Norway spruce: combined analysis ofmitochondrial DNA and fossil pollen. Mol. Ecol. 17: 4134–4150.
Tollefsrud, M. M., J. H. Sonstebo, C. Brochmann, O. Johnsen, T.Skroppa et al., 2009 Combined analysis of nuclear and mito-chondrial markers provide new insight into the genetic structureof North European (Picea abies). Heredity 102: 549–562.
Turner, T. L., E. C. Bourne, E. L. von Wettberg, T. T. Hu, and S. V.Nuzhdin, 2010 Population resequencing reveals local adapta-tion of Arabidopsis lyrata to serpentine soils. Nat. Genet. 42:260–264.
Väliranta, M., A. Kaakinen, P. Kuhry, S. Kultti, J. S. Salonen et al.,2011 Scattered late-glacial and early Holocene tree popula-tions as dispersal nuclei for forest development in north-easternEuropean Russia. J. Biogeogr. 38: 922–932.
Väsemegi, A., 2006 The adaptive hypothesis of clinal variationrevisited: single-locus clines as a result of spatially restrictedgene flow. Genetics 173: 2411–2414.
Verrelli, B. C., and W. F. Eanes, 2001 Clinal variation for aminoacid polymorphisms at the Pgm locus in Drosophila mela-nogaster. Genetics 157: 1649–1663.
Yang, R., and Z. Su, 2010 Analyzing circadian expression data byharmonic regression based on autoregressive spectral estima-tion. Bioinformatics 26: i168–i174.
Yeaman, S., and S. P. Otto, 2011 Establishment and maintenanceof adaptive genetic divergence under migration, selection, anddrift. Evolution 65: 2123–2129.
Communicating editor: M. A. Beaumont
Clinal Variation in Norway Spruce 881
GENETICSSupporting Information
http://www.genetics.org/content/suppl/2012/04/27/genetics.112.140749.DC1
Disentangling the Roles of History and LocalSelection in Shaping Clinal Variation of AlleleFrequencies and Gene Expression in Norway
Spruce (Picea abies)Jun Chen, Thomas Källman, Xiaofei Ma, Niclas Gyllenstrand, Giusi Zaina, Michele Morgante,
Jean Bousquet, Andrew Eckert, Jill Wegrzyn, David Neale, Ulf Lagercrantz, and Martin Lascoux
Copyright © 2012 by the Genetics Society of AmericaDOI: 10.1534/genetics.112.140749
J. Chen et al. 2 SI
File S1
Supporting Text
Details on the implementation of the Berry and Kreitman (1993) approach.
The method works as follows. Each site is in turn considered as a “governing” locus and its effect on other loci
assessed by simulation. First the observed allele frequency at the “governing” locus in each population is kept
constant and simulates the allele frequencies of affected locus using binomial sampling around the expected
values. For example, consider two sites X (governing) and Y (target). We hold the frequency of allele A at site X
constant in each population and recalculate the frequency of allele B at site Y. Suppose that we have a pooled
data set of 500 individuals. The overall frequency of allele B at site Y is 50% and allele B was observed in 20% of
the individuals carrying allele A at site X and in 30% of the individuals carrying another allele. If, in population i,
the allele A is present in 30 out of 50 individuals at site X, then the expected frequency of allele B at site Y in
population i can be calculated as (0.2 × 30 + 0.3 × 20) / 50 = 0.24. We then perform a binomial sampling around
this expected allele frequency for population i and repeat it for each population and each target locus to obtain
1000 simulated data sets. These 1000 simulated frequencies data sets for each population result in 1000 R2
values of the linear regression of allele frequencies on population latitude at the target loci based on the
observed allele frequency at the governing locus. After 1000 iterations, the R2 values are sorted to obtain a
confidence interval for the expected R2 distribution. If the observed R2 falls within the 95% confidence interval,
the observed geographic variation at the target locus is simply explained by its linkage disequilibrium with the
governing locus.
Bayesian generalized linear mixed model analysis (Coop et al. 2010).
COOP et al. (2010) developed a Bayesian generalized linear mixed model to test the correlation between SNP
variation and environmental variables while accounting for population structure. This approach has recently
been used successfully in humans (e.g. HANCOCK et al. 2011) and in loblolly pine (ECKERT et al. 2010). In COOP et al.
model, environmental factors, including geographic parameters, such as latitude, are incorporated as fixed
effects while an estimated variance-‐covariance matrix takes into account all random effects due to shared
population history. In its assumption, the population frequencies randomly deviate from the ancestral frequency
(εl) due to genetic drift, and the populations should co-‐vary in their deviation because of shared ancestry and
gene flow. One also assumes that the transformed allele frequency at the lth SNP follows a multivariate normal
distribution and the null model is defined as: P (θl | Ω, εl) ~ N (εl, εl (1-‐εl) Ω),where θl is the transformed allele
frequency across populations, and Ω is the covariance matrix that is shared by all loci. We performed 4 runs with
100 million steps of MCMC each in order to make sure that Ω is convergent and reflects the true population
structure.
The alternative model includes the correlation between allele frequency at each SNP and the population latitude
or other environmental variables (Y) by adding a fixed effect to the mean of the multivariate normal distribution:
P (θl | Ω, εl) ~ N (εl + βY, εl (1-‐εl) Ω), where β is the linear effect of the environmental variable Y. As a measure of
the support for the correlation a Bayes factor (B.F.) is calculated as the ratio of the posterior probabilities under
the alternative and the null models. A single draw of Ω estimates under the null model across all SNPs was used
for each run of correlation test. We scanned all 445 SNPs and picked the outliers according to the empirical
distribution generated by sorting the BF to calculate a significant level for each SNP. The whole procedure was
J. Chen et al. 3 SI
repeated 8 times using 8 different draws of Ω. The results were averaged across the 8 runs. More details on the
model can be found in COOP et al. (2010), HANCOCK et al. (2010a) and HANCOCK et al. (2010b).
J. Chen et al. 4 SI
Own resequencing e!orts CRSP Arborea
768 SNP Golden Gate Assay
mean of 42 haplotypes mean of 12 haplotypes mean of 24 haplotypes
137 SNP from 18 candidat genes (I)
308 SNP from 264 control genes
I II III
(II, III)
Megagametophytes from 18 populations belonging to three main clusters
Analysis of SNP clinal variation on total data and subsets
Analysis of correlation (SNP freq. , Lat) accounting for pop. StructureSelection
Local selection
Set of top candidate SNPs Five Swedish populationsLat. 55N-67N
FTL2, PRR7, CCAL1, GI Analysis clinal variation in budsetand gene expression
FTL2
Analysis of Fst outliers on total dataset
E!ect of structure
Dissecting SNP clinal variation
Figure S1 The general outline of this study.
J. Chen et al. 5 SI
Figure S2 Photoperiodic treatments and sampling schedule for phenotype and RNA expression studies
J. Chen et al. 6 SI
a b
c Figure S3 Pairwise linkage disequilibrium estimated using r2. a. Baltico-‐Nordic domain populations b. German population Ruhpolding c. Southern Swedish population Saleby
J. Chen et al. 7 SI
Figure S4 The optimal K choose using a. log posterior probabilities and b. ΔK
J. Chen et al. 8 SI
Monte Carlo Sampling at Significant Clinal Candidate Loci
PaMFTL1_1050
PaMFTL1_2091
PaMFTL1_2215
PaMFTL1_3441
PaMFTL1_3658
PaFTL2pr_1560
PaFTL2pr_1824
PaFTL2pr_1951
PaFTL2pr_2173
PaFTL2pr_2509
PaFTL2pr_2694
PaFTL2pr_2790
PaGI_F2_9_1470
PaGI_F2_9_987
PaGI_F6_8_1111
PaHB3_385
PaKN2a_253
PaKN3a_211
PaKN4b_242
PaCCA1-Like_1302
PaPRR3_F1_2570
PaPRR3_F2_331
PaPRR3_F2_481
PaPRR7_F1_1505
PaPRR7_F1_771
PaPRR7_F2_417
PaPRR7_F2_534
PaPRR7_F3_104
PaZTL_793
PaPHYN_RIII185
PaPHYN_RIII88
PaPHYO_RIII510
PaPHYP_RIII274
PaPHYP_RIII76
Ta
rge
t L
oci
PaMFTL1_1050
PaMFTL1_2091
PaMFTL1_2215
PaMFTL1_3441
PaMFTL1_3658
PaFTL2pr_1560
PaFTL2pr_1824
PaFTL2pr_1951
PaFTL2pr_2173
PaFTL2pr_2509
PaFTL2pr_2694
PaFTL2pr_2790
PaGI_F2_9_1470
PaGI_F2_9_987
PaGI_F6_8_1111
PaHB3_385
PaKN2a_253
PaKN3a_211
PaKN4b_242
PaCCA1.Like_1302
PaPRR3_F1_2570
PaPRR3_F2_331
PaPRR3_F2_481
PaPRR7_F1_1505
PaPRR7_F1_771
PaPRR7_F2_417
PaPRR7_F2_534
PaPRR7_F3_104
PaZTL_793
PaPHYN_RIII185
PaPHYN_RIII88
PaPHYO_RIII510
PaPHYP_RIII274
PaPHYP_RIII76
Governing Loci
Figure S5 Summary of the Monte Carlo sampling for the site-‐by-‐site analysis. Only candidate SNPs that exhibited a significant regression with latitude are shown. Shaded boxes represent the significant clinal variation at the target loci can be explained by geographic variant at the governing loci.
J. Chen et al. 9 SI
Figure S6 A Venn diagram summarizing the results from the three different tests applied to the data. The figures are ratios of candidate to control SNPs in the different tests and combination of tests (at 5% significant level). The numbers in non-‐overlapping areas are the results obtained when only one method was applied while the numbers in the overlapping areas are the results when combining the results from the different methods (see Table 2 in text for details regarding the number of outliers).
J. Chen et al. 10 SI
candidate
control
Total Data Set
a
allele frequency
proportion
0.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
0-0.05
0.05-0.1
0.1-0.15
0.15-0.2
0.2-0.25
0.25-0.3
0.3-0.35
0.35-0.4
0.4-0.45
0.45-0.5
candidate
control
Clinal Variation
b
allele frequency
proportion
0.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
0-0.05
0.05-0.1
0.1-0.15
0.15-0.2
0.2-0.25
0.25-0.3
0.3-0.35
0.35-0.4
0.4-0.45
0.45-0.5
candidate
control
Bayevn 5% Outliers
c
allele frequency
proportion
0.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
0-0.05
0.05-0.1
0.1-0.15
0.15-0.2
0.2-0.25
0.25-0.3
0.3-0.35
0.35-0.4
0.4-0.45
0.45-0.5
candidate
control
BayeScan 5% Outliers
d
allele frequency
proportion
0.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
0-0.05
0.05-0.1
0.1-0.15
0.15-0.2
0.2-0.25
0.25-0.3
0.3-0.35
0.35-0.4
0.4-0.45
0.45-0.5
Figure S7 Allele frequency spectrum of candidate (dark) and control (grey) loci. a. Total SNP data set used in this study. b. SNPs shown significant clinal variation analysis. c. Outlier SNPs at 5% tail of B.F. distribution from Bayesian generalized linear mixed model. d. Outlier SNPs at 5% tail of log(P.O.) distribution from BayeScan analysis
J. Chen et al. 11 SI
Table S1 Description of the candidate SNPs and control genes used in this study Table S1 is available for download at http://www.genetics.org/content/suppl/2012/04/27/genetics.112.140749.DC1 as an Excel file.
J. Chen et al. 12 SI
Table S2 cDNA synthesis: The tubes containing Mix1 (Table S2-‐1, RNA integrity and concentration was measured using both QIAxcel and NanoDrop machines) were incubated in 70 °C for 10 minutes and then kept on ice. Mix2 (Table S2-‐2) was added to each tube to a total volume of 16 µl / tube and incubated overnight in 42 °C. Real-‐Time PCR: Real-‐Time PCR mix (Table S2-‐3) was denatured in 96°C for 7 minutes followed by 40 cycles of 95°C for 10 seconds and 60°C for 30 seconds. Table S2-‐1 cDNA synthesis Mix1
RNA: 0.5 µg
Random primers (0.3 µg/µl, Invitrogen) 1.0 µl
RNase free water add to 9.0 µl / reaction
Table S2-‐2 cDNA synthesis Mix2
FSB (5X) 3.0 µl
DTT (0.1 mM) 1.5 µl
dNTP Mix (10 mM) 1.0 µl
Super Script III Reverse Transcriptase 0.5 µl
Total sum 6.0 µl
Table S2-‐3 Real-‐Time PCR mix
Sybr Green 12 µl
Primer_F (10 µM) 1.2 µl
Primer_R (10 µM) 1.2 µl
water 4.6 µl
cDNA (diluted 1/100) 4.8 µl
Table S2-‐4 Real-‐Time PCR primer sequences
Primer Name Sequence
PaMyb2_RT_new_I_Fwd ACGAAGCCATGCTCAGAAGT
PaMyb2_RT_new_I_Rev GGGATAAGGATGCCTTGGTT
PaPRR7_RT_Fwd CCGTCAGAAAAGGAAAGAGAGAT
PaPRR7_RT_Rev GTCACCCTTGGTCGTTGTTC
PaFTL2_RT_Fwd CGATGTTGGAGGAGACGAC
PaFTL2_RT_Rev GCATAAGCTCCCGTCCAAA
PaGI_II_RT_Fwd GGCAGAAGGGCTATGGAA
PaGI_II_RT_Rev ATGTACTAAGTGCGCGGACA
Alpha-‐Tubulin_Fwd GGCATACCGGCAGCTCTTC
Alpha-‐Tubulin_Rev AAGTTGTTGCCGGCGTCTT
J. Chen et al. 13 SI
Table S3 FST values between outlying populations and populations from Scandinavia
GE-‐47.0 RU-‐53.3 SE-‐58.3 Central Scandinavia Northern Scandinavia
GE-‐47.0 0
RU-‐53.3 0.09841 0
SE-‐58.3 0.01574 0.05008 0
Central Scandinavia 0.09324 0.02924 0.03486 0
Northern Scandinavia 0.21465 0.09164 0.13825 0.04971 0
J. Chen et al. 14 SI
Table S4 Parameters of the expression study
Population Gene Photoperiod num_period(n) Period(hours) Frequency Mean Amplitude1 Phase1 Amplitude2 Phase2 P value Adjusted R2
Saleby PaPRR7 6.5-‐h dark 2 24.0883029 0.9963342 0.2572301 -‐1.171126 0.3764416 -‐0.8575932 0.2843278 1.08E-‐05 0.8539805
Istevallen PaPRR7 6.5-‐h dark 2 24.0883029 0.9963342 0.3794638 -‐1.179671 0.3468512 -‐0.8575984 0.2875196 3.20E-‐08 0.9540118
Hogmansbod PaPRR7 6.5-‐h dark 2 24.0883029 0.9963342 0.4407237 -‐1.183953 0.3320218 -‐0.857601 0.2891152 3.08E-‐06 0.8860118
Jamtland PaPRR7 6.5-‐h dark 2 24.0883029 0.9963342 0.3763569 -‐1.179453 0.3476044 -‐0.8575982 0.2874372 8.24E-‐05 0.7819624
Norrland PaPRR7 6.5-‐h dark 2 24.0883029 0.9963342 0.3814307 -‐1.179809 0.3463717 -‐0.8575985 0.2875698 1.34E-‐05 0.8475043
Saleby PaPRR7 8-‐h dark 2 24.46223596 0.9811041 -‐0.106884847 -‐1.904577 0.288863 -‐1.252201 0.2351993 2.02E-‐10 0.9832656
Istevallen PaPRR7 8-‐h dark 2 24.46223347 0.9811042 -‐0.065075785 -‐2.054817 0.2553925 -‐1.49951 0.2244521 3.29E-‐08 0.9537639
Hogmansbod PaPRR7 8-‐h dark 2 24.46223347 0.9811042 -‐0.120554783 -‐1.935065 0.2641766 -‐1.377222 0.2315329 2.23E-‐07 0.9322808
Jamtland PaPRR7 8-‐h dark 2 24.46223596 0.9811041 0.19784703 -‐1.90541 0.2839268 -‐1.162195 0.2278678 8.95E-‐09 0.9643297
Norrland PaPRR7 8-‐h dark 2 24.46223347 0.9811042 0.009672734 -‐2.132017 0.2690722 -‐1.492159 0.2211576 1.41E-‐08 0.9609403
Saleby PaFTL2 6.5-‐h dark 2 32.72120592 0.7334693 3.936021 1.10196 0.6157213 0.4071825 0.61158736 0.030203166 0.3279688
Istevallen PaFTL2 6.5-‐h dark 2 18.10357918 1.3257047 5.140415 0.8976922 0.3116078 0.5583806 -‐0.04082232 0.013036978 0.4235358
Hogmansbod PaFTL2 6.5-‐h dark 2 18.28547048 1.3125175 5.113656 0.9022274 0.3183256 0.5550212 -‐0.02630423 0.080531553 0.201715
Jamtland PaFTL2 6.5-‐h dark 2 17.29425475 1.3877441 5.266614 0.8762897 0.2797363 0.574224 -‐0.109184 0.006354538 0.495844
Norrland PaFTL2 6.5-‐h dark 2 17.64706142 1.3599998 5.210176 0.8858511 0.2939395 0.5671448 -‐0.07861946 0.010067361 0.4505291
Saleby PaFTL2 8-‐h dark 2 24.77478686 0.9687268 4.27484 0.6838827 0.07255829 1.0828521 0.1768407 0.003396988 5.52E-‐01
Istevallen PaFTL2 8-‐h dark 2 24.77478686 0.9687268 6.157942 0.4909119 0.08499913 0.6476207 0.1405572 2.92438E-‐05 8.22E-‐01
Hogmansbod PaFTL2 8-‐h dark 2 24.77478686 0.9687268 6.171123 0.4840324 0.08173737 0.6302373 0.1427557 0.001806262 6.03E-‐01
Jamtland PaFTL2 8-‐h dark 2 24.77478686 0.9687268 6.706634 0.3957059 0.06491306 0.4196627 0.1473204 0.170061665 0.09732673
Norrland PaFTL2 8-‐h dark 2 24.77478686 0.9687268 6.622679 0.4225862 0.07545146 0.4864659 0.1408149 0.010812493 0.44318458
Saleby PaGI 6.5-‐h dark 1 31.50571317 0.7617666 0.2253251 -‐1.2264843 0.4083669 0 0 6.09E-‐04 0.9499638
Istevallen PaGI 6.5-‐h dark 1 29.56801864 0.8116878 0.2930902 -‐0.9799559 0.3509365 0 0 0.001747929 0.9156402
Hogmansbod PaGI 6.5-‐h dark 1 32.45252669 0.7395418 0.1951597 -‐1.3362269 0.4339268 0 0 0.002272174 0.9039764
J. Chen et al. 15 SI
Jamtland PaGI 6.5-‐h dark 1 32.48286427 0.7388511 0.1942244 -‐1.3396267 0.4347236 0 0 0.011108401 0.7911934
Norrland PaGI 6.5-‐h dark 1 30.46918353 0.7876811 0.2605084 -‐1.0984848 0.3785437 0 0 0.039517145 0.6173051
Saleby PaGI 8-‐h dark 1 24 1 0.5344578 -‐1.426126 0.1689911 0 0 0.00099348 9.36E-‐01
Istevallen PaGI 8-‐h dark 1 24 1 0.6460503 -‐1.294352 0.157305 0 0 0.022429406 7.07E-‐01
Hogmansbod PaGI 8-‐h dark 1 24 1 0.5815179 -‐1.366374 0.1630477 0 0 0.001191218 0.9302158
Jamtland PaGI 8-‐h dark 1 24 1 1.0786341 -‐1.04714 0.1753259 0 0 0.003227804 0.8858474
Norrland PaGI 8-‐h dark 1 24 1 0.8727394 -‐1.21923 0.1798153 0 0 0.008237298 0.8194167