Population genetics of maize domestication, adaptation, and improvement

Population genetics of maize domestication, adaptation, and improvement. Jeffrey Ross-Ibarra, www.rilab.org, @jrossibarra, rossibarra. February 5, 2014.


DESCRIPTION

The domestication of maize ~10,000 years ago resulted in dramatic differentiation from its wild ancestor teosinte. Subsequently, maize spread rapidly across the Americas, adapting to a number of new environments. Beginning in the 20th century, maize has also been subjected to intensive artificial selection by breeders. Each of these periods of adaptation has left its mark on patterns of genetic diversity. I will discuss some of our recent work using population genetics to learn about the history and process of adaptation in maize.

TRANSCRIPT

Page 1: Population genetics of maize domestication, adaptation, and improvement

Maize Evolution

J. Ross-Ibarra

Introduction

Domestication: Origins, Diversity

Adaptation: Parallel, Introgression

Improvement: US germplasm, Iowa RRS, Deleterious

Conclusions

Population genetics of maize domestication, adaptation, and improvement

Jeffrey Ross-Ibarra, www.rilab.org

@jrossibarra, rossibarra

February 5, 2014

Page 2: Population genetics of maize domestication, adaptation, and improvement

Lead Authors

UC Davis

Sofiane Mezmouk

Joost van Heerwaarden

Matthew Hufford

Shohei Takuno

U Missouri

Justin Gerke

U Copenhagen

Rute Fonseca

Page 3: Population genetics of maize domestication, adaptation, and improvement

Acknowledgements

People

• Ed Buckler (USDA)

• Jer-Ming Chia (CSHL)

• John Doebley (Wisconsin)

• Jode Edwards (USDA)

• Tom Gilbert (Copenhagen)

• Mike McMullen (USDA)

• Tanja Pyhajarvi

• Lauren Sagara

• Nathan Springer (Minnesota)

• Doreen Ware (USDA)

Funding

Page 4: Population genetics of maize domestication, adaptation, and improvement

Maize evolutionary genetics

Page 6: Population genetics of maize domestication, adaptation, and improvement

Maize evolutionary genetics

[Figure labels: Diversity; Genome Sequence; Selective Sweep]

Page 7: Population genetics of maize domestication, adaptation, and improvement

Outline

1 Domestication
   Geographic origins of maize domestication
   Impacts of selection on genomic diversity

2 Adaptation
   Parallel adaptation to new environments
   Adaptive introgression from wild relatives

3 Improvement
   Historical genomics of US maize
   Drift and selection in the Iowa RRS
   The role of deleterious alleles in maize

4 Conclusions

Page 8: Population genetics of maize domestication, adaptation, and improvement

Maize origins: single domestication

Sawers & Sanchez Leon 2011 Front. Genet.

Matsuoka et al. 2002 PNAS

• Single domestication from lowland ssp. parviglumis

• Microsatellite data suggested the oldest maize came from the highlands

Page 10: Population genetics of maize domestication, adaptation, and improvement

Highland landraces genetically similar to teosinte

Results

Patterns of Genetic Structure and Differentiation. Principal components analysis (PCA) (17) of the maize SNP data identifies 58 significant principal components (PCs) (explaining 37.6% of total variance), probably reflecting isolation by distance (18) and linkage effects (19). We use the first nine PCs, which present the strongest spatial autocorrelation (Fig. S2) and explain a large portion of the total variance (18.7%), to cluster the accessions into 10 geographically distinct groups (Fig. 1A). Meso-American maize falls into three groups: the Meso-American Lowland group, which includes predominantly lowland accessions from southeast Mexico and the Caribbean; the West Mexico group, representing both lowlands and highlands; and the Mexican Highland group, encompassing most of Matsuoka et al.'s highland Mexican accessions (5) as well as accessions from highland Guatemala. These clusters also confirm the presence of US-derived varieties in South America (20); we excluded these accessions from further analysis.

In the joint PCA analysis of the three subspecies, the first PC (10.8% of variance) separates maize from its wild relatives and confirms the similarity between maize from the Mexican Highland group and parviglumis (Fig. 1B). The second PC (4.8% of variance) mainly separates the genetic groups of maize along a north–south axis, with the Northern United States and Andean Highlands at the extremes. The third PC (2.7% of variance) predominately reflects the difference between parviglumis and mexicana. The Mexican Highland cluster extends toward mexicana along both PC 1 and 3, suggesting that the similarity of highland maize to parviglumis may reflect admixture with mexicana.

Admixture Analysis. Simulation of gene flow of mexicana into the Meso-American Lowland maize group suggests that 13% cumulative historical introgression is sufficient to explain observed differences between lowland and highland maize in terms of heterozygosity and differentiation from parviglumis (Fig. S3). Structure analysis (21) of all Mexican accessions lends support for this magnitude of introgression (Fig. 2). The three subspecies form clearly separated clusters, but evidence of admixture is evident in all three groups, and the two wild relatives show clear signs of bidirectional introgression at altitudes where their ranges overlap (Fig. 2). Highland maize shows strong signs of mexicana introgression, with 20% admixture observed in the Mexican Highland cluster, but below 1,500 m mexicana introgression drops to less than 1%. Introgression from parviglumis into maize is much lower overall, reaching its highest average value (3%) in the lowland West Mexico group.
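The admixture result above rests on a simple idea: mixing a fraction m of mexicana alleles into lowland maize shifts allele frequencies toward the donor, which changes both heterozygosity and differentiation from parviglumis. The sketch below is a minimal illustration under stated assumptions, not the simulation used in the study. It assumes a single admixture pulse with admixed frequency p = (1 - m) p_recipient + m p_donor, uses mean 2p(1 - p) as expected heterozygosity and mean squared frequency difference as a crude differentiation proxy, and runs on hypothetical toy frequencies. The function names are illustrative only.

```python
from statistics import mean


def admixed_freqs(p_recipient, p_donor, m):
    """Allele frequencies after a single admixture pulse contributing proportion m from the donor."""
    return [(1 - m) * pr + m * pd for pr, pd in zip(p_recipient, p_donor)]


def expected_het(freqs):
    """Mean expected heterozygosity 2p(1 - p) across biallelic loci."""
    return mean(2 * p * (1 - p) for p in freqs)


def distance_to(freqs, reference):
    """Mean squared allele-frequency difference: a crude differentiation proxy."""
    return mean((p - q) ** 2 for p, q in zip(freqs, reference))


# Hypothetical toy allele frequencies at five loci (not data from the study).
lowland_maize = [0.10, 0.50, 0.90, 0.30, 0.70]
mexicana = [0.60, 0.20, 0.40, 0.80, 0.10]
parviglumis = [0.40, 0.40, 0.60, 0.60, 0.30]

for m in (0.0, 0.13, 0.20):
    p = admixed_freqs(lowland_maize, mexicana, m)
    print(f"m={m:.2f}  H_e={expected_het(p):.3f}  dist_to_parviglumis={distance_to(p, parviglumis):.4f}")
```

In this toy example, heterozygosity rises and the distance to parviglumis falls as m grows, which is the direction of effect the argument relies on; whether a given m suffices for real data depends entirely on the actual frequencies.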

Drift Analysis. Because introgression from mexicana may affect ancestry inference based on genetic distance from parviglumis, we took an approach that does not require reference to the wild relatives. Under models of historical range expansion, genetic differentiation increases away from the population of origin (22, 23), and estimates of drift from ancestral frequencies have been applied successfully to identify ancestral populations (24). We therefore applied the method of Nicholson et al. (25) to estimate simultaneously ancestral frequencies and F, a measure of genetic drift away from these frequencies, for sets of predefined populations.

To illustrate the potential impact of mexicana introgression, we first performed a standard analysis that includes each maize population in turn in conjunction with the two wild relatives. Average drift away from the inferred common ancestor of maize, parviglumis, and mexicana is higher for maize (F = 0.24) than for mexicana (F = 0.15) or parviglumis (F = 0.07), probably due to changes in allele frequency following the domestication bottleneck. Because the inferred ancestral frequencies are closer to those of the wild relatives than to present-day maize, comparison with this ancestor is sensitive to introgression from these subspecies. It therefore is not surprising that estimates of F between individual maize populations and the common ancestor of all three taxa identify the Mexican Highland group as being most similar (Fig. 3A). This pattern is maintained in an analysis excluding mexicana, in which Mexican Highland maize is tied with the West Mexico group as the most ancestral population (Fig. 3B).

To mitigate the impact of introgression, we used a slightly modified approach that excludes both parviglumis and mexicana and calculates genetic drift with respect to ancestral frequencies inferred from domesticated maize alone. Because the genetic

Fig. 1. (A) Map of sampled maize accessions colored by genetic group. (B) First three genetic PCs of all sampled accessions.

van Heerwaarden et al. PNAS | January 18, 2011 | vol. 108 | no. 3 | 1089


• 1K SNPs from 1,200 landraces across the Americas

• PCA identifies genetic clusters and confirms highland maize is most similar to teosinte

van Heerwaarden et al. 2011 PNAS
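The PCA step described on this slide can be sketched in a few lines. This is a minimal illustration, not the pipeline used by van Heerwaarden et al.: it assumes genotypes coded 0/1/2 (copies of one allele), centers each SNP without the per-SNP scaling many tools apply, and finds only PC1 by power iteration on the individual-by-individual matrix X X^T so that no external libraries are needed. The toy genotype matrix and function names are hypothetical.

```python
import math
import random


def center_columns(geno):
    """Subtract each SNP's mean genotype from its column (genotypes coded 0/1/2)."""
    n = len(geno)
    means = [sum(row[j] for row in geno) / n for j in range(len(geno[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in geno]


def top_pc(geno, iters=200, seed=1):
    """Return PC1 scores (unit length) for individuals and the share of variance PC1 explains.

    Uses power iteration on K = X X^T, where X is the column-centered genotype matrix.
    """
    x = center_columns(geno)
    n = len(x)
    k = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iters):
        w = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient gives the leading eigenvalue; trace(K) is the total variance.
    eigval = sum(v[i] * sum(k[i][j] * v[j] for j in range(n)) for i in range(n))
    total = sum(k[i][i] for i in range(n))
    return v, eigval / total


# Hypothetical genotypes: rows are individuals, columns are SNPs.
GENO = [
    [2, 2, 0, 1],
    [2, 1, 0, 1],
    [2, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 1, 2, 1],
    [0, 0, 2, 0],
]
scores, share = top_pc(GENO)
print([round(s, 3) for s in scores], round(share, 3))
```

With real data one would normally use a dedicated eigensolver and keep many components; the sign of a PC is arbitrary, so only the relative signs of individuals are meaningful.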

Page 11: Population genetics of maize domestication, adaptation, and improvement

Modern maize originated in lowlands

similarity of some of our maize groups violates the assumption of independent drift, we infer ancestral frequencies by averaging over estimates obtained for pairs of diverged maize groups and calculate drift of individual populations with respect to these frequencies. In contrast to previous results, this comparison identifies the West Mexico group as being most similar to the common domesticated ancestor, followed by the Mexican Highland and Meso-American Lowland groups (Fig. 3C). Moreover, splitting the West Mexico group into highland (>2,000 m) and lowland (<1,500 m) components reveals that the lowland West Mexico group is most similar to the inferred ancestral maize. Direct comparison of genetic drift among the lowland West Mexico, Mexican Highland, and each of the remaining eight clusters shows further that the lowland West Mexico group is significantly closer than the Mexican Highland group to the inferred ancestor of each triplet (Fig. S4). These results strongly suggest that maize from the western lowlands of Mexico is genetically most similar to the common ancestor of maize and is more closely related to other extant populations than is maize from the highlands of central Mexico.

The ancestral position of the lowland West Mexico group is confirmed in a spatially explicit analysis of current allele frequencies in modern landraces, in which we mapped the moment estimator of F with respect to inferred ancestral allele frequencies. Mapping against allele frequencies observed in parviglumis (Fig. 4A) recapitulates earlier genetic results identifying highland maize as most similar to its wild ancestor (5). Points in the lower 0.05 quantile of F cluster in the highlands, with a mean altitude of 1,745 m. In contrast, mapping F with respect to inferred ancestral allele frequencies (Fig. 4B) identifies the lowest 0.05 quantile of F values in the lowlands of western Mexico, including the Balsas region and the region south of the Mexican highlands, resulting in an average altitude of 1,268 m; this analysis also clearly estimates higher values of F for maize in the Mexican highlands, particularly in areas of high inferred introgression from mexicana (Fig. S5).
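The drift parameter F used above measures how far a population's allele frequencies have moved from a set of ancestral frequencies, scaled by the binomial variance a(1 - a). A minimal moment-style estimator can be sketched as follows, assuming the Nicholson et al. relation Var(p - a) = F a (1 - a) at each locus and simply averaging the per-locus ratios; the exact estimator used in the paper may differ, and the frequencies and group names below are hypothetical. The point that matters for the lowland-versus-highland argument is that the answer depends on which reference frequencies a are supplied: parviglumis frequencies (as in Fig. 4A) or ancestral maize frequencies inferred from maize alone (as in Fig. 4B).

```python
from statistics import mean


def drift_F(pop_freqs, ancestral_freqs):
    """Moment estimate of drift F from ancestral allele frequencies.

    Assumes Var(p - a) = F * a * (1 - a) at each locus, so each locus contributes
    (p - a)^2 / (a * (1 - a)); contributions are averaged across loci.
    """
    terms = [
        (p - a) ** 2 / (a * (1 - a))
        for p, a in zip(pop_freqs, ancestral_freqs)
        if 0 < a < 1  # loci monomorphic in the ancestor carry no information
    ]
    return mean(terms)


# Hypothetical toy frequencies at four loci (not data from the study).
ancestral = [0.5, 0.5, 0.5, 0.5]
groups = {
    "group_close": [0.55, 0.45, 0.50, 0.50],
    "group_far": [0.80, 0.20, 0.70, 0.30],
}
# Rank groups from least to most drifted relative to the chosen reference.
for name, freqs in sorted(groups.items(), key=lambda kv: drift_F(kv[1], ancestral)):
    print(name, round(drift_F(freqs, ancestral), 3))
```

Swapping `ancestral` for a different reference vector and re-ranking is the toy analogue of the contrast between the two maps.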

Discussion

Resolving the origins and spread of domesticated crops is a fascinating and challenging endeavor that requires the integration of botanical, archeological, and genetic evidence (26, 27, 28). Maize provides an exceptional opportunity for studying the processes of domestication and subsequent diffusion because of the wealth of existing archaeobotanical data, germplasm accessions, and molecular markers. The contradiction between evidence supporting the earliest cultivation in the lowlands and the genetically ancestral position of Mexican Highland maize is therefore of particular interest. The disagreement is important, because the adaptive differences between highland and lowland maize are profound (14, 29). In other crops, uncertainty about

[Figure 2 labels: mexicana, parviglumis, Meso-American Lowland, West Mexico, Mex. Highland; altitude axis 0 to 2,500 m]

Fig. 2. (Lower) Bar plot of assignment values for the sample of Mexican accessions: mexicana (red), parviglumis (green), and mays (blue). (Upper) The solid black line indicates the altitude for each sample. The dotted line marks the minimum altitude at which mexicana occurs.

[Figure 3, panels A, B, C: densities of F (axis 0.1 to 0.4) for South-West US, Central US, S American Lowland, Bolivian Lowland, Meso-American Lowland, West Mexico, Coastal Brazil, Mexican Highland, North US, Andean Highland]

Fig. 3. Posterior densities of the genetic drift parameter F for 10 genetic groups with respect to (A) mexicana and (B) parviglumis. Only lowland accessions of the West Mexico group (light blue) were included. (C) Drift of all 10 genetic groups with respect to inferred ancestral frequencies. Light blue represents West Mexico; dotted line indicates the division between lowlands (<1,500 m, solid line) and highlands (>2,000 m).

1090 | www.pnas.org/cgi/doi/10.1073/pnas.1013011108 van Heerwaarden et al.

• Identifying gene flow from ssp. mexicana

• Ancestral reconstruction identifies lowland origin

van Heerwaarden et al. 2011 PNAS

Page 13: Population genetics of maize domestication, adaptation, and improvement

Allele frequencies reveal bottleneck, growth

[Figure axis: allele frequency class, from Rare to Common]

• 30X landrace genome estimates population size

• Genic regions reflect bottleneck loss of rare alleles

• Nongenic regions of maize show many new mutations (≈ 40% unique) due to exponential growth

Vince Buffalo, in prep.
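The Rare-to-Common axis on this slide is a site frequency spectrum (SFS). As a rough illustration of why an excess of rare variants points to growth, the sketch below builds an unfolded SFS from derived-allele counts and compares the singleton share with the constant-size neutral expectation, 1 / (1 + 1/2 + ... + 1/(n - 1)). It assumes alleles have already been polarized as ancestral or derived, reads "unique" on the slide as singletons, and uses hypothetical counts; it is not the analysis from the work cited.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """Number of segregating sites carrying i derived copies, for i = 1..n-1."""
    counts = Counter(k for k in derived_counts if 0 < k < n)
    return [counts.get(i, 0) for i in range(1, n)]


def singleton_fraction(sfs):
    """Share of segregating sites seen in exactly one sampled chromosome."""
    return sfs[0] / sum(sfs)


def neutral_singleton_fraction(n):
    """Expected singleton share under a constant-size neutral model: 1 / sum_{i=1}^{n-1} 1/i."""
    return 1 / sum(1 / i for i in range(1, n))


# Hypothetical derived-allele counts among n = 10 chromosomes (0 and 10 are not segregating).
n = 10
derived = [1, 1, 1, 1, 2, 3, 1, 5, 1, 9, 0, 10]
sfs = unfolded_sfs(derived, n)
print(sfs)
print(f"observed singleton share {singleton_fraction(sfs):.2f} "
      f"vs neutral expectation {neutral_singleton_fraction(n):.2f}")
```

A singleton share well above the neutral value is consistent with recent growth, but it can also arise from purifying selection or sequencing error, which is one reason to contrast genic and nongenic regions.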

Page 15: Population genetics of maize domestication, adaptation, and improvement

Genome sequencing identifies changes in diversity

• Full genome sequencing to ≈5x of >100 temperate and tropical inbreds, landraces, and teosinte

• Maize retained most diversity through both domestication (≈80%) and improvement (>95%)

Hufford et al. 2012 Nature Genetics; Chia et al. 2012 Nature Genetics
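Percentages of diversity retained, such as the ≈80% and >95% figures above, are ratios of a diversity statistic between groups. A minimal sketch, assuming nucleotide diversity π is computed from alternate-allele counts at biallelic sites with the n/(n - 1) sample-size correction and divided by the total number of sites surveyed (including invariant ones). The counts, sample sizes, and sequence length below are hypothetical, and the cited studies may have used different estimators.

```python
def nucleotide_diversity(alt_counts, n, length):
    """Per-site pairwise diversity (pi) from alternate-allele counts at biallelic sites.

    alt_counts: alternate-allele count at each variable site among n sampled chromosomes.
    length: total number of sites surveyed, including invariant ones.
    """
    total = 0.0
    for k in alt_counts:
        p = k / n
        total += 2 * p * (1 - p) * n / (n - 1)  # unbiased per-site heterozygosity
    return total / length


def retained_diversity(pi_derived, pi_ancestral):
    """Fraction of ancestral diversity retained in the derived group."""
    return pi_derived / pi_ancestral


# Hypothetical counts at five sites in 1,000 bp, n = 10 chromosomes per group.
pi_teosinte = nucleotide_diversity([5, 5, 2, 8, 3], n=10, length=1000)
pi_maize = nucleotide_diversity([5, 4, 2, 9, 0], n=10, length=1000)
print(f"maize retains {retained_diversity(pi_maize, pi_teosinte):.0%} of teosinte diversity")
```

Comparing groups this way only makes sense when both are surveyed over the same sites with comparable sample sizes.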

Page 17: Population genetics of maize domestication, adaptation, and improvement

Strong selection, including regulatory regions

GRMZM2G136072

• Selection stronger during domestication (s ≈ 1.5%)

• ≈18% of domestication genes show continued selection

• 6-10% of selected regions contain no genes

• Expression suggests selection on regulatory sequence

Hufford et al. 2012 Nature Genetics; Swanson-Wagner et al. 2012 PNAS
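To get a feel for what a selection coefficient of s ≈ 1.5% implies, the sketch below computes how many generations a favored allele needs to rise from 1% to 99% under a deterministic genic (haploid-like) selection model, p' = p(1 + s) / (1 + s p), in which the odds p / (1 - p) grow by a factor (1 + s) every generation. This is an illustrative model assuming no drift, no dominance, and constant s; it is not how the selection estimate on this slide was obtained, and the function names are hypothetical.

```python
import math


def generations_to_sweep(p0, p1, s):
    """Generations for a favored allele to go from p0 to p1 under genic selection.

    With p' = p(1 + s) / (1 + s * p), the odds p / (1 - p) are multiplied by (1 + s)
    each generation, so t = ln(odds(p1) / odds(p0)) / ln(1 + s).
    """
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return math.log(odds_ratio) / math.log(1 + s)


def simulate(p0, p1, s):
    """Iterate the deterministic recursion and count generations until p >= p1."""
    p, t = p0, 0
    while p < p1:
        p = p * (1 + s) / (1 + s * p)
        t += 1
    return t


t = generations_to_sweep(0.01, 0.99, 0.015)
print(round(t), simulate(0.01, 0.99, 0.015))
```

Under these assumptions s = 0.015 gives roughly 600 generations for the full sweep; maize is an annual, so generations map roughly onto years, though real sweeps with drift and dominance will differ.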

Page 19: Population genetics of maize domestication, adaptation, and improvement

Domestication candidate genes

• 484 selected regions identified

• Majority of regions show stronger selection than tb1 or tga1

Hufford et al. 2012 Nature Genetics

Page 21: Population genetics of maize domestication, adaptation, and improvement

Repeated adaptation to highlands

[Figure: map of maize dispersal with approximate dates. Domestication, 9,000 BP; Lowland S. America, 6,000 BP; Highland S. America, 4,000 BP; Highland Mexico, 6,000 BP; Highland SW US, 4,000 BP]

Fonseca et al. in prep.

Page 22: Population genetics of maize domestication, adaptation, and improvement

Parallel phenotypes in S. America and Mexico

Barthakur 1974 Int. J. Biometeor.

Page 23: Population genetics of maize domestication, adaptation, and improvement

Genetic data confirm independent origin

• GBS data from Mexico and S. America landraces

• Independent origins, little admixture between highlands

Takuno et al. in prep

Page 24: Population genetics of maize domestication, adaptation, and improvement

Distinct genetic architecture of highland adaptation

Yi et al. 2010 Science

• FST identifies many candidate SNPs, <5% shared

• Most (>80%) found segregating in lowland samples

• Contrasts with highland adaptation in humans

Takuno et al. in prep
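The FST scan summarized above can be sketched as: compute per-SNP FST between each highland group and lowland maize, take the upper tail as candidates, and ask what fraction of candidates overlap between the Mexican and South American comparisons. This is a minimal illustration with assumptions not stated on the slide: it uses Hudson's estimator without sample-size correction, a simple top-quantile cutoff, and defines "shared" as the fraction of one candidate set found in the other. All frequencies and names are hypothetical.

```python
def hudson_fst(p1, p2):
    """Per-SNP Hudson F_ST = 1 - H_within / H_between, without sample-size correction."""
    h_between = p1 * (1 - p2) + p2 * (1 - p1)
    if h_between == 0:
        return 0.0  # SNP fixed for the same allele in both groups
    return (p1 - p2) ** 2 / h_between


def top_snps(values, q):
    """Indices of SNPs in the top fraction q of F_ST values (simple outlier rule)."""
    k = max(1, int(len(values) * q))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(ranked[:k])


def shared_fraction(a, b):
    """Fraction of candidates in set a that are also candidates in set b."""
    return len(a & b) / len(a) if a else 0.0


# Hypothetical allele frequencies at 10 SNPs (not data from the study).
lowland = [0.5] * 10
highland_mexico = [0.9, 0.5, 0.5, 0.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
highland_andes = [0.5, 0.5, 0.1, 0.5, 0.5, 0.4, 0.5, 0.5, 0.5, 0.5]

fst_mex = [hudson_fst(a, b) for a, b in zip(highland_mexico, lowland)]
fst_and = [hudson_fst(a, b) for a, b in zip(highland_andes, lowland)]
cand_mex = top_snps(fst_mex, q=0.2)
cand_and = top_snps(fst_and, q=0.2)
print(sorted(cand_mex), sorted(cand_and), shared_fraction(cand_mex, cand_and))
```

Real scans usually calibrate the cutoff against a null (for example, neutral simulations), since a fixed quantile always returns candidates even when nothing is under selection.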

Page 26: Population genetics of maize domestication, adaptation, and improvement

Repeated adaptation in maize and teosinte

[Photo labels: maize, mexicana]

Photo: Pesach Lubinsky

[Map labels: mexicana, parviglumis]

Lauter et al. 2004 Genetics

• Colonization of highland Mexico brought maize into sympatry with highland ssp. mexicana

• mexicana and parviglumis diverged ≈60,000 BP

Ross-Ibarra et al. 2009 Genetics

Page 28: Population genetics of maize domestication, adaptation, and improvement

Widespread introgression from mexicana

[Figure A: Chromosome 4 (maize coordinates, 0 to 250 Mb); HAPMIX and STRUCTURE panels for El Porvenir, Opopeo, Santa Clara, Nabogame, Puruandiro, Xochimilco, Tenango del Aire, San Pedro, Ixtlan; Allopatric]

• SNP genotyping of 8 landraces sympatric with mexicana

• 6 genomic regions with mexicana haplotypes introgressed in multiple landraces at high frequencies

• No consistent introgression from maize into mexicana

Hufford et al. 2013 PLoS Genetics

Page 29: Population genetics of maize domestication, adaptation, and improvement

Introgressed regions overlap with teosinte QTL

Hufford et al. 2013 PLoS Genetics

Page 30: Population genetics of maize domestication, adaptation, and improvement

Adaptive introgression from mexicana

• Landraces with introgression show mexicana-like phenotype and superior growth in cold temperatures

• Maize adapted to highland environments in Mexico via gene flow from mexicana

Hufford et al. 2013 PLoS Genetics

Page 33: Population genetics of maize domestication, adaptation, and improvement

Historical genomics of US maize

• SNP genotyping of 400 historical landraces and inbreds

• Track allele frequencies

• Estimate genome-wide ancestry using identity by state
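Ancestry estimation by identity by state (IBS) can be illustrated with a window-based sketch: for each window of an inbred line, find the candidate ancestor with the highest proportion of matching alleles and report the share of windows assigned to each ancestor. This is a minimal illustration, assuming fully homozygous inbreds, SNPs in the same order for all samples, and a hard best-match call per window (ties go to the first ancestor listed); it is not the method of the cited study. The sequences are hypothetical, and the labels reuse the traditional Reid and Lancaster germplasm names only as placeholders.

```python
from collections import Counter


def ibs(a, b):
    """Proportion of SNPs at which two homozygous lines carry the same allele."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def window_ancestry(line, ancestors, window):
    """Assign each window of `line` to the ancestor with the highest IBS; return window shares."""
    calls = []
    for start in range(0, len(line), window):
        seg = line[start:start + window]
        # max() keeps the first ancestor listed when IBS values tie.
        best = max(ancestors, key=lambda name: ibs(seg, ancestors[name][start:start + window]))
        calls.append(best)
    counts = Counter(calls)
    return {name: counts[name] / len(calls) for name in ancestors}


# Hypothetical allele strings at 8 SNPs (one character per SNP); labels are placeholders.
ANCESTORS = {"Reid": "ACGTACGT", "Lancaster": "TGCATGCA"}
print(window_ancestry("ACGTTGCA", ANCESTORS, window=4))
```

Hard best-match calls ignore uncertainty when ancestors are similar; probabilistic approaches weigh near-ties instead of discarding them.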

Page 34: Population genetics of maize domestication, adaptation, and improvement

Genetic structure and diversity of US maize

• Increasing structure over time mirrors development of heterotic groups

• Number and diversity of ancestors decreases over time

van Heerwaarden et al. 2012 PNAS
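One way to put a number on "number and diversity of ancestors" is an effective number of ancestors: the inverse of the sum of squared ancestry proportions (an inverse Simpson index). The sketch assumes ancestry contributions have already been estimated and simply normalizes them; the contribution vectors are hypothetical, and the statistic used in the paper may be defined differently.

```python
def effective_number(contributions):
    """Inverse Simpson index of ancestor contributions, normalized to proportions."""
    total = sum(contributions)
    if total <= 0:
        raise ValueError("contributions must sum to a positive value")
    props = [c / total for c in contributions]
    return 1 / sum(p * p for p in props)


# Hypothetical ancestry contributions for an early and a late breeding era.
early_era = [0.25, 0.25, 0.25, 0.25]  # four equally used ancestors
late_era = [0.70, 0.10, 0.10, 0.10]   # one dominant ancestor
print(effective_number(early_era), round(effective_number(late_era), 2))
```

The index equals the raw count only when ancestors contribute equally, so it falls both when ancestors are lost and when a few come to dominate.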

Page 35: Population genetics of maize domestication, adaptation, and improvement

Selection on quantitative traits

Discussion

The genomics of breeding history is of great importance to understanding the genetic basis of crop improvement and is instrumental to the identification of molecular targets of artificial selection. The current state of marker technology has granted us an unprecedented look across eight decades of breeding and selection, providing insight into historical developments in diversity, ancestry, and the effects of selection across the genome.

The transition from open-pollinated varieties to inbred lines and the emergence of heterotic groups have caused profound changes in population structure, linkage disequilibrium, and ancestry patterns. Differentiation in the first two eras, although significant, is weak and our results support pedigree analyses (4) that suggest current population structure is mainly due to recent divergence of breeding pools rather than to different landrace origins. The strong differentiation observed in the modern era 3 lines is likely the result of the use of smaller numbers of more closely related breeding lines and limited genetic exchange among heterotic groups in the last two eras. Nonetheless, differential landrace ancestry remains detectable in elite material, providing some justification for the use of the traditional designations Reid (Yellow Dent) and Lancaster for the SS and NSS heterotic groups.

Compared with the dramatic shifts in ancestry, directional selection has had limited effect on the genome, with only 5% of SNPs showing some evidence of consistent selection. Candidate sites, apart from a slight reduction in ancestral diversity, do not deviate meaningfully from genome-wide patterns of haplotype length and ancestry. A potential caveat regarding this observation is that our selection scan is most sensitive to cumulative changes in allele frequency, possibly missing alleles fixed in the early stages of maize breeding. To account for this potential bias, we measured ancestry distortion and haplotype diversity at the 236 SNPs with highest frequency differentiation between eras 0 and 3, finding similar results as for our candidate SNPs (i.e., no increase in distortion and only 12% diversity reduction). Our results are also consistent with a recent resequencing study showing modest genome-wide effects of recent selection in a limited but geographically diverse sample of maize accessions (23). Nonetheless, a considerable number of candidate regions are identified across the genome, containing many genes affecting processes of agronomic relevance such as lignin synthesis (24) and response to auxin (25) and stress (1). It must also be noted that we have mapped selection associated with breeding progress per se, and that further analyses may detect selective changes specific to individual heterotic groups.

Fig. 3. Evidence for directional selection (Top), basal ancestry distortion (Middle), and ancestral haplotype diversity (Bottom) across the genome. Colors indicate the separate chromosomes, with red vertical lines marking the centromeres. The green dashed horizontal line marks the 99th percentile of Bayes factors; purple dashed horizontal lines indicate median values of ancestry distortion and effective number of basal ancestors. Black vertical ticks mark selected features. Gray dots mark candidate SNPs. Black circles mark candidates that coincide with sites of low ancestral diversity.

van Heerwaarden et al. PNAS | July 31, 2012 | vol. 109 | no. 31 | 12423

• Time GWA reveals SNPs selected across breeding pools

• Frequency and diversity suggest selection on common alleles of small effect at quantitative traits

van Heerwaarden et al. 2012 PNAS

Page 36: Population genetics of maize domestication, adaptation, and improvement

Ancestry, not selection, drives diversity

• No deviation from genome-wide ancestry at selected sites

• Unusual ancestry instead reflects diversity in ancestral lines

van Heerwaarden et al. 2012 PNAS

Page 37: Population genetics of maize domestication, adaptation, and improvement

Popular lines do not show superior genotypes

The genomic signature of selection is informative of the genetic architecture of breeding progress. Two issues of obvious interest are the selective importance of rare alleles of large effect and the contribution of dominant ancestors with superior multilocus genotypes. The infrequent occurrence of rare ancestral contributors and the absence of extended haplotypes at candidate loci favor a model of selection on common variants rather than one of strong selective sweeps (26, 27), and we find no evidence that the long-term success of specific lines was determined by their multilocus genotype. That said, the exceptionally favorable genotypes observed for some era 1 inbreds suggest that selection of outstanding lines may have occurred, albeit with limited effect on future genomic composition.

In all, our results suggest that genetic gain achieved by plant breeding has been a complex process, involving a steady accumulation of changes at multiple loci (28), combined with heterosis due to differentiation of breeding pools (29). We thereby support the notion that selected traits of agronomic importance are predominantly quantitative in nature (30), with relatively few dominant contributions from individual alleles or lines. It will therefore be interesting to see whether our candidates prove useful in defining improved multilocus targets for genomic selection. Although challenging, the application of historical genomics to crop improvement is a tantalizing prospect that we hope breeders will soon put to the test.

Methods

Samples and Genotyping. We obtained a total of 400 accessions from the US Department of Agriculture (USDA) National Plant Germplasm System and from collaborators. Lines were chosen by a combination of literature research, consultation with plant breeders, and querying the stock database hosted at maizegdb.org for accessions with a large number of references. Approximate ages of the selected lines were similarly obtained from the literature and germplasm databases. Accessions were divided into 99 classic North American landraces (era 0), 94 early inbreds from before the 1950s (era 1), 70 advanced public lines from the 1960s and 1970s (era 2), and 137 elite commercial lines from the 1980s and 1990s (era 3) that are no longer under plant variety protection (ex-PVP).

For each accession, DNA was extracted by a standard cetyltrimethylammonium bromide (CTAB) protocol (31) for genotyping on the Illumina MaizeSNP50 Genotyping BeadChip platform, using the clustering algorithm of the GenomeStudio Genotyping Module v1.0 (Illumina). Of the 56,110 markers on the chip, 45,997 polymorphic SNPs were genotyped successfully with less than 10% missing data and were used in subsequent analyses. The SNPs came from diverse sources and discovery schemes. We evaluated the effects of ascertainment by comparing results for 33,575 SNPs derived from more diverse discovery panels with results for 12,422 SNPs that were discovered between the advanced public lines B73 and Mo17. Effects on differentiation and selection inference were statistically significant but modest (SI Text).
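The marker filter stated above (keep polymorphic SNPs with less than 10% missing data) can be written as a small predicate. This minimal sketch assumes genotype calls are strings, with `None` for a missing call. The marker names and calls are hypothetical, and GenomeStudio's own clustering and calling steps are not reproduced.

```python
from typing import Optional, Sequence


def passes_qc(calls: Sequence[Optional[str]], max_missing: float = 0.10) -> bool:
    """Keep a SNP if it is polymorphic among called genotypes and its missing rate is below max_missing."""
    if not calls:
        return False
    called = [c for c in calls if c is not None]
    missing_rate = (len(calls) - len(called)) / len(calls)
    return missing_rate < max_missing and len(set(called)) > 1


# Hypothetical marker table: marker name -> genotype calls across accessions.
markers = {
    "snp_ok": ["AA", "AA", "BB", "AA", "BB", "AA", "AA", "BB", "AA", "AA"],
    "snp_mono": ["AA"] * 10,                                               # monomorphic
    "snp_missing": ["AA", None, "BB", None, "AA", "AA", "BB", "AA", "AA", "AA"],  # 20% missing
}
kept = [name for name, calls in markers.items() if passes_qc(calls)]
print(kept)  # ['snp_ok']
```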

Diversity, Linkage, and Ancestry Analysis. Diversity analyses followed refs. 32 and 33. Briefly, PCA was performed on normalized genotype matrices, and the number of significant eigenvalues was determined by comparison with a Tracy–Widom (TW) distribution (18). Genotypes were assigned to k groups by Ward clustering on the Euclidean distance calculated from the k − 1 significant PCs. PCA-based clustering into groups was done separately for each era. To improve clustering within era 0, Corn Belt Dents were analyzed separately from Northern Flints and from a divergent group containing a popcorn and a Cherokee Flower Corn (referred to here as popcorn). Genetic differentiation within each era was measured as the weighted mean of Nicholson's population-specific differentiation parameter C (19), a measure of allele frequency divergence from an estimated base population frequency, calculated for each genetic group using the popdiv function of the R (34) package popgen.
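The clustering step described above (PCA, then Ward clustering on the k − 1 leading PCs) can be sketched as follows. This sketch makes three assumptions:

- Genotype columns are simply standardized. The normalization in refs. 32 and 33 may differ.
- k is supplied by hand rather than inferred from a Tracy–Widom test.
- Ward's method is implemented naively, by repeatedly merging the pair of groups whose union least increases the within-group sum of squares. This is only practical for small samples.

The function names are hypothetical.

```python
import numpy as np


def pca_scores(genotypes, n_pcs: int) -> np.ndarray:
    """Scores of individuals (rows) on the leading n_pcs PCs of column-standardized genotypes."""
    g = np.asarray(genotypes, dtype=float)
    sd = g.std(axis=0)
    keep = sd > 0                                   # monomorphic columns carry no information
    z = (g[:, keep] - g[:, keep].mean(axis=0)) / sd[keep]
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def ward_clusters(points: np.ndarray, k: int) -> np.ndarray:
    """Naive Ward agglomeration down to k groups; returns one integer label per row."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                diff = points[clusters[a]].mean(axis=0) - points[clusters[b]].mean(axis=0)
                cost = na * nb / (na + nb) * float(diff @ diff)  # increase in within-group SS
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)              # b > a, so index a is unaffected
    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


# Toy 0/1/2 genotypes for six lines; k groups are clustered on the k - 1 leading PCs.
geno = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0],
        [2, 2, 2, 1], [2, 2, 1, 2], [2, 1, 2, 2]]
k = 2
labels = ward_clusters(pca_scores(geno, n_pcs=k - 1), k)
print(labels)
```

For realistic sample sizes, a library implementation of Ward linkage would replace the quadratic-per-step loop.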

For linkage and ancestry analysis, era 0 genotypes were converted to phased haplotypes using the program fastPHASE (35). To correct for background linkage caused by genetic differentiation, linkage disequilibrium (LD) between SNPs was calculated as the squared correlation (r²) between inverse logit-transformed residuals of a multiple logistic regression on each SNP, using the first six genetic PCs as covariates to correct for population structure. LD decay was described by nonlinear regression as in ref. 36. Mean haplotype length was calculated at 1,000 random positions across the genome and compared with the expected length obtained by randomizing SNPs within each genetic group around the same positions. Linkage disequilibrium between closely spaced SNPs was accounted for by randomizing blocks of SNPs separated by more than 4 kb.
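The structure-corrected LD measure above can be approximated as follows. This is a deliberate simplification. The authors fit a multiple logistic regression and used inverse logit-transformed residuals. This sketch instead takes ordinary least-squares residuals on the PC covariates and squares the Pearson correlation of the two residual vectors. The function names and toy data are hypothetical.

```python
import numpy as np


def residualize(y: np.ndarray, covariates) -> np.ndarray:
    """Residuals of y after an ordinary least-squares fit on the covariates plus an intercept."""
    x = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return y - x @ beta


def structure_corrected_r2(snp_a, snp_b, pcs) -> float:
    """Squared correlation between two SNPs after regressing both on the structure covariates."""
    ra = residualize(np.asarray(snp_a, dtype=float), pcs)
    rb = residualize(np.asarray(snp_b, dtype=float), pcs)
    r = np.corrcoef(ra, rb)[0, 1]
    return float(r * r)


# Toy data: both SNPs differ between two groups but are unrelated within groups.
group = [0, 0, 0, 0, 1, 1, 1, 1]          # stands in for the first six PCs
snp_a = [0, 1, 0, 1, 1, 2, 1, 2]
snp_b = [0, 0, 1, 1, 1, 1, 2, 2]
raw_r2 = float(np.corrcoef(snp_a, snp_b)[0, 1] ** 2)
print(raw_r2, structure_corrected_r2(snp_a, snp_b, group))
```

Here the raw r² is inflated by the group difference, while the corrected value is essentially zero.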

We estimated direct genomic ancestry by shared haplotype analysis. For each line, the longest shared haplotype with lines from the same era or older

[Fig. 4 panels. Left, "ancestral overrepresentation of individual era 1 lines at favorable alleles": log10(ancestry at favorable alleles) vs. log10(genome-wide ancestry). Right, "relation between enrichment for favorable alleles in individual era 1 lines and genome-wide ancestry": enrichment for favorable alleles vs. log10(genome-wide ancestry), with labeled lines C103, OH43, WF9, W22, B14, B37, I205, MO1W, H49, W182B, and CI187-2.]

Fig. 4. Analysis of disproportionate ancestral contributions of individual era 1 lines to favorable alleles in era 3. Left: Overrepresentation of individual era 1 lines in the ancestry of favorable alleles, estimated by plotting the average ancestry proportion at favorable alleles against the genome-wide proportion. Right: Enrichment of favorable alleles in era 1 lines (defined as the log probability ratio (LPR) with respect to noncandidate SNPs) as a function of their average ancestral contribution to era 3. Black dotted lines represent the 1:1 diagonal and the 0 horizontal, respectively. Gray dotted lines are regression lines (slope/r²: 1.15/0.85 and −0.1/0.00). Line names on the Right are shown for lines with LPR values higher than 4 or ancestry proportion above 0.03. Labels in boldface mark breeding lines of known historic popularity.

12424 | www.pnas.org/cgi/doi/10.1073/pnas.1209275109 van Heerwaarden et al.
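The Methods text above breaks off mid-sentence, so the paper's exact ancestry rule is not shown here. As a hedged illustration of one plausible ingredient, this sketch works at a single focal SNP:

- It measures the run of consecutive identical alleles that a query haplotype shares with each candidate ancestor.
- It then picks the candidate with the longest run.

Run length is counted in SNPs rather than physical distance. The function names and haplotypes are hypothetical.

```python
from typing import Mapping, Sequence


def shared_run_length(hap_a: Sequence[str], hap_b: Sequence[str], pos: int) -> int:
    """Length (in SNPs) of the identical stretch of hap_a and hap_b spanning pos; 0 if they differ at pos."""
    if hap_a[pos] != hap_b[pos]:
        return 0
    left = pos
    while left > 0 and hap_a[left - 1] == hap_b[left - 1]:
        left -= 1
    right = pos
    while right < len(hap_a) - 1 and hap_a[right + 1] == hap_b[right + 1]:
        right += 1
    return right - left + 1


def best_ancestor(query: Sequence[str], candidates: Mapping[str, Sequence[str]], pos: int) -> str:
    """Name of the candidate haplotype sharing the longest identical run with query at pos."""
    return max(candidates, key=lambda name: shared_run_length(query, candidates[name], pos))


query = "ABABAB"
candidates = {"line_x": "ABAAAA", "line_y": "BBABAB"}  # hypothetical older lines
print(best_ancestor(query, candidates, pos=2))  # line_y
```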

• No over-representation of early inbreds at selected sites

• Early inbreds contributing most to ancestry are not enriched for beneficial alleles

van Heerwaarden et al. 2012 PNAS

Page 39: Population genetics of maize domestication, adaptation, and improvement

Selection in the Iowa RRS

• BSSS, BSCB1 selected for hybrid yield and agronomics

• SNP genotyping of founders and plants from 5 cycles

• Allele frequency divergence mostly due to genetic drift
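Attributing allele frequency divergence to drift rests on how much change drift alone is expected to produce. This sketch assumes a Wright–Fisher model. That is an assumption here: Iowa RRS cycles involve small selected populations and need not correspond to single generations. Under this model, the expected variance in allele frequency after t generations is p0(1 − p0)[1 − (1 − 1/(2Ne))^t]. The effective size and number of generations below are hypothetical.

```python
import random


def expected_drift_variance(p0: float, ne: int, generations: int) -> float:
    """Wright-Fisher expectation of Var(p_t) for a diploid population of effective size ne."""
    return p0 * (1 - p0) * (1 - (1 - 1 / (2 * ne)) ** generations)


def simulate_drift(p0: float, ne: int, generations: int, rng: random.Random) -> float:
    """One Wright-Fisher trajectory: binomial resampling of 2*ne gene copies each generation."""
    p = p0
    for _ in range(generations):
        copies = sum(rng.random() < p for _ in range(2 * ne))
        p = copies / (2 * ne)
    return p


rng = random.Random(1)
finals = [simulate_drift(0.5, ne=10, generations=5, rng=rng) for _ in range(2000)]
mean = sum(finals) / len(finals)
sim_var = sum((p - mean) ** 2 for p in finals) / (len(finals) - 1)
print(round(sim_var, 3), round(expected_drift_variance(0.5, 10, 5), 3))
```

Comparing observed frequency changes against this drift-only expectation is one way to ask whether selection is needed to explain them.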

Gerke et al. In Review

Page 41: Population genetics of maize domestication, adaptation, and improvement

No overlap in selection suggests complementation

Gerke et al. In Review

Page 43: Population genetics of maize domestication, adaptation, and improvement

Many new mutations, most deleterious

Lohmueller 2013 arXiv; Jones 1924 Genetics

• ≈90 mutations per meiosis, > 80% deleterious

• Population growth increases rare deleterious variants, and these explain a larger proportion of V_A

• GWAS has low power to detect rare deleterious variants

Page 44: Population genetics of maize domestication, adaptation, and improvement

Computational prediction of deleterious alleles

• Published GBS and heterosis data for the maize 282 population

• A priori identification of putatively deleterious alleles from conservation and physicochemical properties

• Deleterious nonsynonymous SNPs are at lower frequencies than nondeleterious ones
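One simple way to compare the two frequency distributions is a normalized Mann–Whitney statistic. It estimates the probability that a randomly chosen deleterious SNP has a lower frequency than a randomly chosen nondeleterious one, counting ties as one half. This is not necessarily the test used by Mezmouk & Ross-Ibarra. It is a hedged illustration with hypothetical minor allele frequencies.

```python
from typing import Sequence


def prob_lower(freqs_a: Sequence[float], freqs_b: Sequence[float]) -> float:
    """P(a < b) + 0.5 * P(a == b) over all pairs: a Mann-Whitney U statistic divided by n_a * n_b."""
    score = 0.0
    for a in freqs_a:
        for b in freqs_b:
            if a < b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(freqs_a) * len(freqs_b))


# Hypothetical minor allele frequencies for the two SNP classes.
deleterious = [0.01, 0.02, 0.05, 0.10]
nondeleterious = [0.03, 0.15, 0.25, 0.40]
print(prob_lower(deleterious, nondeleterious))  # 0.875
```

Values near 0.5 mean no difference. Values near 1 mean deleterious SNPs consistently sit at lower frequency.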

Mezmouk & Ross-Ibarra 2014 G3

Page 45: Population genetics of maize domestication, adaptation, and improvement

Constraint; no evidence of positive selection

[Plot: K_N/K_S (axis from 0.0 to 0.5) for gene categories C and NotC, comparing genes with deleterious SNPs ("Deleterious") with genes without ("None").]

• Few high-frequency deleterious alleles show significant signals of selection

• Genes with deleterious SNPs show lower constraint (higher K_N/K_S)
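K_N and K_S are substitutions per nonsynonymous site and per synonymous site, so a lower K_N/K_S means stronger constraint. This sketch assumes the numbers of differences and of sites have already been counted, for example with a Nei–Gojobori-style site count, which is not shown. It applies the Jukes–Cantor correction to each proportion. The function names and counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected substitutions per site from the observed proportion of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction needs 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def kn_ks(nonsyn_diffs: float, nonsyn_sites: float, syn_diffs: float, syn_sites: float) -> float:
    """Ratio of corrected nonsynonymous to synonymous divergence; lower values mean stronger constraint."""
    kn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("no synonymous divergence; K_N/K_S is undefined")
    return kn / ks


# Hypothetical counts for two genes with the same site numbers.
print(kn_ks(4, 750, 6, 250))   # more constrained gene
print(kn_ks(12, 750, 6, 250))  # less constrained gene
```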

Mezmouk & Ross-Ibarra 2014 G3; Hufford et al. 2012 Nature Genetics

Page 46: Population genetics of maize domestication, adaptation, and improvement

Deleterious allele frequencies consistent with BPH

• BPH increases with distance from B73 tester

• Significant BPH even among stiff-stalk lines

Mezmouk & Ross-Ibarra 2014 G3

Page 48: Population genetics of maize domestication, adaptation, and improvement

Heterosis GWA genes enriched for deleterious alleles

• No enrichment for individual deleterious SNPs (low power)

• Genes associated with heterosis (for all traits) are enriched in deleterious alleles

Mezmouk & Ross-Ibarra 2014 G3

Page 49: Population genetics of maize domestication, adaptation, and improvement

Conclusions

• Population genetic analyses are clarifying maize origins and the effects of selection on the maize genome.

• Maize adaptation to new environments has taken multiple distinct routes, including utilizing genes from wild relatives.

• Genetic drift and selection on common variants appear to have dominated US maize germplasm.

• Patterns of complementation and frequencies of deleterious alleles support a simple dominance model of heterosis.
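The simple dominance model in the last bullet can be made concrete with a toy calculation, under these hypothetical assumptions:

- Each inbred is fixed for recessive deleterious alleles at a set of loci.
- Each such locus lowers yield by a constant amount.
- Under complete dominance, the F1 is impaired only at loci where both parents carry the deleterious allele.

Best-parent heterosis (BPH) is then the F1 value minus the better parent's value. The more complementary the parents' deleterious sets, the larger the BPH. All names and numbers are illustrative.

```python
def inbred_value(n_hom_deleterious: int, base: float = 100.0, cost: float = 1.0) -> float:
    """Toy yield: a base value minus a fixed cost per homozygous deleterious locus."""
    return base - cost * n_hom_deleterious


def best_parent_heterosis(del_p1: set, del_p2: set, base: float = 100.0, cost: float = 1.0) -> float:
    """BPH under complete dominance: the F1 is homozygous deleterious only where both parents are."""
    p1 = inbred_value(len(del_p1), base, cost)
    p2 = inbred_value(len(del_p2), base, cost)
    f1 = inbred_value(len(del_p1 & del_p2), base, cost)
    return f1 - max(p1, p2)


# Hypothetical loci (by index) fixed for deleterious alleles in a tester and two other lines.
tester = {1, 2, 3, 4}
related_line = {1, 2, 3, 5}   # shares most deleterious alleles with the tester
distant_line = {6, 7, 8, 9}   # shares none
print(best_parent_heterosis(tester, related_line))  # 1.0
print(best_parent_heterosis(tester, distant_line))  # 4.0
```

In this toy setting, BPH rises as the crossed line becomes more distant from the tester, the same qualitative pattern as the B73-tester slide.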

Page 50: Population genetics of maize domestication, adaptation, and improvement

Improvement candidate genes

• 695 selected regions identified

Hufford et al. 2012 Nature Genetics

Page 51: Population genetics of maize domestication, adaptation, and improvement

Selection on gene expression

Expression changes      Domestication   Improvement
Directional change      yes             no
Tissue specificity      no              yes
Dominance in crosses    no              yes

• Expression at > 18,000 genes in both maize and teosinte

• Domestication directly acted on candidate gene expression

• Improvement worked with highly expressed genes

• Modern breeding selected for dominance in expression
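Dominance in expression is often summarized as d/a: the deviation of the hybrid from the midparent, divided by half the difference between the parents. On this scale:

- 0 is additive.
- +1 means the hybrid matches the higher-expressing parent.
- −1 means it matches the lower-expressing parent.
- Values beyond ±1 suggest over- or underdominance.

Whether Swanson-Wagner et al. used exactly this statistic is an assumption here. The sketch and its expression values are illustrative.

```python
def dominance_ratio(parent1: float, parent2: float, hybrid: float) -> float:
    """d/a: hybrid deviation from the midparent divided by half the parental difference.

    0 = additive, +1 = hybrid equals the higher parent, -1 = hybrid equals the lower parent.
    """
    a = abs(parent1 - parent2) / 2
    if a == 0:
        raise ValueError("parents have equal expression; d/a is undefined")
    d = hybrid - (parent1 + parent2) / 2
    return d / a


# Hypothetical expression levels (arbitrary units).
print(dominance_ratio(10.0, 2.0, 6.0))   # 0.0, additive
print(dominance_ratio(10.0, 2.0, 10.0))  # 1.0, high-parent dominance
```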

Hufford et al. 2012 Nature Genetics; Swanson-Wagner et al. 2012 PNAS

Page 52: Population genetics of maize domestication, adaptation, and improvement

Repeated evolution at grassy tillers

• Cloned gt1 as gene underlying QTL for prolificacy

• Selection on different parts of the gene:

A) Temperate zones: selection on the 5' enhancer region
B) Tropical zones: selection on the 3' UTR

Wills et al. 2013 PLoS Genetics

Page 53: Population genetics of maize domestication, adaptation, and improvement

Consistent with QTL for heterosis

• QTL for heterosis enriched in centromeric regions¹

¹ Lariepe et al. 2012 Genetics

Page 54: Population genetics of maize domestication, adaptation, and improvement

Consistent with change in inbred and hybrid yield?

continuing volatility of genotypes over the decades (“genetic diversity in time”).

Analysis by multidimensional scaling of allele polymorphisms among the parental inbred lines of the hybrids separated the older inbred lines (usually parents of double cross hybrids) from the newer inbred lines. The newer lines sorted into two heterotic groups, called Stiff Stalk and Non Stiff Stalk. This separation of the newer lines agrees with the observation that breeders, using pedigree information and empiricism (practical experience), have established two breeding pools to balance important traits in the final hybrids, as well as to improve efficiency of seed production (Stiff Stalk for seed parents and Non Stiff Stalk for pollinator parents).
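The multidimensional scaling step described above can be sketched as follows. This is a minimal illustration under stated assumptions: inbreds are treated as homozygous, pairwise distance is the fraction of SSR loci at which two lines carry different alleles, and classical (Torgerson) MDS is applied. The allele codes are hypothetical, and the distance measure and MDS variant used in the cited study are not given here.

```python
import numpy as np


def allele_mismatch_distance(genotypes):
    """Fraction of loci at which each pair of inbred lines differs.

    genotypes: (n_lines, n_loci) array of allele codes, one allele per
    locus because inbreds are assumed to be homozygous.
    """
    g = np.asarray(genotypes)
    return (g[:, None, :] != g[None, :, :]).mean(axis=2)


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: embed a distance matrix in k dimensions."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ d2 @ j                    # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # keep the k largest
    vals = np.clip(eigvals[order], 0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(vals)


# Hypothetical allele codes at 5 SSR loci for 6 inbred lines:
# lines 0-2 share one set of alleles, lines 3-5 another.
lines = np.array([
    [1, 2, 1, 3, 2],
    [1, 2, 1, 3, 1],
    [1, 2, 2, 3, 2],
    [4, 5, 3, 1, 4],
    [4, 5, 3, 2, 4],
    [4, 1, 3, 1, 4],
])
coords = classical_mds(allele_mismatch_distance(lines), k=2)
print(np.round(coords[:, 0], 2))  # first axis separates the two groups
```

The sign of each MDS axis is arbitrary, so only the split between groups along the first axis is meaningful, not which group ends up positive.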

Heterosis: hybrid and inbred performance

DUVICK (2005) reviewed several studies (DUVICK, 1984b, 1999; MEGHJI et al., 1984; DUVICK et al., 2004b) that examined the contribution of heterosis to improvements in yield of U.S. maize hybrids. He concluded that:

(1) Absolute heterosis for grain yield has increased over the years to a small extent (more so under abiotic stress), but its annual gain is less (sometimes much less) than total genetic gain in hybrid yield. Inbred and single cross yields have each increased over the decades, but single cross yield has advanced to a greater degree (Fig. 7). Absolute heterosis is defined as “yield of a single cross minus the mean yield of its inbred parents” (SCHNELL, 1974).

(2) Relative heterosis for grain yield has decreased over the years according to two reports, but increased slightly in a third study. Relative heterosis is defined as “absolute heterosis as percentage of single cross yield” (SCHNELL, 1974).

(3) Absolute heterosis for plant size and maturity has decreased to a small degree, in contrast with heterosis for grain yield.
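The two definitions quoted above (SCHNELL, 1974) can be written as small functions. The yields below are hypothetical and chosen only to show how absolute heterosis can rise while relative heterosis falls when inbred yields improve, which is the direction of change described in points (1) and (2). Note that many texts instead express relative heterosis as a percentage of the midparent; these functions follow the single-cross definition quoted here.

```python
def absolute_heterosis(single_cross, parent1, parent2):
    """Yield of a single cross minus the mean yield of its inbred parents."""
    return single_cross - (parent1 + parent2) / 2


def relative_heterosis(single_cross, parent1, parent2):
    """Absolute heterosis as a percentage of single-cross yield."""
    return 100 * absolute_heterosis(single_cross, parent1, parent2) / single_cross


# Hypothetical yields (single cross, parent 1, parent 2), not data from the study.
older = (6.0, 2.0, 2.0)
newer = (10.0, 5.0, 5.0)

print(absolute_heterosis(*older), round(relative_heterosis(*older), 1))  # 4.0 66.7
print(absolute_heterosis(*newer), round(relative_heterosis(*newer), 1))  # 5.0 50.0
```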

SUMMARY AND CONCLUSIONS

Maize grain yields have risen continually in the U.S. since the 1930s, concomitant with changes in crop management and with the utilization and improvement of hybrid maize. Approximately 40 to 50% of the yield gains are owed to changes in management (e.g., use of herbicides, increased amounts of nitrogen fertilizer) and 50 to 60% to continuing genetic improvements in maize hybrids released during the past seven decades.

GENETIC PROGRESS IN YIELD OF U.S. MAIZE 199

FIGURE 6 - Groups of 968 alleles from 98 SSR loci distributed across 10 chromosomes for the historical series of widely grown hybrids. Six groups based on patterns of change in allele frequency across nine decades. Number of alleles per group is shown in parentheses. From DUVICK et al. (2004a). Copyright © 2004 by John Wiley & Sons, Inc. This material is used by permission of John Wiley & Sons, Inc.

FIGURE 7 - Yields of single crosses (SX) and their inbred parent means (MP), and heterosis as SX – MP. Single-cross pedigrees are based on heterotic inbred combinations in the Era hybrids during the six decades, 1930s through 1980s, 12 inbreds and six single crosses per decade. Means of trials grown in three locations in 1992 and two locations in 1993 at three densities (30, 54, and 79 thousand plants/ha) with one replication per density. From DUVICK et al. (2004b). Copyright © 2004 by John Wiley & Sons, Inc. This material is used by permission of John Wiley & Sons, Inc.
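Because heterosis in Fig. 7 is defined as SX – MP, and a least-squares slope is linear in the response, the annual gain in heterosis equals the gain in single-cross yield minus the gain in inbred parent means. Whenever inbred yields are rising, heterosis therefore gains less per year than the single cross itself. The sketch below checks this with hypothetical decade means; the numbers are invented for illustration and are not read from the figure.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


decades = [1930, 1940, 1950, 1960, 1970, 1980]
# Hypothetical decade means (Mg/ha), NOT values taken from Fig. 7.
sx = [4.0, 4.6, 5.3, 6.0, 6.8, 7.5]   # single crosses
mp = [1.5, 1.8, 2.2, 2.7, 3.2, 3.6]   # inbred parent means
het = [s - m for s, m in zip(sx, mp)]  # heterosis as SX - MP

b_sx, b_mp, b_het = (ols_slope(decades, y) for y in (sx, mp, het))
# Linearity of OLS in y means b_het equals b_sx - b_mp (up to rounding).
print(round(b_sx, 4), round(b_mp, 4), round(b_het, 4))
```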

• Does selection in high-recombination regions improve inbreds?

• Do haplotype blocks in low-recombination regions maintain heterosis?²

² Duvick 2005 Advances in Agronomy