Molecular Characterization of Local Adaptation of Natural Flowering Dogwood Populations (C. florida) to Fungal Pathogens and Environmental Stress
Andrew Pais
Advisor: Dr. Qiuyun Jenny Xiang
Department of Plant & Microbial Biology
North Carolina State University
Insect-disease risk
Why the flowering dogwood tree?
• State flower of North Carolina, State tree of Virginia
• Economically valuable in horticulture
• Sales ~ $70 million per year
• Ecologically important calcium pump
Fig. 3. Conceptual model of calcium (Ca) cycling in an eastern U.S. hardwood forest with and without dogwood (Cornus florida) present. Arrow thickness indicates amount of Ca movement and box size indicates size of available Ca pool. Loss of dogwood would resul...
Eric J. Holzmueller, Shibu Jose, Michael A. Jenkins
Ecological consequences of an exotic fungal disease in eastern U.S. hardwood forests
Forest Ecology and Management, Volume 259, Issue 8, 2010, 1347–1353
http://dx.doi.org/10.1016/j.foreco.2010.01.014
Calcium cycle
Study System: Cornus florida L.
Threats
• Dogwood Anthracnose (Discula destructiva)
• Powdery Mildew (Erysiphe pulchra)
• Drought Stress + Sun Scorch
[Figure: dogwood disease slides labeled 1980; all other labels appear only as question marks in the transcript]
Dogwood disease
Adapted from Ennos 2015
Co-evolution under fluctuating environments
Co-evolution involving R and AVR loci in natural populations
Co-evolution involving quantitative genetic resistance to tree pathogens
Pathogen development following initial establishment
Tolerance to tissue invasion
Probability of initial establishment
Pathogen pressure
Co-evolutionary dynamics of trees and pathogens in natural populations
Factors affecting impact of pathogens on individual trees
Disease and genetic consequences
Prior assessments of genetic diversity
• SSR: Hadziabdic et al 2010 and 2012
• Chloroplast: Call et al 2015
Objectives
• I. NC pilot study of adaptive variation
• II. Range-wide study of adaptive variation
• III. Secondary chemical diversity
I. Objectives
• Genetic analyses using GBS to identify genetic markers associated with environmental variables and disease severity.
I. Questions
• Has the species evolved local adaptation as a consequence of environmentally heterogeneous ecological pressures?
• Which SNPs are likely to be candidates under selection?
• Which environmental gradients, if any, are most important to genetic divergence and local adaptation of C. florida populations?
• What genetic predisposition does C. florida possess to adapt to ongoing climate change in North Carolina?
I. Questions (cont.)
• And how does repeated GBS experimentation influence final results?
Genetic Data - GBS
• Double digestion of libraries using PstI + MspI (Peterson et al 2012)
• Two libraries of Illumina HiSeq 2000, 100 bp paired-end
• 96 + 85 samples (library one and two)
Sampling (six populations; 181 individuals)
[Figure: sampling map; regions: Mountain, Piedmont, Coast; population labels: SM, PI, DK, UM, NW, CF]
Study design
“Whoever the men were who designed the geographical biscuit cutter which sliced out the Old North State, they succeeded so well botanically that one might think of them as possessed with less political sense than vegetational acumen…”
“In a very real sense North Carolina, though lying at right angles to the north-south longitudinal lines, unites Canada and Florida within a little over two-thirds of her length.”
Bertram Whittier Wells (1932), The Natural Gardens of North Carolina
Study design
[Figure: North Carolina maps of four variables]
• Rainfall (shading legend: Wetter, Drier, Lighter, Darker)
• Length of Growing Period (LGP): longer = bluer, shorter = greener
• Soil Type: Histosols, Ultisols, Inceptisols
• Dogwood Anthracnose: occurrence by county
Study design
Data Collection (One Man Army)
• Environmental variables/traits
• Plant Phenotype
  o Disease severity
  o Osmotic leaf potential (drought resistance)
• Multi-locus genotyping with GBS approach and Illumina sequencing
Study design
Environmental Data
Field Measurements
• Elevation and coordinates
• Diameter by height
• Canopy coverage over each tree
• Nutrients of soil surface core: P, K, Ca, Mg, S, Na, Mn, Cu, Zn
• %HM, CEC, Ac, pH, W/V, %BS
Climate
• Mean Temp
• Mean Rainfall
• Frost free/growing period
Soil classifications
• Histosols
• Ultisols
• Inceptisols
Phenotype Data
o Disease severity (plant health) - % leaf blotting and branch dieback (Mielke and Langdon 1986)
[Figure: five-point rating scale, 1 2 3 4 5, with end labels Healthy and Diseased]
Phenotype Data
o Osmotic leaf potential per tree
• Branch cuttings from field osmotically adjusted in water
• Recorded with osmometer
• Osmolality in mmol/kg [solute/water] was taken as representative of leaf osmotic potential
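The slide reports osmometer readings in mmol/kg and treats them as a proxy for leaf osmotic potential. A minimal sketch of one common way to put such readings on a pressure scale is the van 't Hoff relation, psi = -c R T. The talk does not say whether any conversion was applied; the function name, the 20 °C default, and the dilute-solution assumption (1 mmol/kg treated as 1 mol/m^3) are assumptions made here for illustration.

```python
# Sketch only: van 't Hoff conversion of osmolality to osmotic potential.
# Assumes an ideal dilute aqueous solution, so 1 mmol/kg ~ 1 mol/m^3.

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def osmotic_potential_mpa(osmolality_mmol_per_kg, temp_c=20.0):
    """Return osmotic potential in MPa (zero or negative).

    osmolality_mmol_per_kg: osmometer reading (mmol solute per kg water).
    temp_c: temperature in degrees Celsius (20 C is an assumed default).
    """
    if osmolality_mmol_per_kg < 0:
        raise ValueError("osmolality cannot be negative")
    temp_k = temp_c + 273.15
    pascals = -osmolality_mmol_per_kg * GAS_CONSTANT * temp_k
    return pascals / 1e6  # Pa -> MPa


# Hypothetical reading, not a value from the study.
example_mpa = osmotic_potential_mpa(500.0)
```

More negative values indicate more solute per unit water. Because the conversion is linear, rankings of trees are the same whether raw mmol/kg or MPa is used.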
Analytical methods
Gradient forest analysis (GF)
• Utilizes presence-absence of alleles per sample
• Cumulative importance of allele along ecological gradients
• Ellis et al 2012; Fitzpatrick et al 2015
Fst outlier analyses
• Hierarchical Fst-Het. model (Excoffier et al 2010)
• Island model (Foll and Gaggiotti 2008)
[Figure: gradient forest curves; x-axis: mean July precipitation (prec7), 120 to 170; y-axis: cumulative importance; curves labeled B195_77, B1219_ 7, B244_51, B977_86]
Latent factor mixed modeling
• Frichot et al 2013
• Model terms: allele freq. matrix, fixed env. effect, locus effect, K latent factors
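The four labels on the latent factor mixed modeling slide correspond to the terms of the LFMM regression described by Frichot et al 2013. Written out for individual i and locus ℓ, a sketch of the model is given here; the symbol names are chosen for this note and may differ from the notation on the original slide.

```latex
G_{i\ell} = \mu_\ell + \beta_\ell^{\mathsf{T}} X_i + U_i^{\mathsf{T}} V_\ell + \varepsilon_{i\ell}
```

Here G is the allele frequency (genotype) matrix, \mu_\ell is the locus effect, \beta_\ell^{\mathsf{T}} X_i is the fixed environmental effect, U_i^{\mathsf{T}} V_\ell is the contribution of the K latent factors that absorb population structure, and \varepsilon_{i\ell} is residual error. A locus is a candidate when its \beta_\ell differs clearly from zero.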
Data Analysis
[Flowchart boxes: GBS; Filtering Low Quality; Read Assembly; Filtering Missing Data; Identify Outlier Loci; Correlation Analysis; General Population Structure]
Data Analysis
[Flowchart boxes: Library One; Library Two; Identify Outlier Loci and Correlation Analysis (x2); General Population Structure; Validation; Candidate SNPs + Neutral SNPs; Gradient Forest Analysis]
Results: Population Structure
Library   Loci   Coverage
One       2983   30.0x
Two       2764   34.6x
Cross-validating outlier results reduces false positives
[Figure: six-set Venn diagram of outlier loci from Bayescan, LFMM, and Arlequin, each run on Library 1 and Library 2; individual region counts are not recoverable from the transcript]
54 candidate loci (three overlaps); sum shown on slide: 2+1+7+9+7+1+7+3+2+7
Cross-validating outlier results reduces false positives
Library one dataset
54 candidate SNPs + 1171 neutral SNPs
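The cross-validation idea on these slides, keeping loci that are flagged repeatedly across Bayescan, LFMM, and Arlequin and across the two libraries, can be sketched as a support count over sets. This is an illustration rather than the procedure used in the talk: the exact rule behind the 54 candidates is not stated, and the `min_support` threshold, function names, and locus IDs below are hypothetical.

```python
from collections import Counter


def cross_validate_outliers(calls, min_support=2):
    """Return loci flagged by at least `min_support` (method, library) runs.

    `calls` maps a (method, library) label to the set of locus IDs that run
    flagged as outliers. The threshold is an assumption for illustration.
    """
    support = Counter()
    for loci in calls.values():
        support.update(set(loci))  # each run counts a locus at most once
    return {locus for locus, n in support.items() if n >= min_support}


def neutral_loci(all_loci, calls):
    """Loci never flagged by any run, usable as a putatively neutral set."""
    flagged = set().union(*calls.values()) if calls else set()
    return set(all_loci) - flagged


# Hypothetical locus IDs, invented for this example only.
calls = {
    ("Bayescan", "Library1"): {"loc1", "loc2"},
    ("Bayescan", "Library2"): {"loc2"},
    ("LFMM", "Library1"): {"loc2", "loc3"},
    ("LFMM", "Library2"): {"loc3", "loc4"},
    ("Arlequin", "Library1"): {"loc5"},
    ("Arlequin", "Library2"): set(),
}
candidates = cross_validate_outliers(calls)
neutral = neutral_loci({"loc1", "loc2", "loc3", "loc4", "loc5", "loc6"}, calls)
```

Raising `min_support` trades sensitivity for fewer false positives, which is the trade-off the slide title points at.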
e.g. associating SNPs with PCA scores derived from environmental traits (figure axes: envPC1, envPC2)
SNP turnover along ecological gradients
Importance of ecological gradients
Temperature Covariates
Library 1: 2983 SNPs
Latent factor mixed modeling shows high numbers of SNPs associated with temperature
I. Conclusions
• Detected divergence in genetic structure corresponding to regionally unique selective pressures
• Identified 54 candidate loci likely to be under selection
• Temperature covariates and soil composition contributed most to adaptive divergence
• Trees to conserve; alleles to conserve
• Cross-validation of consistent GBS results identified neutral and candidate SNPs while lowering false positives
I. Limitations
• Not all variability of adaptive landscape examined
• Adaptive significance of de novo GBS reads speculative (i.e. function of larger contigs?)
[Figure: PCA of bioclim predictors (Hijmans et al 2005); axes: Comp.1, Comp.2; groups: Broader_study, Pilot_study]
I. Limitations cont.
• Disease pressure in NC mountains confounded with abiotic factors
Dogwood anthracnose?
• How to define disease in first place?
  • Prevalence
  • Incidence
  • Occurrence
• No gen. div. decline in diseased sites (SSR: Hadziabdic et al 2010; Chloroplast: Call et al 2015)
• Disease estimates based on visual observation
• Allelic patterns of resistance genes not observed
II. Objectives
• Characterize the ecological and genomic relationships to disease, controlling for genetic structure.
II. Questions
• Do biogeographic patterns revealed from analysis of GBS markers support patterns from previous population studies of C. florida and related pathogens?
• Has disease in the past three decades influenced the genetic diversity of natural populations of C. florida?
• Where are changes in allele frequencies most abrupt, and which ecological gradients do they correspond to?
Estimating disease occurrence with GBS data
Population genetics of C. florida
Cross-validate to draft genome of C. florida
Draft genome alignment
Related Work
[Figure: two scatterplot panels with Powdery Mildew on the y-axis; x-axes: Canopy Health, Frost Free Period; reported fits: r2=0.09887, p=2.02E-05; r2=0.1114, p=3.14E-06]
MLR Model
• Mean annual prec.
• Latitude
• Elevation
• Days after bud break
• Foliar microbe alpha diversity (derived from ITS results)
[Figure: powdery mildew operational taxonomic units across NC sites; labels: Abundant powdery mildew, Prec. Mean]
Another estimate of disease occurrence
[Figure: proportion of dogwood anthracnose sequences (axis 0 to 0.0025) for diseased sites vs. healthy sites; *visual categorization]
Three estimates of disease occurrence
• Disease spotted at site?
• Disease sequences > 0.1%?
• County occurrence of disease?
Reduce disease categories into disease incidence gradient
• MCA incidence gradient: from low evidence of disease to high evidence of disease
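Collapsing the three yes/no disease estimates into a single gradient with multiple correspondence analysis (MCA) can be sketched with NumPy as a correspondence analysis of the indicator matrix. This is a minimal illustration, not the pipeline used in the talk: the function name and the example answers are invented, and orienting the axis so that larger scores mean more evidence of disease is a convention chosen here.

```python
import numpy as np


def mca_first_dimension(answers):
    """First MCA dimension for binary answers (rows = sites, cols = questions).

    Returns one score per site; larger means more evidence of disease.
    """
    answers = np.asarray(answers, dtype=int)
    # Indicator matrix: one "no" and one "yes" column per question.
    indicator = np.hstack([1 - answers, answers]).astype(float)
    indicator = indicator[:, indicator.sum(axis=0) > 0]  # drop empty categories
    P = indicator / indicator.sum()          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    scores = U[:, 0] * sing[0] / np.sqrt(r)  # row principal coordinates
    # The sign of an SVD axis is arbitrary, so orient it by the yes-count.
    if np.corrcoef(scores, answers.sum(axis=1))[0, 1] < 0:
        scores = -scores
    return scores


# Hypothetical sites. Columns: spotted at site, sequences > 0.1%, county record.
answers = [
    [0, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
]
gradient = mca_first_dimension(answers)
```

Sites with identical answers receive identical scores, so the gradient takes at most eight distinct values for three binary questions.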
Data Analysis
[Flowchart boxes: C. florida SNPs; General Population Structure; Genetic diversity; MCA incidence gradient; Three disease categories; Temp-prec. month collected; Bioclim Data; EMMAX; LFMM; GWAS; AMOVA; Fst; Discriminant analysis; Gradient forest]
Results: Genetic Structure
[Figure: fastSTRUCTURE and DAPC cluster assignments for K2, K3, K4, K5, shown with disease occurrence (FHTET diseased county map) and USFS ecological divisions; labels: Hot Continental, W of MS]
Results: Genetic Diversity
Few differences in allelic richness
[Figure: rarefied allelic richness + 95% CI for sites in non-diseased county vs. sites in diseased county]
Results: Genetic Diversity
Genetic diversity is the same between sites with and without evidence of disease
[Figure: nucleotide diversity (y-axis, 0.116 to 0.134) for the general population, sites with no evidence of disease occurrence, and sites with evidence of disease occurrence, under three estimates: dogwood anthracnose county occurrence; contaminant sequence threshold for dogwood anthracnose; visual observation of disease at site]
Results: SNPs correlated to disease categories
[Figure: DAPC of disease occurrence-absence (Diseased vs. Healthy) in three panels: visual absence-presence of disease at site; D. destructiva : C. florida < or > 0.1%; absence-occurrence of dogwood anthracnose by county. Each panel shows density along discriminant function 1 and a loading plot of contribution by genome position; labels: Z>4, SNP27429_9]
Candidate gene: subtilisin-like protease
• GBS tag 27429_9
• Capsicum: subtilisin-like protease
• Genome contig 009582F
• Activating downstream immune signaling processes (Figueiredo et al 2014)
STACKS locus-SNP: 27429_9
Tests detected by: [DAPC DDES test], [K2 LFMM]
STACKS locus blastn hit: ref|XM_016719275.1| PREDICTED: Capsicum annuum subtilisin-like protease SBT1.1
Genome scaffold, base-pair position: 009582F, 18512
Proximate blastn hit (+ 3kbp) to genome alignment: ref|XM_010663313.1| PREDICTED: Vitis vinifera subtilisin-like protease
Results: Latent factor mixed modelling
Disease associations (y = MCA1 dim)
Cluster model   Significant SNP associations to disease probability gradient (median Z>4)
K2              12
K3              7
K4              2
K5              0
Labeled SNP: SNP27429_9
Results: Controlling for abiotic covariates
GWA to MCA disease gradient (EMMAX model) controlling for:
• No covariate
• Precipitation and mean temperature (month of collection)
• EnvPC1 and envPC2
[Table with a column per model and a Grand Total; cell values are garbled in the transcript (1 1 1 61 1 1 61 1 51 1 5 / 1 41 4)]
Results: Allelic transitions (Gradient Forest)
[Figure: cumulative importance curves along three gradients: Latitude (30 to 42), Longitude (−95 to −75), and MCA1 disease gradient (0.0 to 1.5); individual curves are numbered]
Candidate SNPs for disease resilience-susceptibility
Conclusions
1. Diseased vs. non-affected areas: few differences in genetic diversity
2. Some select SNPs consistently associated with disease occurrence (e.g. subtilisin locus)
3. Contour between hot-continental/disease-affected region and rest of flowering dogwood's range reflected by allele frequency changes along spatial gradients
4. At several loci of interest, certain alleles are fixed in areas of disease occurrence
II. Conundrum
• How will demographic changes affect the species in future?
  • i.e. smaller and more isolated populations
  • Increasing homozygosity within individual trees
  • Fewer functional precursors available for adaptive potential (screening hypothesis)
II. Conundrum
• Do patterns of phenotypic diversity correlate with disease resistance-susceptibility?
• Do patterns of genetic variation correlate with patterns of phenotypic variation (i.e. secondary metabolites)?
Firn and Jones 2000
Why study natural diversity of plant secondary metabolism (PSM)?
• Small genetic diversity, large diversity in PSM
• Closely tied to ecological functions
• Specialized deterrents to herbivory and disease
• Few studies characterize secondary metabolite diversity in natural populations
[Figure: Asn/Ile example, Kampranis et al 2007]
Objectives
• I. NC pilot study of adaptive variation
• II. Range-wide study of adaptive variation
• III. Secondary chemical diversity (NC pilot study)
Questions
• Is dogwood disease constrained primarily by abiotic factors?
• Is there evidence from candidate SNPs and biomarkers for local adaptation?
• What is the relationship between genetic diversity and chemodiversity?
• Do healthy plants exhibit greater chemodiversity?
• If so, is higher chemodiversity a product of induced responses to the environment or environment-driven selection?
Untargeted Chemical Profiling
• 50% Methanol Extracts
• LCMS
• XCMS Pipeline
• Filtering/Remove Isotopic peaks
• Missing Data Treatment
• Standardization/Transformation
Chemical Profiling: PCA by Population (2785 peaks x 171 samples)
Data Analysis
[Flowchart boxes, in transcript order]
• Random forest
• General genetic-chemical-environmental relationships
• Group comparison
• Diversity indices
• Ordination
• Discriminant analysis (DAPC)
• Regression
• Biomarker identification
• DAPC
• Support vector machines
• DA of partial least squares
• GWAS: EMMAX
• Logistic mixed modelling
• Chemical diversity
• Regression
• Logistic mixed modelling
• Gaussian graphical modelling
Estimating Chemical Diversity
• Calculated from all features including isotopic peaks (2785)
• Recalculated from predicted network of glucosides
• Indices used as response-predictor
Shannon Index
Evenness Index
Berger-Parker Index
Simpson’s Indices
Health scores rose with indices of chemical diversity of all 2785 peaks
Shannon’s H’: uncertainty that a standardized unit of chromatogram space for a leaf sample falls under a particular chromatogram peak among the other peaks passing pre-processing thresholds
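The indices named on this slide can be computed from one leaf sample's vector of peak intensities by treating each peak's share of the total signal as a proportion. The sketch below shows the standard formulas. The natural-log base, Pielou's form of evenness, and the particular Simpson variants returned are assumptions, since the slides do not say which forms were used; the function name and example values are invented.

```python
import math


def diversity_indices(peak_intensities):
    """Diversity indices for one sample's chromatogram peak intensities."""
    positive = [x for x in peak_intensities if x > 0]
    if not positive:
        raise ValueError("need at least one peak with positive intensity")
    total = sum(positive)
    p = [x / total for x in positive]       # proportion of signal per peak
    richness = len(p)
    shannon = -sum(pi * math.log(pi) for pi in p)
    # Pielou's evenness: observed H' relative to its maximum, ln(richness).
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    simpson_dominance = sum(pi * pi for pi in p)
    return {
        "richness": richness,
        "shannon": shannon,
        "evenness": evenness,
        "berger_parker": max(p),            # share of the largest peak
        "simpson_dominance": simpson_dominance,
        "gini_simpson": 1.0 - simpson_dominance,
        "inverse_simpson": 1.0 / simpson_dominance,
    }


# Hypothetical intensities: an even sample and one dominated by a single peak.
even_sample = diversity_indices([10.0, 10.0, 10.0, 10.0])
skewed_sample = diversity_indices([97.0, 1.0, 1.0, 1.0])
```

Both samples have the same richness, but the skewed one scores lower on Shannon and evenness and higher on Berger-Parker, which is why richness and evenness are worth reporting separately.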
Population genetic clustering congruent with population metabolomic clustering
[Figure: DAPC of genetic data and DAPC of 377 chemicals; population labels: CF, PI1, PI2, SM1, SM2, UM, DK, TNC; regions: Piedmont, Mountain, Coastal Plains]
Temp. at collection and prec. of driest month strongly correlated with chemical ordination space
Identification of SNP-chemical association after controlling for abiotic covariates
SNP association to metabolite M435T576
Metabolite M435T576 is an important biomarker
Hypothesized glucoside network
Mass properties reveal similar predicted structures
Induced dominance of M435T576 expression increases odds of being healthy
III. Conclusions
• Prec. and temp. covariates strongly correlated with chemical expression
• After controlling for environment, identified SNP-biomarker network likely involved with synthesis/regulation of glucosides
• While chemical data were more variable, clustering patterns of genetic and chemical data were similar
• Discovered general trend that increases in chemical richness and evenness were linked to healthier trees
• For hypothesized network, upregulation of M435T576 relative to other compounds in network was associated with higher odds of being healthy
Summary
• I. NC pilot study of adaptive variation
• II. Range-wide study of adaptive variation
• III. Secondary chemical diversity
Acknowledgements
Advisor: Dr. Jenny Xiang
Collaborators: Dr. Phil Wadl and Dr. James Leebens-Mack
Lab mates: Xiang Liu, Shihori Obata, Juliet Lindo, Ashley Yow, Will Kohlaway, Andres Qi, Jason Lattier
Committee members and collaborators: Dr. Ross Whetten, Dr. William Hoffman, Dr. Sirius Li, and Dr. Jean Ristaino
Others: Genome Science Laboratories staff, Forest Health Technology Enterprise Team, Shang Xue, Shuping Ruan, North Carolina Forest Services- Health Branch staff (Brian Heath), Dr. Ning Zhang, Dr. Guohong Cai, and Dr. Alex Harkess
Funding sources: Dogwood Genome Project: NSF #: 1444567
Questions?