Bottlenecks and genetic variation - ICRISAT
The genus Glycine
• subg. Soja: G. max, G. soja (annual); 2n = 40
• subg. Glycine (~5 MYA): ~26 perennial species, Australia; 2n = 38, 40, 78, 80
Bottlenecks in soybean
• Domestication in NE China (6,000-9,000 years ago)
• Local adaptation (before and after domestication)
• Introduction to the US (~1765)
• Modern breeding: ~12 founder lines in the US
[Figure: genetic variation in the wild population, through domestication, to modern breeding]
Sources of variation for improvement
• Standing variation in breeding populations
– Shuffling of superior haplotypes
• De novo or induced mutation
• Wild relatives
– Perennials
– Annual progenitor
• Epigenetic variation
Geographic distribution of a few perennial species
G. canescens
G. tomentella D3
G. cyrtoloba
G. stenophita
G. syndetika
G. falcata
Resistances and tolerances identified in perennial Glycine species
• Sudden death syndrome
• Soybean rust
• Sclerotinia stem rot
• Soybean cyst nematode
• Bean pod mottle virus
• Drought/salt tolerance
Image sources: http://www.ces.ncsu.edu/depts/pp/notes/Soybean/soy008/Image004S.png; www.ars.usda.gov/is/pr/2004/040729.htm; http://www.ces.ncsu.edu/depts/pp/notes/Soybean/soy009/Fig3.jpg
Genomic resources for the genus Glycine
• Genomic libraries for 8 wild (7 perennial) Glycine species
• Physical map of G. soja
• Wild relative genomes aligned to the G. max genome using BAC-end sequences (BESs)
Factors limiting use of perennial Glycines
• Extreme crossability issues
• Polyploidy
• Recombination
• Non-agronomic performance
[Figure: phylogeny of G. soja/max, G. falcata, G. stenophita, G. cyrtoloba, G. canescens, G. syndetika and G. tomentella; 5-8 my marked on the tree]
Annual Glycine: soybean progenitor
[Figure: G. soja vs. G. max, panels 1-5 for each]
• Glycine max is a domesticated form of G. soja
• Domestication reduced genetic variation by ~1/2
• 5-8 million years of evolution within the G. soja lineage
• Many potentially useful traits (some already used)
• We know little about the genetic/genomic variation in G. soja
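A reduction "by ~1/2" is a ratio of two diversity statistics. A minimal sketch, assuming diversity is measured as nucleotide diversity (π, the mean number of pairwise differences per site) over aligned sequences; the talk does not say which statistic was used, and the toy alignments below are hypothetical, not soybean data.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean number of pairwise differences per site (pi) for aligned sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    # Count mismatching sites for every unordered pair of sequences.
    diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return diffs / (n_pairs * length)


# Hypothetical toy alignments (not data from the talk).
wild = ["ACGTACGTAC", "ACGAACGTTC", "TCGTACGAAC", "ACGTTCGTAC"]
domesticated = ["ACGTACGTAC", "ACGTACGTTC", "ACGTACGTAC", "ACGAACGTAC"]

pi_wild = nucleotide_diversity(wild)
pi_dom = nucleotide_diversity(domesticated)
print(f"pi(wild) = {pi_wild:.3f}, pi(domesticated) = {pi_dom:.3f}, "
      f"retained = {pi_dom / pi_wild:.0%}")
```

With these toy inputs the domesticated set retains 40% of the wild diversity. Real comparisons use genome-wide SNP calls and must handle missing data, which this sketch ignores.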
Pan-genome for wild soybean
[Figure: map of the seven G. soja accessions (GsojaA to GsojaG) across Russia, South Korea, Japan and China (Northeast, North, Huanghuai and South regions; Yellow River, Yangtze River)]
                              GSoja-A  GSoja-B  GSoja-C  GSoja-D  GSoja-E  GSoja-F  GSoja-G
Sequencing depth (x)            117.7    115.6    122.7    136.3    103.1     83.7      104
Mapping rate                   90.51%   89.30%   91.80%   89.36%   91.53%   92.02%   92.94%
Estimated genome size (Mb)     981.04   1000.8  1053.78  1118.34   956.43   992.66   889.33
Heterozygosity rate             0.52%    0.47%    0.50%    0.42%    0.47%    0.28%    0.45%
Repeat content                 67.51%   67.51%   68.08%   71.02%   65.69%   66.69%   67.28%
Assembly size (Mb)                835      906      857     1010      943      873      878
Contig N50 (kb)*                  8.7     22.2      7.5     11.0     26.6     24.2     19.2
Scaffold N50 (kb)                17.5     56.1     16.3     49.0     62.7     51.9     44.9
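The contig and scaffold N50 rows summarize assembly contiguity: N50 is the length L such that sequences of length ≥ L together contain at least half of the assembled bases. A minimal sketch of that definition; the contig lengths in the example are hypothetical.

```python
def n50(lengths: list[int]) -> int:
    """Smallest length L such that pieces of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    # Walk from the longest piece down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always reaches half the total")


contigs_kb = [40, 30, 12, 8, 6, 3, 1]  # hypothetical contig lengths in kb
print(n50(contigs_kb))  # 30
```

The two longest contigs (40 + 30 kb) already cover 70 of 100 kb, so N50 is 30 kb. Fed scaffold lengths instead, the same function gives scaffold N50.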
• Seven accessions chosen based on population structure and geography
• De novo assembly for each accession
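The talk does not state how the "Estimated genome size" row was obtained. One common approach, assumed here, is k-mer counting: genome size ≈ total k-mer occurrences in the main distribution divided by the depth of the main (homozygous) peak. The histogram layout (depth mapped to the number of distinct k-mers seen at that depth) and the error cutoff `min_depth` are assumptions for illustration.

```python
def estimate_genome_size(histogram: dict[int, int], min_depth: int = 5) -> float:
    """
    Estimate haploid genome size (bp) from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers observed at that depth.
    Depths below min_depth (a hypothetical cutoff) are treated as sequencing errors.
    """
    kept = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers above the error cutoff")
    # Total k-mer occurrences in the retained part of the spectrum.
    total_kmers = sum(depth * count for depth, count in kept.items())
    # Depth with the most distinct k-mers, assumed to be the homozygous peak;
    # reasonable when heterozygosity is low (0.28-0.52% in the table).
    peak_depth = max(kept, key=kept.get)
    return total_kmers / peak_depth


# Hypothetical toy histogram: an error spike at depths 1-3 and a main peak at 30.
toy_hist = {1: 9000, 2: 1500, 3: 200, 28: 900, 29: 1100, 30: 1200, 31: 1000, 32: 800}
print(estimate_genome_size(toy_hist))  # 4990.0
```

The estimate exceeds the assembly size for every accession in the table, which is expected when repeats (65-71% here) collapse during assembly.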
G. soja pan-genome
• The G. soja pan-genome has 59,080 gene families and an overall size of 986.3 Mbp
• 28,716 core gene families are conserved among all samples; the remaining ~half (30,364) are dispensable
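Core versus dispensable status comes from a presence/absence matrix of gene families across the sampled accessions. A minimal sketch, assuming each family is represented by the set of accessions that carry it; the family names are hypothetical, while the accession labels follow the GsojaA to GsojaG naming used in the talk.

```python
def classify_families(
    presence: dict[str, set[str]], accessions: set[str]
) -> tuple[set[str], set[str]]:
    """Split gene families into core (in every accession) and dispensable (absent from at least one)."""
    core = {family for family, carriers in presence.items() if accessions <= carriers}
    dispensable = set(presence) - core
    return core, dispensable


accessions = {f"Gsoja{letter}" for letter in "ABCDEFG"}
# Hypothetical presence/absence data for four gene families.
presence = {
    "fam1": set(accessions),
    "fam2": accessions - {"GsojaC"},
    "fam3": {"GsojaA", "GsojaB"},
    "fam4": set(accessions),
}
core, dispensable = classify_families(presence, accessions)
print(sorted(core), sorted(dispensable))  # ['fam1', 'fam4'] ['fam2', 'fam3']
print(f"core fraction from the talk's counts: {28_716 / 59_080:.1%}")  # 48.6%
```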
Adaptation within G. soja lineages
• 1,051-1,995 genes are under positive selection in each G. soja lineage; none is selected in all lineages.
• >5,000 genes are under positive selection in G. max.
Test for selection on wild soybean lineages (PAML, branch-site model)
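The PAML branch-site test compares two nested codeml models on the lineage of interest: model A, in which a class of sites may have ω2 > 1 on the foreground branch, against the same model with ω2 fixed at 1. A minimal sketch of the likelihood-ratio step only, assuming the two log-likelihoods have already been read from codeml output; the lnL values are hypothetical. It uses a χ² distribution with 1 degree of freedom, which is conservative because the null distribution is a 50:50 mixture of 0 and χ²(1).

```python
import math


def chi2_sf_1df(x: float) -> float:
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    # X = Z**2 with Z standard normal, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def branch_site_lrt(lnl_alt: float, lnl_null: float) -> tuple[float, float]:
    """
    Likelihood-ratio test for positive selection on a foreground lineage.
    lnl_alt: log-likelihood of model A (omega2 estimated, may exceed 1).
    lnl_null: log-likelihood of model A with omega2 fixed at 1.
    Returns (2 * delta lnL, conservative chi-square(1 df) p-value).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf_1df(stat)


# Hypothetical log-likelihoods for one gene on one G. soja lineage.
stat, p = branch_site_lrt(lnl_alt=-2345.10, lnl_null=-2348.20)
print(f"2*dlnL = {stat:.2f}, p = {p:.4f}")
```

Applied to thousands of genes per lineage, the resulting p-values would normally be corrected for multiple testing (for example with a false discovery rate) before genes are counted as positively selected.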
Pan-genome summary
• G. soja pan-genome with 59,080 gene families and an overall size of 986.3 Mbp
• 86% of the genome is conserved among wild accessions
• Considerable variation in retention of duplicated genes among G. soja and G. max accessions: lineage-dependent retention of paralogs
• ~1,000-2,000 genes under positive selection in each G. soja lineage, versus ~5,000 in G. max
• A number of predicted genes in the dispensable genome(s) could be important for adaptation
Differential methylation of genes and TEs
[Figure: a gene and a TE with 24-nt siRNAs, 5mC methylation and transcription; RdDM pathway; states OFF/On]
Methylation contexts: me-CG, me-CHG, me-CHH (H = A, T or C)
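The legend defines the three cytosine contexts by the bases that follow a C on the same strand: CG, CHG and CHH, with H = A, T or C. A minimal sketch that classifies every cytosine on both strands of a sequence; treating N bases and sequence ends as unclassifiable is an assumption of this sketch.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def context_at(strand: str, pos: int) -> Optional[str]:
    """Context of the C at strand[pos], reading 5'->3' on that strand."""
    if strand[pos] != "C" or pos + 1 >= len(strand):
        return None
    if strand[pos + 1] == "G":
        return "CG"
    # CHG vs CHH needs two unambiguous downstream bases.
    if pos + 2 >= len(strand) or "N" in strand[pos + 1 : pos + 3]:
        return None
    return "CHG" if strand[pos + 2] == "G" else "CHH"


def count_contexts(seq: str) -> dict[str, int]:
    """Count cytosines in each context on the given strand and its reverse complement."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    counts = {"CG": 0, "CHG": 0, "CHH": 0}
    for strand in (seq, reverse_complement):
        for pos in range(len(strand)):
            ctx = context_at(strand, pos)
            if ctx is not None:
                counts[ctx] += 1
    return counts


print(count_contexts("CGCAGCTT"))  # {'CG': 2, 'CHG': 2, 'CHH': 1}
```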
Simplified scheme of RdDM silencing pathway
[Figure: Pol IV, RDR2, DCL3 and AGO4 acting on a TE through 24-nt siRNAs]
Differential methylation of genes and TEs in soybean
[Figure: CG, CHG, CHH, siRNA and mRNA tracks over genes and TEs]
[Figure: an active gene (On) next to an RdDM-targeted TE; spreading of DNA methylation from the TE switches the gene OFF]
Examples: FWA (Kinoshita et al. 2007), BNS (Saze et al. 2007), CmWIP1 (Martin et al. 2009)
TE annotation pipeline
siRNA mapping → extract FASTA sequences → all-against-all BLAST → define families → define exact borders → TE motifs (CDD tool) → RepeatMasking
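The "define families" step groups the siRNA-derived TE candidates using the all-against-all BLAST hits. The talk does not give the grouping rule, so this is a minimal sketch assuming single-linkage clustering (union-find) over hits that pass identity, length and coverage thresholds. The 80% identity / 80 bp / 80% coverage defaults echo the widely used "80-80-80" TE family rule but are assumptions here, as is the hypothetical `Hit` record, which would be filled from BLAST tabular output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One all-against-all BLAST hit (hypothetical record layout)."""
    query: str
    subject: str
    identity: float  # percent identity of the alignment
    aln_len: int     # alignment length in bp
    query_len: int   # full length of the query sequence in bp


def define_families(seq_ids, hits, min_identity=80.0, min_len=80, min_cov=0.8):
    """Single-linkage clustering of sequences joined by qualifying hits."""
    parent = {s: s for s in seq_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for h in hits:
        if h.query == h.subject:
            continue  # self-hits say nothing about family membership
        if (h.identity >= min_identity and h.aln_len >= min_len
                and h.aln_len / h.query_len >= min_cov):
            root_q, root_s = find(h.query), find(h.subject)
            if root_q != root_s:
                parent[root_s] = root_q  # merge the two families

    families = {}
    for s in seq_ids:
        families.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in families.values())


hits = [
    Hit("te1", "te2", 92.0, 950, 1000),
    Hit("te2", "te3", 85.0, 820, 900),
    Hit("te3", "te4", 70.0, 600, 650),  # too divergent: not linked
]
print(define_families(["te1", "te2", "te3", "te4"], hits))
# [['te1', 'te2', 'te3'], ['te4']]
```

Single linkage can chain distantly related elements into one family through intermediates; the later border-definition and CDD motif steps of the pipeline are not modeled here.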
RepeatMasking results
                      Number of elements   Length occupied (Mb)   Masked genome (%)
SoyBase TE                        75,000                    520                  48
New TE annotation                100,000                    590                  60
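"Length occupied" and "Percentage of masked genome" should count each masked base once, even where repeat annotations overlap. A minimal sketch, assuming annotations are half-open (chromosome, start, end) intervals, that merges overlaps per chromosome before summing; the chromosome names, coordinates and genome size are hypothetical.

```python
from itertools import groupby


def masked_bases(intervals: list[tuple[str, int, int]]) -> int:
    """Number of bases covered by at least one half-open [start, end) interval."""
    total = 0
    # Sorting puts each chromosome's intervals together, ordered by start.
    for _chrom, group in groupby(sorted(intervals), key=lambda iv: iv[0]):
        cur_start = cur_end = None
        for _, start, end in group:
            if cur_end is None or start > cur_end:
                # Disjoint from the current run: close it and start a new one.
                if cur_end is not None:
                    total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)  # overlapping or touching: extend
        if cur_end is not None:
            total += cur_end - cur_start
    return total


repeats = [("Gm01", 0, 100), ("Gm01", 50, 150), ("Gm01", 200, 250), ("Gm02", 10, 20)]
genome_size = 1_000  # hypothetical
covered = masked_bases(repeats)
print(covered, f"{covered / genome_size:.0%}")  # 210 21%
```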
Detection of non-TE methylated genes (CG, CHG, CHH)
[Figure: CG, CHG, CHH, siRNA and mRNA tracks contrasting a 'normal' gene, a TE-gene and a CHG/CHH-methylated gene; panel labels "1,200 such genes" and "3,062"]
                 Chromosome arm (%)   Pericentromeric (%)     Total
Total genes                   75.42                 24.58    53,927
Met-genes                     29.36                 70.64     3,062
Methylated genes (CG, CHG, CHH) are (1) localized in pericentromeric regions, (2) enriched for pseudogenes, and (3) duplicated.
[Figure: expression from RNA-seq for 14 conditions (tissues)]
Gene type     Number
Duplicated    2,745 (89%)
Singleton       311 (11%)
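Claim (1) can be quantified from the reported percentages: 70.64% of the 3,062 Met-genes versus 24.58% of all 53,927 genes lie in pericentromeric regions. A minimal sketch of a fold enrichment and a Pearson χ² test on the implied 2×2 table, assuming the Met-genes are a subset of the 53,927 genes and that counts can be recovered by rounding the percentages; the talk itself does not report a test.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


total_genes, met_genes = 53_927, 3_062
peri_all = round(total_genes * 0.2458)  # pericentromeric genes (all genes)
peri_met = round(met_genes * 0.7064)    # pericentromeric Met-genes
# Assumption: Met-genes are a subset of all genes, so subtract them out.
peri_other = peri_all - peri_met
arm_met = met_genes - peri_met
arm_other = (total_genes - peri_all) - arm_met

fold = (peri_met / met_genes) / (peri_all / total_genes)
stat = chi_square_2x2(peri_met, arm_met, peri_other, arm_other)
# Chi-square (1 df) survival function; for a statistic this large it underflows to 0.0.
p_value = math.erfc(math.sqrt(stat / 2.0))
print(f"fold enrichment = {fold:.2f}, chi2 = {stat:.0f}, p = {p_value:.3g}")
```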
Methylation and expression of paralogs
[Figure: CG, CHG, CHH, siRNA and mRNA tracks for two paralogs]
Paralog A: not expressed; CHH/CHG methylation in exons
Paralog B: expressed; no CHH/CHG methylation in exons
Paralogs of CHG/CHH-methylated genes are active and less methylated
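Comparing paralogs A and B comes down to per-context methylation levels over their exons. One common summary, assumed here, is the weighted methylation level: methylated read calls divided by all read calls at cytosines of a given context. The per-cytosine call tuples below are hypothetical bisulfite-sequencing counts, not data from the talk.

```python
from collections import defaultdict


def weighted_methylation(calls):
    """
    calls: iterable of (context, methylated_reads, total_reads), one per cytosine.
    Returns {context: sum(methylated_reads) / sum(total_reads)}.
    """
    methylated = defaultdict(int)
    covered = defaultdict(int)
    for context, meth_reads, total_reads in calls:
        methylated[context] += meth_reads
        covered[context] += total_reads
    return {ctx: methylated[ctx] / covered[ctx] for ctx in covered if covered[ctx] > 0}


# Hypothetical exon calls for a silenced paralog (A) and its expressed partner (B).
paralog_a = [("CG", 9, 10), ("CHG", 7, 10), ("CHH", 3, 12), ("CHH", 5, 8)]
paralog_b = [("CG", 8, 10), ("CHG", 0, 9), ("CHH", 1, 20)]
for name, calls in (("A", paralog_a), ("B", paralog_b)):
    levels = weighted_methylation(calls)
    print(name, {ctx: round(level, 2) for ctx, level in sorted(levels.items())})
```

Weighting by read depth keeps a few low-coverage cytosines from dominating the average, which is why it is often preferred over averaging per-site fractions.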
[Figure: a gene duplicates into two paralogs over time]
[Figure: % DNA methylation (CG, CHG, CHH) and RNA-seq FPKM scores for Met-genes and their paralogs]
Silencing of these genes is relatively recent
[Figure: % DNA methylation (CG, CHG, CHH) and RNA-seq FPKM scores in soybean and common bean; the Glycine (G. max, G. soja) and Phaseolus lineages split ~19 My ago]
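The RNA-seq panels report FPKM scores (fragments per kilobase of transcript per million mapped fragments). A minimal sketch of the standard FPKM normalization; the fragment counts and gene length are hypothetical.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # fragments / (length in kb) / (library size in millions)
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical: 500 fragments on a 2,000 bp gene in a 20-million-fragment library.
print(fpkm(500, 2_000, 20_000_000))  # 12.5
```

Because FPKM divides by both gene length and library size, values can be compared between paralogs of different lengths and across separately sequenced tissues.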
Evolutionary scenario
[Figure: after duplication both gene copies are On; over time one copy is switched OFF, then either decays into a pseudogene or remains reversibly silenced and is expressed in specific contexts (endosperm; other tissues, cell types or stress?)]
Summary of Epigenetic variation
• A new de novo TE annotation procedure based on small RNA mapping
• Thousands of soybean genes are highly methylated in all three methylation contexts and are generally not expressed
• Most of these genes are duplicated and their paralogs are unmethylated and expressed
• However, some of these genes are highly expressed in the endosperm. Other tissues?
• How much standing and de novo epigenetic variation exists in soybean and what role does it play in phenotypic variation?
COMMON BEAN: Jeremy Schmutz1,2†*, Phillip McClean3†*, Sujan Mamidi3, G. Albert Wu1, Steven B. Cannon4, Jane Grimwood2, Jerry Jenkins2, Shengqiang Shu1, Qijian Song5, Carolina Chavarro6, Mirayda Torres-Torres6, Valerie Geffroy7,15, Samira Mafi Moghaddam3, Dongying Gao6, Brian Abernathy6, Kerrie Barry1, Matthew Blair8, Mark A. Brick9, Mansi Chovatia1, Paul Gepts10, David M Goodstein1, Michael Gonzales6, Uffe Hellsten1, David L. Hyten5,^, Gaofeng Jia5, James D. Kelly11, Dave Kudrna12, Rian Lee3, Manon M.S. Richard7, Phillip N. Miklas13, Juan M. Osorno3, Josiane Rodrigues5,^^, Vincent Thareau7, Carlos A. Urrea14, Mei Wang1, Yeisoo Yu12, Ming Zhang1, Rod A. Wing12, Perry B. Cregan5, Daniel S. Rokhsar1, Scott A. Jackson6*
SOYBEAN PAN-GENOME: Ying-hui Li1,*, Guangyu Zhou2,*, Jian-xin Ma3,*, Wenkai Jiang2,*, Long-guo Jin1, Zhouhao Zhang2, Yong Guo1, Jinbo Zhang2, Yi Sui1, Liangtao Zheng2, Shan-shan Zhang1, Qiyang Zuo2, Xue-hui Shi1, Yan-fei Li1, Wan-ke Zhang4, Yiyao Hu2, Guanyi Kong2, Hui-long Hong1, Zhang-xiong Liu1, Yaoshen Wang2, Hang Ruan2, Carol KL Yeung2, Jian Liu2, Hailong Wang2, Li-juan Zhang1, Rong-xia Guan1, Ke-jing Wang1, Wen-bin Li5, Shou-yi Chen4, Ru-zheng Chang1, Zhi Jiang2, Scott A. Jackson6, Ruiqiang Li2, 7, Li-juan Qiu1
SOYBEAN METHYLOME: Kyung-Do Kim, Moaine Elbaidouri, Aiko Iwata, Jeremy Schmutz, Robert Schmitz, Scott A. Jackson
"You have a lot of attention on a foolish sport like American football…on a sport that is meant to kill each other... You're so narrow-minded, and then you want to compete against the world when you waste a lot of time, good talent on a sport that sucks.”
Jillert Anema, Dutch skating coach. 21 medals in Sochi.