bottlenecks and genetic variation - icrisat


Bottlenecks and genetic variation

Common bean example of bottleneck

The genus Glycine

[Figure: the genus Glycine. Annuals G. max and G. soja (2n = 40) and subg. Glycine (~26 perennial species; Australia; 2n = 38, 40, 78, 80), split ~5 MYA.]

Bottlenecks in soybean

• Domestication in NE China (6,000-9,000 ya)

• Local adaptation (before and after domestication)

• Introduction to US (~1765)

• Modern breeding: ~12 founder lines in the US.
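Each bottleneck in the list above is expected to remove some genetic variation. A minimal sketch of that reasoning, assuming an idealized random-mating diploid population in which expected heterozygosity decays as H_t / H_0 = (1 - 1/(2N))^t. Soybean is largely self-pollinating, so this is only a rough guide, and the stage sizes and durations below are hypothetical values chosen for illustration, not figures from the slides.

```python
def retained_heterozygosity(n_individuals, generations=1):
    """Expected fraction of heterozygosity retained after `generations`
    generations at a constant size of `n_individuals` diploid individuals,
    using the idealized formula H_t / H_0 = (1 - 1/(2N))**t."""
    if n_individuals <= 0:
        raise ValueError("population size must be positive")
    return (1.0 - 1.0 / (2.0 * n_individuals)) ** generations


def through_bottlenecks(stages):
    """Multiply the retained fractions of successive (size, generations) stages."""
    retained = 1.0
    for size, generations in stages:
        retained *= retained_heterozygosity(size, generations)
    return retained


if __name__ == "__main__":
    # Hypothetical (population size, generations) stages, for illustration only.
    stages = [(500, 50), (50, 10), (12, 1)]
    print(f"retained fraction: {through_bottlenecks(stages):.3f}")
```

The smaller the population at each stage and the longer it stays small, the less variation survives, which is why a handful of founder lines matters so much.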

[Figure: genetic variation across wild population, domestication, and modern breeding]

Sources of variation for improvement

• Standing variation in breeding populations

– Shuffling of superior haplotypes

• de novo or induced mutation

• Wild relatives

– Perennials

– Annual progenitor

• Epigenetic variation

Geographic distribution of a few perennial species

G. canescens

G. tomentella D3

G. cyrtoloba

G. stenophita

G. syndetika

G. falcata

Resistances and tolerances identified in perennial Glycine species

• Sudden death syndrome (image: http://www.ces.ncsu.edu/depts/pp/notes/Soybean/soy008/Image004S.png)

• Soybean rust

• Sclerotinia stem rot (image: www.ars.usda.gov/is/pr/2004/040729.htm)

• Soybean cyst nematode (image: http://www.ces.ncsu.edu/depts/pp/notes/Soybean/soy009/Fig3.jpg)

• Bean pod mottle virus

• Drought/salt tolerance

Genomic resources for the genus Glycine

• Genomic libraries for 8 wild (7 perennial) Glycine species

• Physical map of G. soja

• Wild relative genomes aligned to the G. max genome using BAC-end sequences (BESs)

Factors limiting use of perennial Glycines

• Extreme crossability issues

• Polyploidy

• Recombination

• Non-agronomic performance

[Figure: phylogeny of G. soja/max and the perennials G. falcata, G. stenophita, G. cyrtoloba, G. canescens, G. syndetika, and G. tomentella; 5-8 my]

Annual Glycine—soybean progenitor

[Figure: G. soja and G. max compared, two panels with items numbered 1-5]

• Glycine max is a domesticated form of G. soja

• Domestication reduced variation by ~1/2

• 5-8 my of evolution within G. soja lineage

• Many potentially useful traits (some already used)

• Little is known about the genetic/genomic variation in G. soja

[Figure: Glycine phylogeny (G. soja/max, G. falcata, G. stenophita, G. cyrtoloba, G. canescens, G. syndetika, G. tomentella)]

Pan-Genome for wild soybean

[Figure: map of the sampling sites of seven G. soja accessions (GsojaA to GsojaG) across China (Northeast, North, Huanghuai, and South regions; Yellow River and Yangtze River), South Korea, Japan, and Russia]

                              GSoja-A   GSoja-B   GSoja-C   GSoja-D   GSoja-E   GSoja-F   GSoja-G
Sequencing depth (x)          117.7     115.6     122.7     136.3     103.1     83.7      104
Mapping rate                  90.51%    89.30%    91.80%    89.36%    91.53%    92.02%    92.94%
Estimated genome size (Mb)    981.04    1000.8    1053.78   1118.34   956.43    992.66    889.33
Heterozygosity rate           0.52%     0.47%     0.50%     0.42%     0.47%     0.28%     0.45%
Repeat content                67.51%    67.51%    68.08%    71.02%    65.69%    66.69%    67.28%
Assembly size (Mb)            835       906       857       1010      943       873       878
Contig N50 (Kb)*              8.7       22.2      7.5       11.0      26.6      24.2      19.2
Scaffold N50 (Kb)             17.5      56.1      16.3      49.0      62.7      51.9      44.9
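The contig and scaffold N50 values in the assembly statistics above summarize contiguity. A minimal sketch of how an N50 is commonly computed from a list of sequence lengths; the function name and the input list are illustrative, not taken from the slides.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in kb, for illustration only.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

A higher N50 means that more of the assembly sits in long sequences, which is why scaffold N50 exceeds contig N50 for every accession in the table.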

• Seven accessions based on population structure and geography

• de novo assemblies for each

Divergence of G. max from G. soja

G. soja pan-genome

• G. soja pan-genome has 59,080 gene families and an overall size of 986.3 Mbp.

• 28,716 core genes (conserved among all samples); conversely, about half were dispensable.
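The core versus dispensable split can be illustrated with a small presence/absence sketch. The function, the family names, and the toy data are hypothetical; the final calculation treats the 28,716 core genes as directly comparable to the 59,080 gene families, which is an assumption about how the slide counts them.

```python
def split_pan_genome(presence, accessions):
    """Split gene families into core (present in every accession) and
    dispensable (missing from at least one accession).

    presence: dict mapping family name -> set of accessions carrying it.
    accessions: iterable with the names of all accessions.
    """
    all_accessions = set(accessions)
    core, dispensable = [], []
    for family, carriers in presence.items():
        if all_accessions <= set(carriers):
            core.append(family)
        else:
            dispensable.append(family)
    return core, dispensable


if __name__ == "__main__":
    # Toy data, for illustration only.
    accessions = ["GsojaA", "GsojaB", "GsojaC"]
    presence = {
        "fam1": {"GsojaA", "GsojaB", "GsojaC"},
        "fam2": {"GsojaA"},
        "fam3": {"GsojaB", "GsojaC"},
    }
    print(split_pan_genome(presence, accessions))

    # Core fraction implied by the numbers on the slide.
    core_fraction = 28716 / 59080
    print(f"core fraction: {core_fraction:.1%}")
```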

Adaptation within G. soja lineages

• 1,051-1,995 genes under positive selection in each G. soja lineage; none of them is under selection in all lineages.

• >5,000 genes under positive selection in G. max.

[Figure: test for selection on wild soybean lineages (PAML, branch-site model); axis tick at 5,000]
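Branch-site tests of this kind are usually evaluated with a likelihood-ratio test: the alternative model is compared with a null model in which the foreground omega is fixed at 1, and twice the log-likelihood difference is compared with a chi-square distribution with one degree of freedom. A minimal sketch of that comparison; the log-likelihood values are hypothetical, and the slides do not say which threshold or multiple-testing correction was applied.

```python
import math


def branch_site_lrt(lnl_null, lnl_alt):
    """Likelihood-ratio test for one gene and one foreground lineage.

    Returns (statistic, p_value), where the p-value comes from a
    chi-square distribution with 1 degree of freedom. For 1 degree of
    freedom the survival function is erfc(sqrt(x / 2)).
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods, as would be read from the model output.
    stat, p = branch_site_lrt(lnl_null=-10234.6, lnl_alt=-10229.1)
    print(f"2*deltaLnL = {stat:.2f}, p = {p:.4g}")
```

Running this once per gene and per lineage gives the per-lineage counts of genes under positive selection, after whatever multiple-testing correction is chosen.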

Pan-genome summary

• G. soja pan-genome with 59,080 gene families and an overall size 986.3 Mbp

• 86% of genome is conserved among wild accessions.

• Lots of variation in retention of duplicated genes among G. soja and G. max accessions.

– Lineage-dependent retention of paralogs

• ~1,000 genes under positive selection in each G. soja, versus ~5,000 in G. max.

• A number of predicted genes in the dispensable genome(s) could be important for adaptation.

Epigenetic variation in soybean

[Figure (image: http://blog.oup.com/wp-content/uploads/2012/03/epigenetics.png): differential methylation of genes and TEs. Labels: 24-nt siRNA, RdDM pathway, cytosine methylation (5cMeth), transcription OFF/ON, GENE, TE. Cytosine methylation contexts: me-CG, me-CHG, me-CHH (H = A, T or C).]
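The three methylation contexts are defined by the two bases that follow a cytosine. A minimal sketch of classifying a forward-strand cytosine by context, using the rule H = A, T or C; the function name and example sequence are illustrative only.

```python
def cytosine_context(seq, i):
    """Return 'CG', 'CHG', or 'CHH' for the forward-strand cytosine at
    index i of seq, or None if the base is not C or the context cannot
    be determined (sequence end or ambiguous bases)."""
    seq = seq.upper()
    if seq[i] != "C":
        return None
    following = seq[i + 1:i + 3]
    if following[:1] == "G":
        return "CG"
    if len(following) < 2 or following[0] not in "ATC":
        return None
    if following[1] == "G":
        return "CHG"
    if following[1] in "ATC":
        return "CHH"
    return None


if __name__ == "__main__":
    example = "ACGTCAGCTTCCG"
    for pos, base in enumerate(example):
        if base == "C":
            print(pos, cytosine_context(example, pos))
```

Reverse-strand cytosines can be handled the same way after reverse-complementing the sequence.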

Simplified scheme of RdDM silencing pathway

[Figure: components shown are POL IV, RDR2, DCL3, AGO4, 24 nt siRNAs, and a target TE; cytosine methylation contexts me-CG, me-CHG, me-CHH (H = A, T or C)]

Differential methylation of Genes and TEs in soybean

[Figure: genome browser tracks (CG, CHG, CHH, siRNA, mRNA) over genes and TEs]

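Spreading of DNA methylation

[Figure: RdDM at a TE next to an active gene; DNA methylation (me-CG, me-CHG, me-CHH; H = A, T or C) spreads from the TE to the gene, which switches from ON to OFF. Examples: FWA (Kinoshita et al. 2007), BNS (Saze et al. 2007), CmWIP1 (Martin et al. 2009). Genome browser tracks: CG, CHG, CHH, siRNA, mRNA over genes and TEs.]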

Annotation problems in plant genomes

Gene or transposable element?

New TE annotation Pipeline Using 24 nt siRNAs from RdDM pathway

[Figure: siRNA-covered loci numbered 1-5, grouped into families FAM1 and FAM2]

Pipeline steps, in order:

• siRNA mapping

• Extract fasta sequence

• Blast all against all

• Define families

• Define exact borders

• CDD tool TE motif

• RepeatMasking
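The "Define families" step can be sketched as single-linkage clustering of the all-against-all BLAST hits. This is one plausible reading of the step, not necessarily the authors' method: the function name, the element identifiers, and the assumption that hits have already been filtered by some identity and coverage threshold are all hypothetical.

```python
def define_families(elements, hits):
    """Group elements into families by single-linkage clustering.

    elements: iterable of element identifiers (e.g. siRNA-covered loci).
    hits: iterable of (a, b) pairs that passed a similarity filter;
          both identifiers must appear in `elements`.
    Returns a sorted list of families, each a sorted list of identifiers.
    """
    elements = list(elements)
    parent = {e: e for e in elements}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    families = {}
    for e in elements:
        families.setdefault(find(e), []).append(e)
    return sorted(sorted(members) for members in families.values())


if __name__ == "__main__":
    # Toy loci and hits, for illustration only.
    loci = ["1", "2", "3", "4", "5"]
    hits = [("1", "3"), ("3", "5"), ("2", "4")]
    print(define_families(loci, hits))
```

Single linkage is simple but can chain unrelated elements through one shared fragment, so a stricter criterion may be preferable in practice.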

RepeatMasking results

                     Number of elements   Length occupied (Mb)   Percentage of masked genome
Soybase TE           75,000               520                    48
New TE annotation    100,000              590                    60

Detection of non-TE, Methylated Genes (CG, CHG, CHH)

[Figure: genome browser tracks (CG, CHG, CHH, siRNA, mRNA) over genes and TEs. Panels: 'normal' gene, TE-gene, CHG/CHH methylated gene. Counts shown: "1,200 such genes" and "3,062".]

               % Chromosome arm   % Pericentromeric   Total
Total genes    75.42              24.58               53,927
Met-genes      29.36              70.64               3,062
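The table above implies that methylated genes are over-represented in pericentromeric regions. A minimal sketch of the fold enrichment, using only the percentages shown; the function name is illustrative, and no significance test is implied because the slides do not report one.

```python
def fold_enrichment(subset_pct, background_pct):
    """Ratio of the share of a subset found in a region to the share of
    the background found in the same region."""
    if background_pct <= 0:
        raise ValueError("background percentage must be positive")
    return subset_pct / background_pct


if __name__ == "__main__":
    # Percentages of genes in pericentromeric regions, from the table.
    met_genes_pct = 70.64
    all_genes_pct = 24.58
    ratio = fold_enrichment(met_genes_pct, all_genes_pct)
    print(f"pericentromeric fold enrichment: {ratio:.2f}")
```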

Methylated Genes (CG, CHG, CHH) are

(1) localized in pericentromeric regions,

(2) enriched for pseudogenes,

(3) and are duplicated

[Figure: RNA-Seq for 14 conditions (tissues); values shown: 13,000; 3,000; 600]

Gene type     Number
Duplicated    2,745 (89%)
Singleton     311 (11%)

Methylation and expression of paralogs

[Figure: genome browser tracks (CG, CHG, CHH, siRNA, mRNA) over genes and TEs for two paralogs]

Paralog A: not expressed, CHH/CHG methylation in exons

Paralog B: expressed, no CHH/CHG methylation in exons

Paralogs of CHG/CHH methylated genes are active and less methylated

[Figure: a gene duplicates over time into a Met-Gene and its Paralog. Panels: DNA methylation (% of DNA methylation in CG, CHG, and CHH contexts; axis 0-100) and RNA-seq (FPKM score).]

Silencing of these genes is relatively recent

[Figure: methylation (CG, CHG, CHH; axis 0-100) and RNA-seq (FPKM score) compared between soybean and common bean; tree of G. max, G. soja, and Phaseolus with a ~19 My divergence]

Evolutionary Scenario

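[Figure: after duplication both gene copies are ON; over time one copy is switched OFF while the other stays ON. The silenced copy can still be ON in the endosperm (other tissues? cell types? stress?). Two outcomes are shown for the OFF copy: reversible silencing or decay into a pseudogene.]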

Summary of Epigenetic variation

• A new de novo TE annotation procedure based on small RNA mapping

• Thousands of soybean genes are highly methylated in all three methylation contexts and are generally not expressed

• Most of these genes are duplicated and their paralogs are unmethylated and expressed

• However, some of these genes are highly expressed in the endosperm. Other tissues?

• How much standing and de novo epigenetic variation exists in soybean and what role does it play in phenotypic variation?

COMMON BEAN: Jeremy Schmutz, Phillip McClean, Sujan Mamidi, G. Albert Wu, Steven B. Cannon, Jane Grimwood, Jerry Jenkins, Shengqiang Shu, Qijian Song, Carolina Chavarro, Mirayda Torres-Torres, Valerie Geffroy, Samira Mafi Moghaddam, Dongying Gao, Brian Abernathy, Kerrie Barry, Matthew Blair, Mark A. Brick, Mansi Chovatia, Paul Gepts, David M Goodstein, Michael Gonzales, Uffe Hellsten, David L. Hyten, Gaofeng Jia, James D. Kelly, Dave Kudrna, Rian Lee, Manon M.S. Richard, Phillip N. Miklas, Juan M. Osorno, Josiane Rodrigues, Vincent Thareau, Carlos A. Urrea, Mei Wang, Yeisoo Yu, Ming Zhang, Rod A. Wing, Perry B. Cregan, Daniel S. Rokhsar, Scott A. Jackson

SOYBEAN PAN-GENOME: Ying-hui Li, Guangyu Zhou, Jian-xin Ma, Wenkai Jiang, Long-guo Jin, Zhouhao Zhang, Yong Guo, Jinbo Zhang, Yi Sui, Liangtao Zheng, Shan-shan Zhang, Qiyang Zuo, Xue-hui Shi, Yan-fei Li, Wan-ke Zhang, Yiyao Hu, Guanyi Kong, Hui-long Hong, Zhang-xiong Liu, Yaoshen Wang, Hang Ruan, Carol KL Yeung, Jian Liu, Hailong Wang, Li-juan Zhang, Rong-xia Guan, Ke-jing Wang, Wen-bin Li, Shou-yi Chen, Ru-zheng Chang, Zhi Jiang, Scott A. Jackson, Ruiqiang Li, Li-juan Qiu

SOYBEAN METHYLOME: Kyung-Do Kim, Moaine Elbaidouri, Aiko Iwata, Jeremy Schmutz, Robert Schmitz, Scott A. Jackson

"You have a lot of attention on a foolish sport like American football…on a sport that is meant to kill each other... You're so narrow-minded, and then you want to compete against the world when you waste a lot of time, good talent on a sport that sucks.”

Jillert Anema, Dutch skating coach. 21 medals in Sochi.

The challenge of TE annotation

[Figure: TE classification by Class and Order/Superfamily/Family. Class I: LTR-Retrotransposons (LTR, GAG-POL, LTR), LINE (RT - RH), SINEs. Class II: DNA transposon (Transposase), MITE.]