GC Master Thesis Final - DiVA portal, uu.diva-portal.org/smash/get/diva2:1346606/FULLTEXT01.pdf

1 Uncovering the genetic organisation of Claroideoglomus candidum George B Cheng Degree project in biology, Master of science (2 years), 2019 Examensarbete i biologi 45 hp till masterexamen, 2019 Biology Education Centre and Department of Evolutionary Biology, Uppsala University Supervisors: Anna Rosling and Marisol Sanchez Garcia External opponents: Jente Ottenburghs and Boel Olsson

Acknowledgements

I would like to thank my supervisors Dr. Anna Rosling and Dr. Marisol Sanchez Garcia from the Department of Ecology and Genetics at Uppsala University. If I ran into trouble or had a question about my research or writing, their doors were always open for me. They were both very supportive and kept me grounded throughout the project. I would also like to extend my gratitude towards the rest of the Rosling research group at Uppsala University for also being supportive and teaching me about their ongoing research. I would like to thank my two external opponents Dr. Jente Ottenburghs and Boel Olsson from Uppsala University. I’m grateful for the comments they made for this thesis that helped shape the final product. Finally, I would like to thank my parents and my brother for supporting and encouraging me throughout these two years leading up to the end of the thesis.

Abstract

Arbuscular mycorrhizal (AM) fungi are hypothesized to have been key players in facilitating the

transition from aquatic to terrestrial plants and continue to benefit plants through their symbiotic

association after 450 million years. These fungi form mycelia that can contain hundreds of nuclei

within one aseptate cytoplasm, which leads to the ongoing debate on whether these

multinucleated fungi are homokaryotic or heterokaryotic. There is evidence to support both the

hypothesis that the nuclei are genetically identical and the competing hypothesis of divergent

nuclei within a single strain. There has been no evidence of sexual reproduction; however,

specialized genomic regions specific to meiosis and a putative mating-type (MAT) locus have

recently been identified and may help answer the ongoing debate between homokaryosis and

heterokaryosis.

In this study I applied de novo genome assembly and annotation of 24 individual nuclei from a

single spore of Claroideoglomus candidum. The full length of the de novo genome assembly was

87.6 Mb with 17,542 genes. Estimated polymorphism between the nuclei was very low. I

identified the MAT locus in C. candidum, using a previously sequenced MAT locus from

a congeneric species. Only one MAT locus allele was found in the examined spore.

The evidence points towards homokaryosis as the genetic organization of Claroideoglomus

candidum.

Contents

Acknowledgements
Abstract
Introduction
  AM Fungal Symbiosis
  Evolutionary Persistence of AMF
  Genome Sequencing
  Project Aims
Methods
  Origin of reads
  De novo genome assembly
  Genome Annotations
  Variant Calling
  MAT Locus
Results
  Reference Genome
  Individual Nuclei Assemblies
  Single Nucleotide Polymorphisms
  MAT locus
Discussion
References

Introduction

AM Fungal Symbiosis

Symbiotic associations can be formed between a vast range of different organisms in different

environments, from the red-billed oxpecker picking ticks off large mammals, to the bacteria that

facilitate the tube worms living on deep hydrothermal vents, or the bacteria and fungi that sustain

plants within their roots (Cordes et al. 2005, Mikula et al. 2018). These symbiotic relationships

can be in the form of mutualistic, parasitic, or commensalistic associations, which can be further

divided into facultative or obligatory alliances. The obligatory symbiosis occurs when one or

both symbionts completely depend on the other to survive, whereas the facultative symbiosis is

an optional relationship between the symbionts capable of surviving independently. One of the

oldest and often overlooked obligate symbiotic relationships is found between terrestrial plants

and arbuscular mycorrhizal (AM) fungi. This 450-million-year-old relationship can be found in

nearly 80% of all land plants (Martin 2016).

AM fungal symbiosis, which was established before mutualistic interactions evolved between

insects and vertebrates, was arguably the essential driving force for successful plant colonization

on land (Kiers et al. 2011, Redecker et al. 2000, Heckman et al. 2001). This symbiotic

relationship can have a deep impact on agricultural production. To achieve more

environmentally friendly agricultural processes, we need a better understanding of how to harmonize

all aspects of the agricultural environment, including this plant-fungal relationship. Not only do we

need to understand which crops would be best suited for the plot of land, but we also need to be

aware of the microbiota that thrive beneath the surface. The presence of AM fungi facilitates

nutrient uptake by capturing and directing nitrogen and phosphorus to the plant, in exchange for

carbon sources essential for growth and survival of the fungus. Aside from nutrient acquisition

for the host plant, AM fungi facilitate mineral and water absorption (Souza 2015). The benefits of this

symbiotic relationship are multifold: it also helps to promote resistance and tolerance towards

abiotic stresses (e.g., drought) and biotic stresses (e.g., pathogens and herbivores)

(Campos-Soriano et al. 2012, Kiers et al. 2011, Souza 2015). AM fungi can also improve photosynthesis

processes by protecting the photosystems within the chloroplasts against heavy metal toxicity by

forming compounds that bind to heavy metals and inhibit their movement through to above-

ground structures (Zhang et al. 2018). As climate change advances, plants will be exposed to

changes in temperature and other abiotic stresses. AM fungi improve the cold tolerance of plants

by inducing higher enzymatic activity and increasing secondary metabolite contents

(e.g., flavonoids, lignin) in plants (Chen et al. 2013). Under high temperatures, AM fungi can help

the plant cope, protecting the plant’s photosystems and increasing plant growth (Mathur et al.

2018).

Understanding and utilizing AM fungi in agricultural practices could reduce the use of chemical

fertilizers and pesticides; however, one challenge is that AM fungi express species-dependent host

preferences, which can make it difficult to pair them with crop species (Angelard et al. 2014, Kim

et al. 2017). A field study by Hijri (2016) evaluated potato yield in plots inoculated with AM

fungi and found an overall increase in yield compared to uninoculated plots.

However, some inoculated plots experienced a decrease in yield compared to the uninoculated

plots, pointing to other factors that reduce yield. Hijri (2016) advanced several

hypotheses that could explain this reduction: poor application of the inoculum due to

insufficient agitation, pathogen attacks, and competition between AM

fungi in the inoculum and indigenous AM fungal populations. Understanding the local soil

community dynamics and how they affect AM fungi can be crucial to improving product yield.

AM symbiosis can elicit two different community dynamics: positive feedback, which strengthens

the mutualism between plant and fungal species but decreases the community diversity, and negative

feedback, which weakens the mutualism but contributes to the maintenance of plant and fungal

diversity (Bever 2002). AM fungi can potentially experience genotypic plasticity due to a change

in host plants or their environment (Angelard et al. 2014). The study by Angelard et al.

(2014) suggests that the fungi show potential for adaptability due to their ability to alter their

nucleotype frequencies to better suit their environment or host. If the AM fungi fuse with different

plant species simultaneously, the nuclei within the hyphal network may be genotypically

different.

Evolutionary Persistence of AMF

An important mechanism for long-term persistence and adaptation in eukaryotic species has been

sexual reproduction. As for asexual reproducers, accumulation of deleterious mutations and loss

of adaptivity often leads to extinction. Most fungi are known to reproduce both sexually and

asexually. However, for a long time, AM fungi have been thought to only reproduce asexually;

many consider them to be ancient asexuals that defy the basis of evolutionary theory by

persisting for 450 million years (Parniske 2008). While sexual reproduction has not been

explicitly observed in AM fungi, it has been inferred to occur due to the presence of a putative

“mating-type” locus (MAT locus) similar to the mating-type locus of basidiomycete fungi (Ropars et al.

2016).

The MAT locus is a specialized region of the genome that establishes cell-type identity and

orchestrates the sexual cycle. It encodes global transcription factors that establish cell-type

identity by controlling the expression of developmental cascades; these commonly involve

homeodomain or other classic transcriptional regulatory elements (Fraser and Heitman 2003).

The MAT locus contains genes that can encode homeodomain transcription factors, as well as

genes that control the fusion of cells from different individuals (Fraser and Heitman 2003). The

recent identification of the MAT locus

(Ropars et al. 2016) may help describe the genetic structure between nuclei in AM fungi. The

genetic organization of AM fungi could hold the answer for how they have been able to keep up

with the changes in their host and environments.

The AM fungal mycelium is organized as one continuous cytoplasm of aseptate hyphae with

multinucleated spores that form and hold hundreds to thousands of nuclei flowing through the

entire structure (Marleau et al. 2011). There are two views on the genetic organization of the

nuclei; the heterokaryotic hypothesis stating that AM fungi will have genetically different nuclei,

and the homokaryotic hypothesis explaining that the nuclei will be genetically highly similar. It

is still unclear whether nuclei show significant genetic difference between each other and are

homokaryotic or heterokaryotic. One method to determine which hypothesis suits these fungal

species involves the identification of AM fungal genes related to mating, specifically the

“mating-type locus” (MAT locus) (Ropars et al. 2016). The MAT locus was located in

Rhizophagus irregularis isolates, revealing that R. irregularis produces either homokaryotic or

heterokaryotic mycelia. Within the MAT locus there are two open reading frames that contain

the homeodomain-like region that were designated as HD1-like and HD2. The heterokaryotic

isolates have two alleles of HD1-like and HD2, and homokaryotic isolates would have only one

(Ropars et al. 2016).

Genome Sequencing

Genome sequencing methods have been continually expanding especially since the breakthrough

of the human genome (Liu et al. 2012). According to the National Human Genome Research

Institute (NHGRI 2016), these methods have been constantly improving, lowering the cost

drastically compared to the cost in 2001 and making it more accessible to sequence genomes.

This opened avenues of new research for many fields of biology. Assembling genomes unlocks

more information about the species of interest, such as identifying proteins, uncovering

regulatory pathways, or evaluating the differences between or within species (Sharman 2001).

When constructing genome assemblies, two approaches can be used: reference-based assembly

and de novo assembly. De novo assembly uses only the sequenced reads to construct a genome,

comparing the reads with each other and joining overlapping reads to form longer

contiguous sequences (contigs). These contigs are then positioned to create scaffolds that are

combined to form the final assembly. The reference-based assembly aligns or maps each read to

a previously generated genome sequence of a closely related individual to construct a new

genome or identify single nucleotide variations.
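
The overlap-and-merge idea behind de novo assembly can be illustrated with a toy sketch. This is a minimal greedy example written for illustration only, not the algorithm of any real assembler (real tools use far more elaborate graph-based methods and handle sequencing errors, repeats and reverse complements). The function names, the example reads and the minimum overlap of 3 bases are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs)
                   if n not in (best_i, best_j)] + [merged]


# Three hypothetical reads that tile one short sequence.
reads = ["ATGGCGT", "GCGTACC", "TACCGGA"]
contigs = greedy_assemble(reads)
print(contigs)
```

With these three reads the sketch joins everything into a single contig; reads without a sufficient overlap would simply remain as separate contigs.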

Determining whether the species is heterokaryotic or homokaryotic will depend on how each is

defined. One strict definition is that homokaryosis occurs when the genetic composition of the

individual nuclei is exactly the same, with no single nucleotide polymorphisms (SNPs). In

contrast, if a significant number of SNPs is present, then heterokaryosis is observed.

Another possible definition is based on the density of SNPs observed in

the genome. In the pathogenic fungus Puccinia striiformis f. sp. tritici, it is known that

homokaryotic and heterokaryotic isolates have on average 0.41 SNPs/kb and 5.29

SNPs/kb, respectively (Cantu et al. 2013). Using the heterokaryotic SNP rate from Cantu et al.

(2013) as the threshold, single spores that have a SNP density over 5.29 SNPs/kb will be considered

heterokaryotic and those with a SNP density below that threshold will be considered

homokaryotic.
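
The density-based definition amounts to a small calculation, sketched below. The 5.29 SNPs/kb threshold is the value quoted above from Cantu et al. (2013); the function names and the example counts are hypothetical, and treating a density exactly equal to the threshold as homokaryotic is an assumption, since the text only defines the cases above and below it.

```python
# Threshold quoted in the text (heterokaryotic average from Cantu et al. 2013).
HETEROKARYOTIC_THRESHOLD = 5.29  # SNPs per kb


def snp_density(snp_count: int, region_length_bp: int) -> float:
    """SNPs per kilobase for a region of the given length in base pairs."""
    return snp_count / (region_length_bp / 1000)


def classify(density: float, threshold: float = HETEROKARYOTIC_THRESHOLD) -> str:
    """Label a SNP density; equality with the threshold is an assumed edge case."""
    return "heterokaryotic" if density > threshold else "homokaryotic"


# Hypothetical example: 960 SNPs called over a 1 Mb region.
density = snp_density(960, 1_000_000)
print(density, classify(density))
```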

Project Aims

The aim of this master thesis project is to assemble the genome and determine the genomic

organization of Claroideoglomus candidum, that is, whether it is heterokaryotic or homokaryotic. To do this,

the sequences of several individual nuclei from a single spore will be compared. Knowing

whether they have similar or different nuclei may help us understand how AM fungi can

propagate and reproduce specific nuclei based on the plant species they are or will be colonizing,

and whether they have specific traits that can benefit specific species of plants.

Methods

Origin of reads

Claroideoglomus candidum CCK pot B6-9 was isolated from a single spore collected from old

field soil in North Carolina, USA. The strain is part of the James Bever collection. From the

culture a single spore was isolated and crushed to release nuclei which were then collected using

fluorescence-activated cell sorting (FACS). Twenty-four nuclei were extracted from the spore,

amplified through multiple displacement amplification and then sequenced with Illumina HiSeq

X (Montoliu-Nerin et al. 2019).

In order to compare results and patterns in this study with those of previously studied fungal

genomes, the parameters for variant calling were replicated from Chen et al. (2018) and

then followed by a stricter filter for repeats. This was done to avoid potential discrepancies,

to standardize the approach, and to allow comparison with other genomic data. Concerns about

comparability between studies were expressed by Ropars and Corradi (2015): there are

many different techniques for SNP calling, and each could produce different results and conclusions

about SNP detection.

De novo genome assembly

The raw reads from each nucleus were normalized to an average depth of 100x using

bbnorm of BBMap v. 38.08 (Bushnell 2014) before constructing the assembly, to reduce potential

errors downstream. De novo assemblies for each nucleus were made using SPAdes assembler v

3.11.1 (Bankevich et al. 2012) with default parameters. The individual assemblies were of good

quality, representing the majority of reads, but issues were encountered when attempting to construct

the reference assembly from them using the Lingon pipeline (Montoliu-Nerin et al. 2019). The individual

nuclei were therefore reassembled from the raw reads using MaSuRCA (Zimin et al. 2013), and these

assemblies were used in the Lingon (Grabherr 2018) pipeline to create the reference genome assembly. The

quality of the individual nuclei assemblies and the reference

assembly was assessed using BUSCO v. 3.0.2b (Simão et al. 2015) to evaluate

completeness and Quast v. 4.5.4 (Gurevich et al. 2013) to obtain statistical metrics of the

assembly. Based on the metrics from the individual assemblies, two of the nuclei (4, 7) were

removed from further analysis due to poor assembly quality (Table 1).
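
One of the statistical metrics reported for the assemblies is the contig N50. As a reminder of what that number means, the sketch below computes it from a list of contig lengths using the common definition (the length of the contig at which the running total, counted from the longest contig down, first reaches half of the assembly size). The function name and the example lengths are hypothetical; this is not the Quast implementation.

```python
def n50(contig_lengths):
    """Contig N50: length at which the cumulative sum of contigs,
    sorted from longest to shortest, reaches half of the total length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in base pairs.
print(n50([2, 3, 4, 5, 6]))
```

A higher N50 at a similar total assembly size generally indicates a less fragmented assembly.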

KmerGenie v. 1.7039 (Chikhi & Medvedev 2014) was used to estimate the genome size.

Combinations of different numbers of nuclei were used to generate assemblies to assess the

quality and determine how many nuclei should be used to produce a full genome assembly.

Genome Annotations

Annotations were done using a snakemake workflow of different programs that was specifically

developed to be used in the larger arbuscular mycorrhizal genomic project ongoing in the lab.

RepeatModeler v. 1.0.8_RM4.0.7 (Smit 2008) was used to predict repeats and create a repeat

library that was used by RepeatMasker v. 4.0.7 (Smit 2015) to mask the genome assembly.

GeneMark v. 4.33-es (Ter-Hovhannisyan 2008) was used to predict the protein coding genes from

the repeat-masked assembly. InterProScan v. 5.30-69.0 (Jones et al. 2014), GenomeTools v. 1.5.9

(Gremme et al. 2013), blast v. 2.6.0+ (Camacho et al. 2009), and MAKER v. 3.01.1-beta (Cantarel

et al. 2008) were used for gene predictions and locations.

Variant Calling

The Burrows-Wheeler Aligner (BWA-MEM) (Li & Durbin 2009) with the -M parameter was used to

map the reads of each nucleus back to the whole genome assembly. Freebayes (Garrison &

Marth 2012) was used to detect and filter variants in the reads using the following parameters,

which were also used by Chen et al. (2018): -K -m 30 -C 2 -q 20 -p 1. These parameters set

a minimum mapping quality of 30, a minimum of two reads supporting the alternative allele,

a minimum base quality of 20 and a ploidy of one. A second filter was applied on top of

the first using vcffilter from the vcflib package (Garrison 2018), with the following parameters:

QUAL > 1, removing bad sites; QUAL / AO > 10 (quality / allele observation

count); SAF > 0 and SAR > 0, removing alleles that are seen on only one strand; RPR > 1 and RPL > 1,

requiring at least two reads “balanced” on each side and removing alleles supported only by reads

placed to the left or to the right; and RO > 1. BCFtools (Li 2011) stats with default parameters was used to determine the number of

SNPs found in the whole genome, in the genome without repeats, and in coding regions only.
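
The second filter is a conjunction of simple per-site conditions. The sketch below restates those conditions as a plain Python predicate over a dictionary of values, purely to make the logic explicit; it is not vcffilter itself and does not parse VCF files. The function name and the example records are hypothetical, and the guard against AO being zero is an added assumption to avoid a division by zero.

```python
def passes_second_filter(site: dict) -> bool:
    """Apply the thresholds listed in the text to one variant site."""
    return (
        site["QUAL"] > 1                       # remove bad sites
        and site["AO"] > 0                     # assumption: avoid division by zero
        and site["QUAL"] / site["AO"] > 10     # quality per alternate observation
        and site["SAF"] > 0 and site["SAR"] > 0  # allele seen on both strands
        and site["RPR"] > 1 and site["RPL"] > 1  # reads balanced to right and left
        and site["RO"] > 1                     # threshold on RO as given in the text
    )


# Hypothetical sites for illustration.
good = {"QUAL": 120.0, "AO": 6, "SAF": 3, "SAR": 3, "RPR": 3, "RPL": 3, "RO": 2}
one_strand = dict(good, SAR=0)       # alternate allele only on the forward strand
low_quality = dict(good, QUAL=30.0)  # 30 / 6 = 5, below the required ratio

for name, site in [("good", good), ("one_strand", one_strand), ("low_quality", low_quality)]:
    print(name, passes_second_filter(site))
```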

OrthoMCL v. 2.0.9 (Li et al. 2003) was used to identify single copy orthologs among the 22

nuclei. Single copy orthologs allow for the comparison of the amino acid or nucleotide

sequences of a region present in all 22 nuclei and convey the level of polymorphism in each

nucleus. Freebayes (Garrison & Marth 2012) was used with the aforementioned parameters to

filter and detect variants among the single copy orthologs.
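
What "single copy ortholog shared by all nuclei" means can be made concrete with a short sketch. It assumes, hypothetically, that ortholog groups are available as a mapping from a group name to gene identifiers of the form `nucleus|gene`; the group names, identifiers and function name are illustrative and do not reproduce the actual OrthoMCL output used in the thesis.

```python
from collections import Counter


def single_copy_groups(groups: dict, nuclei) -> list:
    """Return groups that contain exactly one gene from every nucleus."""
    wanted = set(nuclei)
    selected = []
    for group_id, members in groups.items():
        # Count how many genes each nucleus contributes to this group.
        counts = Counter(member.split("|", 1)[0] for member in members)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            selected.append(group_id)
    return selected


# Hypothetical toy data with three nuclei.
nuclei = ["n1", "n2", "n3"]
groups = {
    "OG1": ["n1|g10", "n2|g11", "n3|g12"],            # one copy in each nucleus
    "OG2": ["n1|g20", "n1|g21", "n2|g22", "n3|g23"],  # duplicated in n1
    "OG3": ["n1|g30", "n2|g31"],                      # missing from n3
}
print(single_copy_groups(groups, nuclei))
```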

MAT Locus

An HD2 sequence from Claroideoglomus claroideum, a species in the same genus as C. candidum (GenBank

accession number MH445375), was used as the query sequence in blast v 2.7.1+ against all 24

nuclei to find the presence and location of the MAT locus in C. candidum. The two low quality

nuclei were blasted as well to see if the MAT locus was present in the fragmented sequences.

The HD2 sequence specific to C. candidum was extracted together with the contigs containing the MAT

locus, and these were then aligned together using MAFFT v. 7.407 (Katoh & Standley 2013) with

default settings, followed by manual alignment inspection.

Results

Reference Genome

The nuclear genome of C. candidum was sequenced and assembled. The whole genome size was

87.60 Mb with 17,542 genes (Table 2). Repeats comprise 44.7% of the full assembly.

Adding the percentages of complete and fragmented BUSCO genes, the full assembly had a BUSCO score of 86.2%

(Table 2). When constructing the full genome assembly, only the eight best-quality single-nucleus

MaSuRCA assemblies were used (Table 3). When increasing the number of assembled single

nuclei, the size of the genome continued to inflate, as seen in Figure 1. However, the quality of

the genome, as estimated by BUSCO completeness, did not improve after increasing the number

of nuclei. The 8-nuclei assembly had a higher completeness, with a high number of single

copy genes and a low number of duplicated genes, compared to the 24-nuclei assembly

(Figure 2). The assembly size of the eight nuclei was very close to the estimated genome size

(87 Mb) based on KmerGenie. Eight nuclei were chosen for the assembly because this

combination had the highest completeness, with the highest number of single

copy genes and the lowest number of duplicated genes (Table 4).

Figure 1 Overview of the assembly size for each nuclei combination. The number of bases in the assembly continues to increase with each additional nucleus.

Figure 2 Comparison of assembly stats for different number combinations of nuclei. The values of the single (blue), duplicated (orange), and fragmented (gray) genes were used as criteria to determine the best number of nuclei combination for whole genome assembly. With the increasing number of nuclei, the number of duplicated genes increases in place of the decrease in single genes. The red line shows the highest N50 length between 7 nuclei and 14 nuclei.

[Figure 1 plot: x-axis, Number of Nuclei (5 to 24); y-axis, Number of Bases; series: Total length]

[Figure 2 plot, "Assembly Stats": x-axis, Number of Nuclei (5 to 24); y-axes, Number of Genes and Contig Length; series: Single, Duplicated, Fragmented, Missing, N50]

Individual Nuclei Assemblies

Two of the 24 nuclei, 4 and 7, were excluded from subsequent analysis due to the poor quality of

their assemblies (Table 1). The two removed nuclei had assembly sizes of 4.43 Mb and 2.18 Mb and

BUSCO completeness values of 7.9% and 4.8%, which were below both the average (51.41%) and the

lowest completeness (29.7%) of the other nuclei (Table 1). The average size of the other

assembled nuclei was 41.55 Mb, ranging from 28.32 Mb to 52.59 Mb. Among these nuclei, the highest

BUSCO completeness was 69.3% and the lowest was 29.7%, with an average of

51.41%.
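
The summary values in this paragraph can be checked against the per-nucleus values in Table 1. The sketch below uses the assembly sizes and the BUSCO "C" percentages of the 22 retained nuclei as transcribed from Table 1; the variable names are hypothetical, and using the "C" column for the completeness average is an assumption that appears consistent with the reported 51.41%.

```python
# Values transcribed from Table 1 for the 22 retained nuclei (4 and 7 excluded).
sizes_mb = [38.63, 52.59, 41.85, 45.96, 37.68, 48.65, 40.33, 40.73, 47.64,
            28.32, 52.56, 51.04, 44.98, 35.46, 42.52, 41.69, 38.63, 35.58,
            29.72, 38.03, 45.56, 36.06]
busco_c = [49.3, 69.3, 52.1, 56.2, 45.5, 61.0, 46.9, 51.1, 58.3,
           29.7, 68.3, 59.7, 54.8, 42.4, 56.2, 52.8, 44.5, 44.8,
           35.8, 45.8, 61.4, 45.2]

mean_size = sum(sizes_mb) / len(sizes_mb)
mean_busco = sum(busco_c) / len(busco_c)

print(f"nuclei: {len(sizes_mb)}")
print(f"size: mean {mean_size:.2f} Mb, range {min(sizes_mb)} to {max(sizes_mb)} Mb")
print(f"BUSCO C: mean {mean_busco:.2f}%, range {min(busco_c)} to {max(busco_c)}%")
```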

Single Nucleotide Polymorphisms

Eleven single copy orthologs were shared between the 22 nuclei and had between 0 and 2 SNPs per

ortholog (0.0013 SNPs/kb). The SNP densities for the whole genome assembly (0.96 SNPs/kb)

and the assembly without repeats (0.98 SNPs/kb) were lower than the SNP density when

only considering coding regions (1.22 SNPs/kb) (Table 5). Figure 3 shows an example of the

SNPs on one contig observed between nuclei; grey marks nucleotides that are the same as the reference,

blue marks nucleotides that differ from the reference and white indicates positions not found in that nucleus.

MAT locus

Using the HD2-like region of the MAT locus from C. claroideum, part of the MAT locus was

identified in C. candidum. However, only one allele of the MAT locus was found, and it was present in 20 of the 24

nuclei. The 20 contigs containing the MAT locus were aligned together and showed no

differences amongst them (Fig 4). These two observations indicate that C. candidum is

homokaryotic for the mating type locus.

Figure 3 Several SNP variations seen in the reads among the 24 nuclei. The reference assembly sequence is located at the bottom. The presence of SNPs is represented with the light blue markers. Gray markers are nucleotides that match the reference sequence. White markers indicate that the nucleotide is missing in that nucleus.

Figure 4 Comparison of a segment of the MAT locus. This segment showed no variations in the HD2 region of the MAT locus among the nuclei. The nuclei (11, 12, 20) with empty rows had matches further down the sequence. This segment captured the most overlap among all the nuclei for a better visualization.

Table 1 Assembly stats of the individual nuclei generated with the SPAdes assembler. Red indicates nuclei that were excluded due to low quality assemblies (nuclei 4 and 7).

Nuclei  Assembly   No. of    N50      Largest      GC %   BUSCO %          No. of  Repeats
        Size (Mb)  Contigs   Contig   Contig (bp)                          Genes   (Mb)
1*      38.63      10431     7367     52986        27.81  C: 49.3 F: 12.1  10798   12.25
2*      52.59      11952     10698    86679        27.8   C: 69.3 F: 9.0   12889   20.59
3*      41.85      10228     9033     106112       27.82  C: 52.1 F: 11.4  10646   15.35
4*      4.43       1087      8762     47580        27.86  C: 5.5 F: 2.4    --      --
5*      45.96      11412     8891     70965        27.81  C: 56.2 F: 10.0  11855   17.18
6*      37.68      10296     7742     61796        27.8   C: 45.5 F: 10.7  10362   13.36
7*      2.18       617       7469     65913        27.86  C: 3.1 F: 1.7    --      --
8*      48.65      11616     10114    84385        27.81  C: 61.0 F: 10.0  12113   18.56
9*      40.33      10349     8255     70206        27.79  C: 46.9 F: 13.1  10815   13.09
10*     40.73      10396     8403     73020        27.8   C: 51.1 F: 11.0  10897   13.1
11*     47.64      11310     10034    69919        27.8   C: 58.3 F: 8.6   11619   18.25
12*     28.32      8491      6046     45615        27.82  C: 29.7 F: 13.4  8541    8.3
13*     52.56      11559     10831    76762        27.8   C: 68.3 F: 7.6   12507   20.2
14*     51.04      11780     10006    76861        27.82  C: 59.7 F: 12.1  12496   19.8
15*     44.98      10949     9631     60484        27.8   C: 54.8 F: 12.1  11598   15.28
16*     35.46      9424      7588     63419        27.82  C: 42.4 F: 13.1  9800    11.11
17*     42.52      9696      9529     56725        27.82  C: 56.2 F: 9.0   10847   13.7
18*     41.69      10441     8208     58320        27.82  C: 52.8 F: 12.1  11090   13.69
19*     38.63      9558      8366     51615        27.82  C: 44.5 F: 12.4  10292   12.2
20*     35.58      9977      7006     52471        27.81  C: 44.8 F: 9.7   10075   10.84
21*     29.72      9361      5630     45369        27.85  C: 35.8 F: 14.5  9312    8.57
22*     38.03      7556      12024    74356        27.82  C: 45.8 F: 7.2   9166    11.95
23*     45.56      10617     9499     61888        27.82  C: 61.4 F: 9.0   11814   14.67
24*     36.06      8476      9251     70117        27.91  C: 45.2 F: 12.4  9435    11.09

Table 2 Assembly stats for the full genome assembly from the MaSuRCA individual nuclei assemblies

| Nuclei | Assembly Size (Mb) | No. of Contigs | N50 Contig | Largest Contig (Kb) | GC % | BUSCO % | No. of Genes | Repeats (Mb) |
|---|---|---|---|---|---|---|---|---|
| Whole Genome | 87.59 | 11334 | 11355 | 104398 | 27.65 | C: 78.9 F: 7.24 | 17542 | 39.16 |

Table 3 Individual nuclei MaSuRCA assemblies. * Nuclei used for reference genome assembly.

| Nuclei | # contigs | Largest contig | Total length | N50 | Single | Duplicated | Total single and duplicated | # contigs |
|---|---|---|---|---|---|---|---|---|
| 1* | 8106 | 18987 | 29489618 | 3766 | 20 | 12 | 32 | 8106 |
| 2* | 11388 | 30797 | 46219418 | 4329 | 42 | 11 | 53 | 11388 |
| 3* | 8627 | 26962 | 32096789 | 3875 | 25 | 10 | 35 | 8627 |
| 4* | 585 | 15627 | 2090392 | 3570 | 3 | 1 | 4 | 585 |
| 5* | 9769 | 29656 | 38988991 | 4289 | 26 | 11 | 37 | 9769 |
| 6* | 8304 | 25787 | 32011304 | 4077 | 26 | 11 | 37 | 8304 |
| 7* | 129 | 9349 | 439228 | 3557 | 0 | 1 | 1 | 129 |
| 8* | 10270 | 25362 | 42205745 | 4480 | 34 | 9 | 43 | 10270 |
| 9* | 8391 | 21426 | 32372008 | 4096 | 23 | 12 | 35 | 8391 |
| 10* | 8335 | 24135 | 31974802 | 4033 | 28 | 9 | 37 | 8335 |
| 11* | 10001 | 22941 | 39924909 | 4297 | 30 | 14 | 44 | 10001 |
| 12* | 5604 | 18777 | 19121213 | 3490 | 15 | 11 | 26 | 5604 |
| 13* | 11447 | 30619 | 47577698 | 4533 | 44 | 12 | 56 | 11447 |
| 14* | 10920 | 30682 | 43835385 | 4272 | 34 | 16 | 50 | 10920 |
| 15* | 8781 | 34668 | 35233894 | 4309 | 30 | 13 | 43 | 8781 |
| 16* | 7113 | 20841 | 25697597 | 3743 | 22 | 9 | 31 | 7113 |
| 17* | 8858 | 26845 | 34123824 | 4050 | 25 | 10 | 35 | 8858 |
| 18* | 8862 | 20865 | 33217182 | 3930 | 30 | 11 | 41 | 8862 |
| 19* | 7998 | 28571 | 29685933 | 3873 | 21 | 18 | 39 | 7998 |
| 20* | 7410 | 17016 | 26649828 | 3727 | 23 | 8 | 31 | 7410 |
| 21* | 6201 | 15985 | 20860658 | 3434 | 16 | 8 | 24 | 6201 |
| 22* | 7177 | 27037 | 28499372 | 4276 | 23 | 9 | 32 | 7177 |
| 23* | 9678 | 26744 | 37318671 | 4057 | 34 | 11 | 45 | 9678 |
| 24* | 7234 | 26211 | 27976344 | 4099 | 25 | 11 | 36 | 7234 |

Table 4 Reference assemblies built from different nuclei combinations. The numbers of single and duplicated genes and the completeness were considered when choosing how many nuclei to use. * Nuclei combination chosen for the reference genome assembly.

| Nuclei Combination | Single | Duplicated | Fragmented | Missing | % Completeness (C) | % Fragmented (F) | % C & F | N50 |
|---|---|---|---|---|---|---|---|---|
| 5* | 195 | 32 | 21 | 42 | 78.275 | 7.241 | 85.517 | 10496 |
| 6* | 195 | 32 | 21 | 42 | 78.275 | 7.241 | 85.517 | 10496 |
| 7* | 191 | 37 | 18 | 44 | 78.620 | 6.206 | 84.827 | 11314 |
| 8* | 196 | 33 | 21 | 40 | 78.965 | 7.241 | 86.206 | 11355 |
| 9* | 194 | 36 | 19 | 41 | 79.310 | 6.551 | 85.862 | 11221 |
| 10* | 190 | 39 | 21 | 40 | 78.965 | 7.241 | 86.206 | 11373 |
| 11* | 183 | 40 | 18 | 49 | 76.896 | 6.206 | 83.103 | 11510 |
| 12* | 180 | 42 | 22 | 46 | 76.551 | 7.586 | 84.137 | 11479 |
| 13* | 175 | 46 | 22 | 47 | 76.206 | 7.586 | 83.793 | 11334 |
| 14* | 175 | 48 | 23 | 44 | 76.896 | 7.931 | 84.827 | 11187 |
| 15* | 166 | 53 | 23 | 48 | 75.517 | 7.931 | 83.448 | 11068 |
| 16* | 167 | 45 | 24 | 54 | 73.103 | 8.275 | 81.379 | 10609 |
| 17* | 167 | 53 | 19 | 51 | 75.862 | 6.551 | 82.413 | 10472 |
| 18* | 166 | 55 | 24 | 45 | 76.206 | 8.275 | 84.482 | 10499 |
| 19* | 177 | 49 | 25 | 39 | 77.931 | 8.620 | 86.551 | 10266 |
| 20* | 174 | 47 | 23 | 46 | 76.206 | 7.931 | 84.137 | 9854 |
| 21* | 161 | 49 | 25 | 55 | 72.413 | 8.620 | 81.034 | 9658 |
| 22* | 158 | 51 | 33 | 48 | 72.068 | 11.379 | 83.448 | 9517 |
| 23* | 159 | 51 | 29 | 51 | 72.413 | 10.000 | 82.413 | 9399 |
| 24* | 161 | 50 | 28 | 51 | 72.758 | 9.655 | 82.413 | 9469 |
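The percentage columns in Table 4 appear to follow directly from the four count columns. The sketch below recomputes them for the 8-nuclei and 24-nuclei combinations, assuming that the total number of core genes is the sum of the single, duplicated, fragmented and missing counts in a row (290 for every row shown). The function name `busco_percentages` is illustrative and not part of BUSCO itself.

```python
def busco_percentages(single, duplicated, fragmented, missing):
    """Return (% complete, % fragmented, % complete + fragmented)."""
    total = single + duplicated + fragmented + missing
    complete = 100.0 * (single + duplicated) / total
    frag = 100.0 * fragmented / total
    return complete, frag, complete + frag


# Counts copied from Table 4 (8-nuclei and 24-nuclei combinations).
rows = {8: (196, 33, 21, 40), 24: (161, 50, 28, 51)}
for n_nuclei, counts in rows.items():
    c, f, cf = busco_percentages(*counts)
    print(f"{n_nuclei} nuclei: C = {c:.2f}%, F = {f:.2f}%, C+F = {cf:.2f}%")
```

The recomputed values match the table to within its last printed digit, which suggests that the 86.2% and 82.4% figures quoted for these two combinations are the combined complete and fragmented percentages.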

Table 5 SNP density for the 8-nuclei assembly

| Assemblies | Size (Mbp) | # of SNPs | SNP Density (SNPs/kb) |
|---|---|---|---|
| Full | 87.60 | 84901 | 0.96 |
| Without repeats | 48.43 | 47787 | 0.98 |
| Coding regions | 15.81 | 19249 | 1.22 |
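As a quick check of Table 5, SNP density can be recomputed as the number of SNPs divided by the region size in kilobases. This is a minimal sketch that assumes 1 Mbp = 1000 kb and uses only the rounded sizes printed in the table; the function name `snp_density_per_kb` is illustrative.

```python
def snp_density_per_kb(n_snps, size_mbp):
    """SNPs per kilobase for a region whose size is given in Mbp."""
    return n_snps / (size_mbp * 1000.0)


# (size in Mbp, number of SNPs) copied from Table 5.
table5 = {
    "Full": (87.60, 84901),
    "Without repeats": (48.43, 47787),
    "Coding regions": (15.81, 19249),
}
for name, (size_mbp, n_snps) in table5.items():
    print(f"{name}: {snp_density_per_kb(n_snps, size_mbp):.3f} SNPs/kb")
```

The recomputed densities agree with the reported values to within about 0.01 SNPs/kb. The remaining differences presumably come from rounding of the sizes and densities in the table.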

Discussion

The full genome and individual nucleus assemblies give insight into the genetic structure of C. candidum. Although a completeness of 86.2% (complete plus fragmented core genes) may not seem very high, it gives a good representation of the genome given the small amount of DNA that was available for sequencing from individual nuclei. To put this value in perspective against other known AM fungi, the Rhizophagus irregularis genome sequenced by Lin et al. (2014), who used monoxenic cultures, reached a completeness of 97%. Our assembly was based on an 8-nuclei combination, which had among the highest completeness of the combinations tested and clearly exceeded the 24-nuclei combination (82.4%). The 8-nuclei assembly had a size of 87.6 Mb, the highest number of single core genes (196) and one of the lowest numbers of duplicated core genes (33), while the 24-nuclei assembly, at 107.5 Mb, had 161 single genes and 50 duplicated genes. Using more nuclei to create an assembly therefore did not improve its quality. Instead of adding new single-copy genes, the additional nuclei inflated the assembly size and increased the duplication of supposedly single-copy genes.

The SNP density for C. candidum was 1.22 SNPs/kb in the coding regions and lower, at 0.98 SNPs/kb, in the whole assembly without repeats. As mentioned previously, the repeats were removed to obtain a SNP density with fewer variant-calling errors. Even among the 11 orthologs found, the number of SNPs observed ranged from 0 to 2 per orthologous gene among the nuclei. With either density, there is still genetic variation among the nuclei, which brings us back to how we define being homokaryotic or heterokaryotic. Under a strict definition in which all nuclei must be genetically identical, with no SNPs, for an organism to be homokaryotic, C. candidum would be heterokaryotic. However, there is a fungal species with known heterokaryotic and homokaryotic isolates, the pathogenic fungus Puccinia striiformis f. sp. tritici. The heterokaryotic isolate has an average of 5.29 SNPs/kb, whereas its homokaryotic counterpart has 0.41 SNPs/kb on average (Cantu et al. 2013). The variations seen in the homokaryotic isolates are from non-synonymous mutations (Cantu et al. 2013). Using the SNP densities of P. striiformis as a reference, C. candidum would classify as homokaryotic rather than heterokaryotic: although its SNP density is greater than that of the homokaryotic P. striiformis isolate, it does not reach that of the heterokaryotic isolate. But certain cases complicate this classification. For example, Tuber melanosporum is a homokaryotic fungus with 0.06 SNPs/kb, while Laccaria bicolor is a dikaryotic fungus with 0.78 SNPs/kb (Tisserant et al. 2013). The SNP density that corresponds to heterokaryosis or homokaryosis therefore differs between species. These two examples demonstrate that it is not possible to determine clear thresholds for heterokaryosis.
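The threshold problem can be made concrete with a small sketch that places the C. candidum density next to the reference values quoted above. It only compares numbers. The labels are those given in the text, and the function name `compare` is illustrative.

```python
# SNP densities (SNPs/kb) quoted in the text above.
references = [
    ("Tuber melanosporum", "homokaryotic", 0.06),
    ("Puccinia striiformis, homokaryotic isolate", "homokaryotic", 0.41),
    ("Laccaria bicolor", "dikaryotic", 0.78),
    ("Puccinia striiformis, heterokaryotic isolate", "heterokaryotic", 5.29),
]
C_CANDIDUM = 0.98  # whole assembly without repeats (Table 5)


def compare(density, refs):
    """Return (name, state, 'above' or 'below') for the query density against each reference."""
    return [
        (name, state, "above" if density > ref_density else "below")
        for name, state, ref_density in refs
    ]


for name, state, relation in compare(C_CANDIDUM, references):
    print(f"C. candidum is {relation} {name} ({state})")
```

C. candidum lies above both homokaryotic references and above the dikaryotic L. bicolor, yet below the heterokaryotic P. striiformis isolate, so the label it receives depends entirely on which reference species is chosen.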

Locating and confirming the presence of the MAT locus in the remaining nuclei would help determine whether C. candidum is homokaryotic or heterokaryotic for this locus. However, the definitions of heterokaryosis used across studies are not fully aligned. In the Ropars et al. (2016) study, the heterokaryotic isolates harboring two alleles of the MAT locus also have SNP variation of less than one SNP/kb. This does not follow the categorization based on P. striiformis, under which such an isolate would be classified as a homokaryon. Using BLAST, HD2 was found in 20 of the 24 nuclei without variation.
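A tally like the one above can be made from BLAST tabular output. The sketch below is an illustration only: it assumes the 12-column tabular format of BLAST+ (`-outfmt 6`), a hypothetical contig naming scheme of the form `nucNN_contigM`, and made-up example rows. None of these are taken from this study.

```python
import csv
import io


def nuclei_with_hit(blast_tsv, min_identity=100.0):
    """Return the nucleus labels whose contigs have a hit at or above min_identity."""
    hits = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        sseqid, pident = row[1], float(row[2])  # columns 2 and 3 of outfmt 6
        if pident >= min_identity:
            # Hypothetical naming: the nucleus label precedes the first underscore.
            hits.add(sseqid.split("_")[0])
    return hits


# Made-up rows for illustration only; these are not results from this study.
example = (
    "HD2\tnuc01_contig12\t100.00\t300\t0\t0\t1\t300\t501\t800\t1e-150\t555\n"
    "HD2\tnuc02_contig7\t100.00\t300\t0\t0\t1\t300\t90\t389\t1e-150\t555\n"
    "HD2\tnuc03_contig3\t98.50\t200\t3\t0\t1\t200\t10\t209\t1e-90\t350\n"
)
print(sorted(nuclei_with_hit(example)))
```

Requiring 100% identity corresponds to counting only nuclei whose hit shows no variation relative to the query. Lowering `min_identity` would also count nuclei with variant copies.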

All the evidence we gathered indicates that C. candidum is a homokaryon, based on the presence of a single MAT locus allele as per the definition in Ropars et al. (2016). The low level of polymorphism falls below that of the heterokaryotic P. striiformis isolate, further supporting homokaryosis. But when compared to L. bicolor, C. candidum would be considered heterokaryotic, which leads to the problem of how to define these classifications.

Determining which type of organization is observed in this genome depends on which definition is followed. Under the strict definition that a homokaryon contains genetically identical nuclei, any observed variation would make the individual a heterokaryon. The other definition follows the thresholds of variation seen in known homokaryons and heterokaryons. Bever (2008) defined the genetic organization on a spectrum of heterokaryosity instead of having two clear-cut definitions. This would help explain the small variation seen in species known to be homokaryotic. I think that Bever's degree of heterokaryosity may be a better approach to describing the genetic organization of AM fungi. Compared to other known heterokaryotic species, the genetic variation among nuclei within a single spore is low in C. candidum.

It is not certain that C. candidum is a homokaryon, since this study focused on the variation within a single spore and there could be variation between spores. It would be interesting to see whether there is more variation between nuclei from a different spore of the same strain. It is possible that C. candidum, like the species in Ropars et al. (2016) and Cantu et al. (2013), has some isolates that are homokaryons and others that are heterokaryons. Understanding where these variations occur within the genome can reveal how impactful they are. This can also be used to compare C. candidum with other species and see whether they share the same level of variation. We could then place these species on the spectrum of heterokaryosity and see how they compare with each other. Knowing the genetic structure of AM fungi could play a crucial part in agriculture. It would be interesting to see whether a heterokaryotic AM fungus contains different nuclei that are specific to multiple different hosts, as seen in Angelard et al. (2014), and whether a homokaryotic one specializes in a single host. This information could help with planning which crops to plant and which AM fungal inoculum to pair them with to increase production, as seen with the potato yield in Hijri (2016). With the growing human population and potential food shortages, efficient food production is especially necessary.

References

Angelard C, Tanner CJ, Fontanillas P, Niculita-Hirzel H, Masclaux F, Sanders IR. 2014. Rapid genotypic change and plasticity in arbuscular mycorrhizal fungi is caused by a host shift and enhanced by segregation. The ISME Journal 8: 284–294.

Bever JD. 2002. Negative Feedback within a Mutualism: Host-Specific Growth of Mycorrhizal Fungi Reduces Plant Benefit. Proceedings: Biological Sciences 269: 2595–2601.

Bushnell B. 2014. BBMap short read aligner. Joint Genome Institute, Department of Energy, doi 10.1016/j.avsg.2010.03.022.

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10: 421.

Campos-Soriano L, García-Martínez J, Segundo BS. 2012. The arbuscular mycorrhizal symbiosis promotes the systemic induction of regulatory defence-related genes in rice leaves and confers resistance to pathogen infection. Molecular Plant Pathology 13: 579–592.

Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, Moore B, Holt C, Alvarado AS, Yandell M. 2008. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Research 18: 188–196.

Cantu D, Segovia V, MacLean D, Bayles R, Chen X, Kamoun S, Dubcovsky J, Saunders DG, Uauy C. 2013. Genome analyses of the wheat yellow (stripe) rust pathogen Puccinia striiformis f. sp. tritici reveal polymorphic and haustorial expressed secreted proteins as candidate effectors. BMC Genomics 14: 270.

Chen S, Jin W, Liu A, Zhang S, Liu D, Wang F, Lin X, He C. 2013. Arbuscular mycorrhizal fungi (AMF) increase growth and secondary metabolism in cucumber subjected to low temperature stress. Scientia Horticulturae 160: 222–229.

Chen EC, Mathieu S, Hoffrichter A, Sedzielewska-Toro K, Peart M, Pelin A, Ndikumana S, Ropars J, Dreissig S, Fuchs J, Brachmann A, Corradi N. 2018. Single nucleus sequencing reveals evidence of inter-nucleus recombination in arbuscular mycorrhizal fungi. eLife 7: e39813.

Chikhi R, Medvedev P. 2014. Informed and automated k-mer size selection for genome assembly. Bioinformatics 30: 31–37.

Cordes EE, Arthur MA, Shea K, Arvidson RS, Fisher CR. 2005. Modeling the Mutualistic Interactions between Tubeworms and Microbial Consortia. PLOS Biology 3: e77.

Fraser JA, Heitman J. 2003. Fungal mating-type loci. Current Biology 13: R792–R795.

Garrison E, Marth G. 2012. Haplotype-based variant detection from short-read sequencing. arXiv:1207.3907 [q-bio].

Garrison E. 2018. Vcflib, a simple C++ library for parsing and manipulating VCF files. https://github.com/vcflib/vcflib.

Grabherr MG. 2018. Lingon: A d-mer based genome assembly pipeline.

Gremme G, Steinbiss S, Kurtz S. 2013. GenomeTools: A Comprehensive Software Library for Efficient Processing of Structured Genome Annotations. IEEE/ACM Trans Comput Biol Bioinformatics 10: 645–656.

Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29: 1072–1075.

Heckman DS, Geiser DM, Eidell BR, Stauffer RL, Kardos NL, Hedges SB. 2001. Molecular Evidence for the Early Colonization of Land by Fungi and Plants. Science 293: 1129–1133.

Hijri M. 2016. Analysis of a large dataset of mycorrhiza inoculation field trials on potato shows highly significant increases in yield. Mycorrhiza 26: 209–214.

Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong S-Y, Lopez R, Hunter S. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30: 1236–1240.

Katoh K, Standley DM. 2013. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution 30: 772–780.

Kiers ET, Duhamel M, Beesetty Y, Mensah JA, Franken O, Verbruggen E, Fellbaum CR, Kowalchuk GA, Hart MM, Bago A, Palmer TM, West SA, Vandenkoornhuyse P, Jansa J, Bücking H. 2011. Reciprocal Rewards Stabilize Cooperation in the Mycorrhizal Symbiosis. Science 333: 880–882.

Kim SJ, Eo J-K, Lee E-H, Park H, Eom A-H. 2017. Effects of Arbuscular Mycorrhizal Fungi and Soil Conditions on Crop Plant Growth. Mycobiology 45: 20–24.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England) 25: 1754–1760.

Li H. 2011. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27: 2987–2993.

Li L, Stoeckert CJ, Roos DS. 2003. OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes. Genome Research 13: 2178–2189.

Lin K, Limpens E, Zhang Z, Ivanov S, Saunders DGO, Mu D, Pang E, Cao H, Cha H, Lin T, Zhou Q, Shang Y, Li Y, Sharma T, van Velzen R, de Ruijter N, Aanen DK, Win J, Kamoun S, Bisseling T, Geurts R, Huang S. 2014. Single Nucleus Genome Sequencing Reveals High Similarity among Nuclei of an Endomycorrhizal Fungus. PLoS Genetics 10: e1004078.

Liu L, Li Y, Li S, Hu N, He Y, Pong R, Lin D, Lu L, Law M. 2012. Comparison of Next-Generation Sequencing Systems. Journal of Biomedicine and Biotechnology 2012:

Marleau J, Dalpé Y, St-Arnaud M, Hijri M. 2011. Spore development and nuclear inheritance in arbuscular mycorrhizal fungi. BMC Evolutionary Biology 11: 51.

Martin F. 2016. Molecular Mycorrhizal Symbiosis, 1st ed. John Wiley & Sons, Incorporated.

Mathur S, Sharma MP, Jajoo A. 2018. Improved photosynthetic efficacy of maize (Zea mays) plants with arbuscular mycorrhizal fungi (AMF) under high temperature stress. Journal of Photochemistry and Photobiology B: Biology 180: 149–154.

Mikula P, Hadrava J, Albrecht T, Tryjanowski P. 2018. Large-scale assessment of commensalistic–mutualistic associations between African birds and herbivorous mammals using internet photos. PeerJ 6: e4520.

Montoliu-Nerin M, Sánchez-García M, Bergin C, Grabherr M, Ellis B, Kutschera VE, Kierczak M, Johannesson H, Rosling A. 2019. From single nuclei to whole genome assemblies. bioRxiv 625814.

NHGRI. 2016. The Cost of Sequencing a Human Genome | NHGRI. online July 6, 2016: https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost. Accessed May 14, 2019.

Olson ND, Lund SP, Colman RE, Foster JT, Sahl JW, Schupp JM, Keim P, Morrow JB, Salit ML, Zook JM. 2015. Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Frontiers in Genetics, doi 10.3389/fgene.2015.00235.

Parniske M. 2008. Arbuscular mycorrhiza: the mother of plant root endosymbioses. Nature Reviews Microbiology 6: 763–775.

Redecker D, Kodner R, Graham LE. 2000. Glomalean Fungi from the Ordovician. Science 289: 1920–1921.

Ropars J, Corradi N. 2015. Homokaryotic vs heterokaryotic mycelium in arbuscular mycorrhizal fungi: different techniques, different results? New Phytologist 208: 638–641.

Ropars J, Toro KS, Noel J, Pelin A, Charron P, Farinelli L, Marton T, Krüger M, Fuchs J, Brachmann A, Corradi N. 2016. Evidence for the sexual origin of heterokaryosis in arbuscular mycorrhizal fungi. Nature Microbiology 1: 16033.

Sharman A. 2001. The many uses of a genome sequence. Genome Biology 2: reports4013.1.

Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31: 3210–3212.

Smit AFA, Hubley R. 2008–2015. RepeatModeler Open-1.0. <http://www.repeatmasker.org>.

Smit AFA, Hubley R, Green P. 2013–2015. RepeatMasker Open-4.0. <http://www.repeatmasker.org>.

Souza T. 2015. Overview. In: Souza T (ed.). Handbook of Arbuscular Mycorrhizal Fungi, pp. 1–8. Springer International Publishing, Cham.

Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M. 2008. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Research, doi 10.1101/gr.081612.108.

Tisserant E, Malbreil M, Kuo A, Kohler A, Symeonidi A, Balestrini R, Charron P, Duensing N, Frei dit Frey N, Gianinazzi-Pearson V, Gilbert LB, Handa Y, Herr JR, Hijri M, Koul R, Kawaguchi M, Krajinski F, Lammers PJ, Masclaux FG, Murat C, Morin E, Ndikumana S, Pagni M, Petitpierre D, Requena N, Rosikiewicz P, Riley R, Saito K, San Clemente H, Shapiro H, van Tuinen D, Bécard G, Bonfante P, Paszkowski U, Shachar-Hill YY, Tuskan GA, Young JPW, Sanders IR, Henrissat B, Rensing SA, Grigoriev IV, Corradi N, Roux C, Martin F. 2013. Genome of an arbuscular mycorrhizal fungus provides insight into the oldest plant symbiosis. Proceedings of the National Academy of Sciences of the United States of America 110: 20117–20122.

Treangen TJ, Salzberg SL. 2011. Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nature Reviews Genetics 13: 36–46.

Zhang H, Xu N, Li X, Long J, Sui X, Wu Y, Li J, Wang J, Zhong H, Sun GY. 2018. Arbuscular Mycorrhizal Fungi (Glomus mosseae) Improves Growth, Photosynthesis and Protects Photosystem II in Leaves of Lolium perenne L. in Cadmium Contaminated Soil. Frontiers in Plant Science, doi 10.3389/fpls.2018.01156.