Title
A generalized framework of AMOVA with any number of hierarchies and any level of ploidies
Authors
Kang Huang1,2, Yuli Li1, Derek W. Dunn1, Pei Zhang1, Baoguo Li1,3
Addresses
1 Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an 710069, China
2 Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC V6T1Z4, Canada
3 Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
Keywords
Analysis of molecular variance, polyploidy, hierarchy, method-of-moment estimation, maximum-likelihood estimation
Corresponding author
Baoguo Li
Telephone: +8613572209390; Fax: +86 029 88303304; E-mail: [email protected]
Running title
A generalized framework of AMOVA
bioRxiv preprint doi: https://doi.org/10.1101/608117; this version posted April 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
Abstract
The analysis of molecular variance (AMOVA) is a widely used statistical model in population genetics and molecular ecology. The classical framework of AMOVA only supports haploid and diploid data, with the number of hierarchies ranging from two to four. In practice, natural populations can be classified into more hierarchies, and polyploidy is frequently observed in contemporary species. The ploidy level may even vary within the same species, or even within the same individual. We generalized the framework of AMOVA such that it can be used for any number of hierarchies and any level of ploidy. Based on this framework, we present four methods to account for multilocus genotypic and allelic phenotypic data. We use simulated datasets and an empirical dataset to evaluate the performance of our framework. Our methods are implemented in the software POLYGENE, which is freely available at https://github.com/huangkang1987/.
Keywords: analysis of molecular variance, polyploidy, hierarchy, method-of-moment estimation, maximum-likelihood estimation
Introduction
The analysis of molecular variance (AMOVA) is a statistical model for the molecular variation within a single species. AMOVA was developed by EXCOFFIER et al. (1992), building on earlier work that decomposed the total variance of gene frequencies into variance components at different subdivision levels (COCKERHAM 1969; COCKERHAM 1973).
This statistical model was initially implemented for DNA haplotypes, but it can be applied to any marker data, e.g., codominant and dominant marker data (PEAKALL et al. 1995; PEAKALL AND SMOUSE 2006). The classical framework of AMOVA supports haploids and diploids, and the number of hierarchies ranges from two to four (individual, population, group and total population) (EXCOFFIER AND LISCHER 2010). In practice, natural populations can be classified into more than four hierarchies, and the ploidy level may vary within the same species or within the same individual.
In many species, physical or ecological barriers prevent random mating (MARTIN AND WILLIS 2007). The resulting partial or total isolation of populations leads to genetic differentiation through the interacting processes of genetic drift, differential gene flow and natural selection (LANDE 1976). Because the factors restricting gene flow, such as geographical distance (WRIGHT 1943), landscape features (e.g., mountains, rivers, deserts) (GEFFEN et al. 2004; CHAMBERS AND GARANT 2010; LAIT AND BURG 2013), ecological factors (e.g., salt concentration, climatic gradients) (LUPPI et al. 2003; YANG et al. 2014) and behavioral differences (e.g., parental care) (RUSSELL et al. 2004), are not the same among populations, gene flow between populations is unevenly distributed. For example, in humans, intra-city gene flow is higher than intra-province, intra-nation and inter-nation gene flow. The population structure can therefore, in some situations, be classified into multilevel hierarchies.
Polyploids represent a significant portion of plant species, with anywhere between 30% and 80% of angiosperms showing polyploidy (BUROW et al. 2001) and most lineages showing evidence of paleopolyploidy (OTTO 2007). Due to their significant roles in molecular ecology, evolutionary biology and agricultural studies, polyploids have increasingly become the focus of theoretical and experimental research (AVNI et al. 2017; LING et al. 2018). There are two major problems in the population-genetic analysis of polyploids: genotyping ambiguity and double reduction.
For polymerase chain reaction (PCR)-based markers, the dosage of alleles cannot be determined from electrophoresis bands, so the true genotype cannot be identified from electrophoresis. This phenomenon is called genotyping ambiguity. For example, the autotetraploid genotype $AAAB$ has the same electrophoresis band pattern as the genotype $AABB$, so these two genotypes cannot be distinguished by electrophoresis.
In polyploids, double reduction occurs when a pair of sister chromatids segregates into the same gamete, which causes the genotypic frequencies to deviate from Hardy-Weinberg equilibrium (HWE), under which alleles are assumed to combine at random into genotypes. For autotetraploids, the rate $\alpha$ of double reduction is assumed to be 0 under HWE, 1/7 under random chromatid segregation (RCS) (HALDANE 1930), and 1/6 under complete equational segregation (CES) (MATHER 1935). In the partial equational segregation (PES) model, the distance between the target locus and the centromere is incorporated into CES (HUANG et al. 2019).
Several software packages that support polysomic inheritance have been developed, e.g., POLYSAT (CLARK AND JASIENIUK 2011), SPAGEDI (HARDY AND VEKEMANS 2002), POLYRELATEDNESS (HUANG et al. 2014), GENODIVE (MEIRMANS AND TIENDEREN 2004; MEIRMANS AND LIU 2018), and STRUCTURE (PRITCHARD et al. 2000). However, some of them cannot resolve genotyping ambiguity, and all of them assume that the genotypic frequencies accord with HWE.
In this paper, we generalize the framework of AMOVA such that any number of hierarchies and any level of ploidy are allowed. Four methods are developed to account for multilocus genotypic and phenotypic data, including three method-of-moment methods and one maximum-likelihood method. Our model has been implemented in a software package named POLYGENE, which is freely available at https://github.com/huangkang1987/. POLYGENE is designed for genotypic or phenotypic datasets and supports only homoploids, so that additional population-genetic analyses (e.g., phenotypic/genotypic distribution tests) can be included.
Theory and modeling
A typical AMOVA has three purposes: (i) to estimate the variance components at different subdivision levels; (ii) to measure population differentiation with F-statistics ($F_{IS}$, $F_{IT}$, $F_{ST}$, etc.); (iii) to test the significance of differentiation. In the following sections, we briefly describe the general procedures of the classic AMOVA framework, and then extend them to the generalized situation.
Classic framework
The procedures of AMOVA are as follows: (i) calculate the genetic distance between two alleles or two haplotypes; (ii) calculate the sum of squares (SS), the degrees of freedom and the mean square (MS) for each source of variation; (iii) solve for the variance components; (iv) calculate the F-statistics; (v) perform permutation tests.
A collection of populations is called a group, denoted by $g$. We stipulate that each population can belong to only one group, and that the union of all groups is the total population. Because an allele or a haplotype (for simplicity, we use "allele" to refer to both hereafter) is usually neither a discrete nor a continuous random variable (except for the allele size in microsatellites), the SS cannot be calculated by the equation $\mathrm{SS} = \sum_i (x_i - \bar{x})^2$. Using the genetic distance between any two alleles as a proxy, an alternative method can be used to calculate the SS, whose formula for a group of $n$ allele copies is as follows:
$$\mathrm{SS} = \frac{1}{n}\sum_{1 \le i < j \le n} d_{ij}^2, \tag{1}$$
where $d_{ij}$ is the genetic distance between the $i$th and the $j$th alleles. This genetic distance is one of the following: the nucleotide difference distance for DNA sequences (EXCOFFIER et al. 1992), the Euclidean distance for dominant markers (PEAKALL et al. 1995), the infinite allele model (IAM) distance for codominant markers (COCKERHAM 1973), or the stepwise mutation model (SMM) distance for microsatellites (SLATKIN 1995).
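
As an illustration of Equation (1), the following minimal sketch computes the SS of a set of allele copies from pairwise distances. The function names `iam_distance` and `sum_of_squares` are hypothetical, and the IAM distance is taken here to be 0 for identical alleles and 1 otherwise.

```python
from itertools import combinations


def iam_distance(a, b):
    """IAM distance (assumed form): 0 for identical alleles, 1 otherwise."""
    return 0.0 if a == b else 1.0


def sum_of_squares(alleles, distance=iam_distance):
    """SS = (1/n) * sum over pairs i < j of d_ij^2, following Equation (1)."""
    n = len(alleles)
    if n == 0:
        return 0.0
    total = sum(distance(a, b) ** 2 for a, b in combinations(alleles, 2))
    return total / n


# Four allele copies, two of each allele: four of the six pairs differ.
ss_example = sum_of_squares(["A", "A", "B", "B"])
print(ss_example)
```

Any other distance (for example a squared repeat-number difference for microsatellites) can be passed through the `distance` argument.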
In variance decomposition, the genetic variance is decomposed into two to four hierarchies, including $\sigma^2_{WI}$ (within individuals), $\sigma^2_{AI/WP}$ (among individuals within populations), $\sigma^2_{AP/WG}$ (among populations within groups) and $\sigma^2_{AG}$ (among groups), where $\sigma^2_{WI}$ and/or $\sigma^2_{AG}$ are sometimes ignored. Using all four hierarchies as an example, the layout of AMOVA is shown in Table 1.
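By equating the expected MS with the observed MS, unbiased estimates of the variance components can be solved. After that, the F-statistics can be calculated by the following formulas:
$$F_{SC} = \frac{\sigma^2_{AP/WG}}{\sigma^2_{AP/WG} + \sigma^2_{AI/WP} + \sigma^2_{WI}},$$
$$F_{CT} = \frac{\sigma^2_{AG}}{\sigma^2_{AG} + \sigma^2_{AP/WG} + \sigma^2_{AI/WP} + \sigma^2_{WI}},$$
$$F_{IS} = \frac{\sigma^2_{AI/WP}}{\sigma^2_{AI/WP} + \sigma^2_{WI}},$$
$$F_{IT} = \frac{\sigma^2_{AG} + \sigma^2_{AP/WG} + \sigma^2_{AI/WP}}{\sigma^2_{AG} + \sigma^2_{AP/WG} + \sigma^2_{AI/WP} + \sigma^2_{WI}},$$
$$F_{ST} = \frac{\sigma^2_{AP/WG} + \sigma^2_{AG}}{\sigma^2_{AG} + \sigma^2_{AP/WG} + \sigma^2_{AI/WP} + \sigma^2_{WI}}.$$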
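
The five formulas above translate directly into code. The sketch below uses a hypothetical function `f_statistics` and illustrative variance components (not taken from any dataset).

```python
def f_statistics(var_wi, var_ai_wp, var_ap_wg, var_ag):
    """F-statistics of the four-hierarchy classic framework."""
    total = var_wi + var_ai_wp + var_ap_wg + var_ag
    return {
        "F_SC": var_ap_wg / (var_ap_wg + var_ai_wp + var_wi),
        "F_CT": var_ag / total,
        "F_IS": var_ai_wp / (var_ai_wp + var_wi),
        "F_IT": (var_ag + var_ap_wg + var_ai_wp) / total,
        "F_ST": (var_ap_wg + var_ag) / total,
    }


# Illustrative values only.
fs = f_statistics(var_wi=1.0, var_ai_wp=0.5, var_ap_wg=0.25, var_ag=0.25)
print(fs)
```

With these definitions the familiar identity $(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})$ holds, which can serve as a quick sanity check on an implementation.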
A permutation test is used to test the significance of differentiation. The null hypothesis is that there is no differentiation among individuals, populations or groups, and that the observed differences are due to random sampling. This is equivalent to the variance $\sigma^2_{WI}$ accounting for 100% of the total variance, so that the variances $\sigma^2_{AI/WP}$, $\sigma^2_{AP/WG}$ and $\sigma^2_{AG}$, together with the various F-statistics, are all zero.
In each permutation, the allele copies are randomly permuted within the total population to generate a new dataset. The variance components and the F-statistics are calculated for each permuted dataset to obtain their distributions under the null hypothesis. The probability that a permuted variance component or F-statistic is greater than the original value is used as a one-tailed P-value.
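
The permutation procedure can be sketched as follows. This is a simplified illustration rather than the procedure of any particular software: it assumes a two-level design (allele copies within populations), uses the among-population SS as the test statistic in place of a variance component or F-statistic, and uses the IAM distance. All function names are hypothetical.

```python
import random
from itertools import combinations


def ss(alleles):
    """Equation (1) with the IAM distance (1 if two alleles differ, else 0)."""
    n = len(alleles)
    differing = sum(1 for a, b in combinations(alleles, 2) if a != b)
    return differing / n if n else 0.0


def among_population_ss(alleles, sizes):
    """Total SS minus the SS within consecutive blocks of the given sizes."""
    within, start = 0.0, 0
    for size in sizes:
        within += ss(alleles[start:start + size])
        start += size
    return ss(alleles) - within


def permutation_p_value(alleles, sizes, n_perm=200, seed=1):
    """Share of permuted datasets whose statistic exceeds the observed one."""
    rng = random.Random(seed)
    observed = among_population_ss(alleles, sizes)
    pool = list(alleles)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pool)  # permute allele copies across the total population
        if among_population_ss(pool, sizes) > observed:
            exceed += 1
    return exceed / n_perm


# Two populations fixed for different alleles: no permutation can exceed this.
p_fixed = permutation_p_value(["A"] * 10 + ["B"] * 10, [10, 10])
print(p_fixed)
```

For the fixed-difference example the observed among-population SS equals the total SS, which is the maximum attainable, so the P-value is exactly zero.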
Generalized framework
Here, we present a generalized framework to decompose the genetic variance. In our framework, we do not need to use the degrees of freedom or the MS. Instead, we directly use the variance components to express the expected SS of the various hierarchies, whose expressions are as follows:
$$\begin{aligned}
E(SS_{WI}) &= \sigma^2_{WI}(H - N),\\
E(SS_{WP}) &= \sigma^2_{WI}(H - P) + \sigma^2_{AI/WP}\Big(H - \sum_{p}\sum_{i \in p}\frac{v_i^2}{v_p}\Big),\\
E(SS_{WG}) &= \sigma^2_{WI}(H - G) + \sigma^2_{AI/WP}\Big(H - \sum_{g}\sum_{i \in g}\frac{v_i^2}{v_g}\Big) + \sigma^2_{AP/WG}\Big(H - \sum_{g}\sum_{p \in g}\frac{v_p^2}{v_g}\Big),\\
E(SS_{TOT}) &= \sigma^2_{WI}(H - 1) + \sigma^2_{AI/WP}\Big(H - \sum_{i}\frac{v_i^2}{v_t}\Big) + \sigma^2_{AP/WG}\Big(H - \sum_{p}\frac{v_p^2}{v_t}\Big) + \sigma^2_{AG}\Big(H - \sum_{g}\frac{v_g^2}{v_t}\Big),
\end{aligned} \tag{2}$$
where $H$ and $v_t$ are, respectively, the total number of haplotypes and alleles, $N$, $P$ and $G$ are the numbers of individuals, populations and groups, $v_i$ ($v_p$ or $v_g$) is the number of haplotypes/alleles within the individual $i$ (the population $p$ or the group $g$), and the running index in $\sum_p$ ($\sum_g$ or $\sum_i$) is taken over all populations (all groups or all individuals). The estimates of the variance components are identical to those in the classic framework. The step-by-step derivations of the formulas in Equation (2) are provided in the Supplementary materials. Compared with previous frameworks (Table 1), Equation (2) is more regular, which makes generalization possible.
According to the concept that each member in a hierarchy is a "vessel" of genes, we can use the vessels $V_0, V_1, V_2, V_3, V_4$ and the corresponding expected $SS_i$ ($i = 1, 2, 3, 4$) to describe each formula in Equation (2). In other words, we can use these vessels to rewrite Equation (2):
$$\begin{aligned}
E(SS_1) &= \sigma_1^2\Big(|V_4| - \sum_{V_1}\sum_{V_0 \subset V_1}\frac{|V_0|^2}{|V_1|}\Big),\\
E(SS_2) &= \sigma_1^2\Big(|V_4| - \sum_{V_2}\sum_{V_0 \subset V_2}\frac{|V_0|^2}{|V_2|}\Big) + \sigma_2^2\Big(|V_4| - \sum_{V_2}\sum_{V_1 \subset V_2}\frac{|V_1|^2}{|V_2|}\Big),\\
E(SS_3) &= \sigma_1^2\Big(|V_4| - \sum_{V_3}\sum_{V_0 \subset V_3}\frac{|V_0|^2}{|V_3|}\Big) + \sigma_2^2\Big(|V_4| - \sum_{V_3}\sum_{V_1 \subset V_3}\frac{|V_1|^2}{|V_3|}\Big) + \sigma_3^2\Big(|V_4| - \sum_{V_3}\sum_{V_2 \subset V_3}\frac{|V_2|^2}{|V_3|}\Big),\\
E(SS_4) &= \sigma_1^2\Big(|V_4| - \sum_{V_4}\sum_{V_0 \subset V_4}\frac{|V_0|^2}{|V_4|}\Big) + \sigma_2^2\Big(|V_4| - \sum_{V_4}\sum_{V_1 \subset V_4}\frac{|V_1|^2}{|V_4|}\Big) + \sigma_3^2\Big(|V_4| - \sum_{V_4}\sum_{V_2 \subset V_4}\frac{|V_2|^2}{|V_4|}\Big)\\
&\quad + \sigma_4^2\Big(|V_4| - \sum_{V_4}\sum_{V_3 \subset V_4}\frac{|V_3|^2}{|V_4|}\Big),
\end{aligned} \tag{3}$$
where $V_i$ is a vessel at level $i$, $|V_i|$ is the number of allele copies in $V_i$, $SS_i$ is the SS within all vessels at level $i$, $\sigma_i^2$ is the variance component among all $V_{i-1}$ within $V_i$, and the running index in $\sum_{V_i}$ is taken over all vessels at level $i$. The subscript $i$ ranges from 0 to 4, and the corresponding vessels represent, in turn, alleles, individuals, populations, groups and the total population. Equation (3) has a regular structure and can be written compactly with summation signs:
$$E(SS_i) = \sum_{j=1}^{i}\sigma_j^2\Big(|V_4| - \sum_{V_i}\sum_{V_{j-1} \subset V_i}\frac{|V_{j-1}|^2}{|V_i|}\Big), \quad i = 1, 2, 3, 4.$$
If the hierarchy of individuals is ignored, then the vessel $V_1$ ($V_2$ or $V_3$) will represent a population (a group or the total population). In this situation, Equation (3) becomes
$$E(SS_i) = \sum_{j=1}^{i}\sigma_j^2\Big(|V_3| - \sum_{V_i}\sum_{V_{j-1} \subset V_i}\frac{|V_{j-1}|^2}{|V_i|}\Big), \quad i = 1, 2, 3.$$
Generally, if there are $M + 1$ kinds of vessels at levels ranging from 0 to $M$, then Equation (3) can be generalized as follows:
$$E(SS_i) = \sum_{j=1}^{i}\sigma_j^2\Big(|V_M| - \sum_{V_i}\sum_{V_{j-1} \subset V_i}\frac{|V_{j-1}|^2}{|V_i|}\Big), \quad i = 1, 2, \cdots, M, \tag{4}$$
where $V_M$ denotes the vessel of the highest hierarchy (i.e., the total population). Equation (4) is the fully generalized form of AMOVA; it is simple and can be applied to any number of hierarchies and any level of ploidy. We can also use matrices to express Equation (4):
$$\mathbf{S} = \mathbf{C}\boldsymbol{\Sigma},$$
where $\mathbf{S} = [E(SS_1), E(SS_2), \cdots, E(SS_M)]^T$, $\boldsymbol{\Sigma} = [\sigma_1^2, \sigma_2^2, \cdots, \sigma_M^2]^T$ and the coefficient matrix $\mathbf{C}$ is lower-triangular of size $M \times M$, whose $ij$th element is
$$C_{ij} = \begin{cases} |V_M| - \sum_{V_i}\sum_{V_{j-1} \subset V_i}\dfrac{|V_{j-1}|^2}{|V_i|} & \text{if } i \ge j,\\ 0 & \text{if } i < j, \end{cases} \quad i, j = 1, 2, \cdots, M.$$
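
To make the coefficient matrix concrete, the sketch below builds it for an arbitrary nested design. The data layout is an assumption made for illustration: each allele copy is described by a tuple of vessel labels at levels 1 to M-1 (for example, individual and population), and the total population is the single vessel at level M. The function name `coefficient_matrix` is hypothetical.

```python
from collections import Counter


def coefficient_matrix(paths):
    """Build the lower-triangular matrix C of Equation (4).

    paths: one tuple per allele copy with its vessel labels at levels
    1..M-1, e.g. (individual, population). Level M is the total population.
    """
    m = len(paths[0]) + 1   # number of hierarchy levels above the allele level
    total = len(paths)      # |V_M|, the total number of allele copies

    def vessel(path, level):
        # A vessel is identified by its labels from this level upward, so
        # individual labels may be reused in different populations.
        return () if level == m else path[level - 1:]

    # sizes[level][vessel] = number of allele copies inside that vessel
    sizes = {lvl: Counter(vessel(p, lvl) for p in paths)
             for lvl in range(1, m + 1)}

    c = [[0.0] * m for _ in range(m)]
    for i in range(1, m + 1):
        for j in range(1, i + 1):
            if j == 1:
                # Level-0 vessels are single copies, so |V_0|^2 = 1.
                inner = sum(1.0 / sizes[i][vessel(p, i)] for p in paths)
            else:
                # Map every level-(j-1) vessel to the level-i vessel holding it.
                parent = {vessel(p, j - 1): vessel(p, i) for p in paths}
                inner = sum(sizes[j - 1][v] ** 2 / sizes[i][parent[v]]
                            for v in parent)
            c[i - 1][j - 1] = total - inner
    return c


# Two populations, two diploid individuals each (eight allele copies, M = 3).
paths = [(ind, pop) for pop in ("X", "Y") for ind in (1, 2) for _ in range(2)]
c_example = coefficient_matrix(paths)
for row in c_example:
    print(row)
```

Because vessel sizes are simply counts of allele copies, individuals of different ploidy levels can be mixed in `paths` without any change to the code.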
Then, a method-of-moment estimate of the variance components is given by $\hat{\boldsymbol{\Sigma}} = \mathbf{C}^{-1}\hat{\mathbf{S}}$, and the F-statistics can be solved by
$$\hat{F}_{ij} = 1 - \frac{\sum_{k=1}^{i}\hat{\sigma}_k^2}{\sum_{k=1}^{j}\hat{\sigma}_k^2}, \quad 1 \le i < j \le M. \tag{5}$$
Method-of-moment methods
For convenience, we call a method-of-moment estimator a moment method. In practice, multilocus data are used to increase the accuracy of estimation. Based on the moment estimator described above, we develop three methods (called the homoploid, anisoploid and weighting genotypic methods) to account for multilocus genotypic or phenotypic data.
The homoploid method is only applicable to homoploids. In this method, all loci are treated as one dummy locus, and the dummy haplotypes are extracted from phenotypes. The genetic distance between any two dummy haplotypes is then calculated, and these dummy haplotypes are permuted to test the significance of differentiation. For diploids, this method is used in GENALEX (PEAKALL AND SMOUSE 2006).
To resolve genotyping ambiguity, we use posterior probabilities to weight the possible genotypes hidden behind a phenotype. The multiset of alleles within an individual at a locus is defined as a genotype, denoted by $\mathcal{G}$, and the set obtained by deleting the duplicated alleles in $\mathcal{G}$ is defined as the phenotype determined by $\mathcal{G}$, denoted by $\mathcal{P}$. In our previous paper (HUANG et al. 2019), the genotypic frequency $\Pr(\mathcal{G})$ and the phenotypic frequency $\Pr(\mathcal{P})$ under a double-reduction model (HWE, RCS, CES or PES) were calculated. On this basis, we are able to calculate the posterior probability $\Pr(\mathcal{G}|\mathcal{P})$ of a genotype $\mathcal{G}$ determining $\mathcal{P}$, whose formula is as follows:
$$\Pr(\mathcal{G}|\mathcal{P}) = \frac{\Pr(\mathcal{G})}{\Pr(\mathcal{P})}.$$
After that, the probability $\Pr(A_{hl} = A_{lj})$ (or $p_{hlj}$ for short) can be calculated by
$$p_{hlj} = \Pr(A_{hl} = A_{lj}) = \sum_{\mathcal{G}}\Pr(\mathcal{G}|\mathcal{P})\Pr(A_{hl} = A_{lj}|\mathcal{G}), \tag{6}$$
where $A_{hl}$ is the allele in the $h$th dummy haplotype at the $l$th locus, $A_{lj}$ is the $j$th allele at the $l$th locus, and $\mathcal{G}$ is taken over all possible genotypes determining $\mathcal{P}$.
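
A small worked example may help here. The sketch below enumerates the genotypes behind a phenotype and computes their posterior probabilities, followed by the allele probabilities of Equation (6). Two simplifying assumptions are made: genotypic frequencies follow HWE only (a multinomial in the allele frequencies, with no double reduction), and the conditional probability of drawing an allele from a genotype is taken as its dosage divided by the ploidy. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial


def hwe_genotype_frequency(genotype, freqs):
    """Multinomial frequency of a genotype (tuple of alleles) under HWE."""
    coef = factorial(len(genotype))
    prob = 1.0
    for allele, n in Counter(genotype).items():
        coef //= factorial(n)
        prob *= freqs[allele] ** n
    return coef * prob


def posterior_genotypes(phenotype, ploidy, freqs):
    """Pr(G | P) for every genotype whose distinct alleles equal the phenotype."""
    alleles = sorted(phenotype)
    candidates = [g for g in combinations_with_replacement(alleles, ploidy)
                  if set(g) == set(alleles)]
    weights = {g: hwe_genotype_frequency(g, freqs) for g in candidates}
    pr_phenotype = sum(weights.values())
    return {g: w / pr_phenotype for g, w in weights.items()}


def allele_probabilities(posterior, ploidy):
    """Equation (6), assuming Pr(allele | G) = dosage in G / ploidy."""
    probs = Counter()
    for genotype, weight in posterior.items():
        for allele, n in Counter(genotype).items():
            probs[allele] += weight * n / ploidy
    return dict(probs)


# Tetraploid phenotype {A, B} with equal allele frequencies.
freqs = {"A": 0.5, "B": 0.5}
post = posterior_genotypes({"A", "B"}, 4, freqs)
p_allele = allele_probabilities(post, 4)
print(post)
print(p_allele)
```

In this example the three candidate genotypes AAAB, AABB and ABBB receive posterior probabilities 2/7, 3/7 and 2/7. Replacing `hwe_genotype_frequency` with genotypic frequencies from a double-reduction model would leave the remaining steps unchanged.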
In the homoploid method, because all loci are treated as one dummy locus, the square of the genetic distance $d_{hh'}$ between the $h$th and the $h'$th haplotypes is the sum of the squared distances of the form $d_{A_{lj}A_{lj'}}$ over all $L$ loci, namely
$$d_{hh'}^2 = \sum_{l=1}^{L} d_{A_{hl}A_{h'l}}^2 = \sum_{l=1}^{L}\sum_{j=1}^{J_l}\sum_{j'=1}^{J_l} p_{hlj}\,p_{h'lj'}\,d_{A_{lj}A_{lj'}}^2,$$
where $d_{A_{lj}A_{lj'}}$ is the distance between the $j$th and the $j'$th alleles at the $l$th locus, and $J_l$ is the number of alleles at the $l$th locus. For an allele with missing data, the allele frequencies of the corresponding population are used in its place. In this method, because there is only one dummy locus, no additional weighting procedure is required for multilocus data.
The anisoploid method can be applied to both homoploids and anisoploids. In this method, the dummy alleles are extracted at each locus, and the missing data are ignored. The genetic distance between two dummy alleles is calculated locus by locus, and the dummy alleles are randomly permuted locus by locus during the permutation test.
For this method, the probability $\Pr(A_h = A_j)$ (or $p_{hj}$ for short) in a phenotype at a target locus can also be expressed by Equation (6). Then the square of the genetic distance $d_{A_hA_{h'}}$ between two allele copies $A_h$ and $A_{h'}$ at this target locus is given by
$$d_{A_hA_{h'}}^2 = \sum_{j}^{J}\sum_{j'}^{J} p_{hj}\,p_{h'j'}\,d_{A_jA_{j'}}^2,$$
where $J$ is the number of alleles at this target locus.
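
The probability-weighted squared distance between two dummy alleles can be sketched as below. The dictionaries map allele names to the probabilities that a dummy allele is that allele; the function names are hypothetical and the IAM distance (0 if identical, 1 otherwise) is assumed.

```python
def iam_sq(a, b):
    """Squared IAM distance (assumed form): 0 if identical, 1 otherwise."""
    return 0.0 if a == b else 1.0


def expected_squared_distance(p_h, p_h2, sq_distance=iam_sq):
    """Sum over j, j' of p_hj * p_h'j' * d^2(A_j, A_j')."""
    return sum(pj * pk * sq_distance(a, b)
               for a, pj in p_h.items()
               for b, pk in p_h2.items())


# A dummy allele known to be A versus one that is A or B with equal probability.
d2 = expected_squared_distance({"A": 1.0}, {"A": 0.5, "B": 0.5})
print(d2)
```

Summing this quantity over loci gives the squared distance between two dummy haplotypes used in the homoploid method.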
Untyped individuals (populations or groups) resulting from missing data should be skipped to avoid a denominator of zero. The global variance components for multilocus data can be solved by using the formula $\mathbf{S} = \mathbf{C}\boldsymbol{\Sigma}$. There are two solving strategies: (i) sum $\mathbf{S}$ and $\mathbf{C}$ over all loci, and then solve for the global variance components, denoted by $\boldsymbol{\Sigma}_{g1}$; (ii) solve $\boldsymbol{\Sigma}$ for each locus, and then sum $\boldsymbol{\Sigma}$ over all loci, denoted by $\boldsymbol{\Sigma}_{g2}$. Generally, $\boldsymbol{\Sigma}_{g1}$ and $\boldsymbol{\Sigma}_{g2}$ are different, but they are approximately proportional to each other.
We adopt the first strategy because the global SS, the d.f. and the MS can also be obtained. This strategy has the same output style as the classic framework.
In the weighting genotypic method, no dummy haplotypes are extracted. Instead, for any hierarchy $i$, the $SS_i$ for each genotype hidden behind a phenotype is calculated, and then all sums of squares in the hierarchy $i$ are weighted according to the corresponding posterior probabilities. In this method, we also sum those $SS_i$ over all loci. For each locus, the $SS_i$ is calculated by
$$SS_i = \sum_{V_i}\frac{1}{|V_i|}\Big[\sum_{j=1}^{N_{V_i}} W^2(\mathcal{P}_j) + \sum_{1 \le j < k \le N_{V_i}} W^2(\mathcal{P}_j, \mathcal{P}_k)\Big],$$
where $SS_1 = SS_{WI}$ (i.e., when $i = 1$), $V_i$ is taken over all vessels in the hierarchy $i$, $N_{V_i}$ is the number of phenotypes determined by the individuals within $V_i$, and, at this locus, $W^2(\mathcal{P}_j)$ (or $W^2(\mathcal{P}_j, \mathcal{P}_k)$) is the weighted sum of squares of the distances within the phenotype $\mathcal{P}_j$ (or between the phenotypes $\mathcal{P}_j$ and $\mathcal{P}_k$), which can be calculated by the following formulas:
$$W^2(\mathcal{P}) = \frac{1}{2}\sum_{\mathcal{G}}\sum_{A, B \in \mathcal{G}}\Pr(\mathcal{G}|\mathcal{P})\,d_{AB}^2,$$
$$W^2(\mathcal{P}_1, \mathcal{P}_2) = \sum_{\mathcal{G}_1}\sum_{\mathcal{G}_2}\sum_{A \in \mathcal{G}_1}\sum_{B \in \mathcal{G}_2}\Pr(\mathcal{G}_1|\mathcal{P}_1)\Pr(\mathcal{G}_2|\mathcal{P}_2)\,d_{AB}^2,$$
where $\mathcal{G}$ ($\mathcal{G}_1$ or $\mathcal{G}_2$) is taken over all candidate genotypes determining $\mathcal{P}$ ($\mathcal{P}_1$ or $\mathcal{P}_2$), and $d_{AB}$ is the distance between the alleles $A$ and $B$.
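
The within-phenotype and between-phenotype weighted sums of squares can be sketched as follows. The posterior probabilities are passed in as dictionaries from genotype tuples to probabilities; the values used here are illustrative (a tetraploid phenotype with two alleles at equal frequency under HWE, and an unambiguous one-allele phenotype). The IAM distance is assumed, the within-phenotype sum is read as running over ordered pairs of allele copies (hence the factor 1/2), and the function names are hypothetical.

```python
from itertools import product


def iam_sq(a, b):
    """Squared IAM distance (assumed form): 0 if identical, 1 otherwise."""
    return 0.0 if a == b else 1.0


def within_phenotype(posterior, sq_distance=iam_sq):
    """Half the posterior-weighted sum of d^2 over ordered allele pairs in G."""
    total = 0.0
    for genotype, weight in posterior.items():
        pair_sum = sum(sq_distance(a, b)
                       for a, b in product(genotype, repeat=2))
        total += weight * pair_sum
    return total / 2.0


def between_phenotypes(post1, post2, sq_distance=iam_sq):
    """Posterior-weighted sum of d^2 over allele pairs drawn from G1 and G2."""
    total = 0.0
    for g1, w1 in post1.items():
        for g2, w2 in post2.items():
            total += w1 * w2 * sum(sq_distance(a, b)
                                   for a, b in product(g1, g2))
    return total


post_ab = {("A", "A", "A", "B"): 2 / 7,
           ("A", "A", "B", "B"): 3 / 7,
           ("A", "B", "B", "B"): 2 / 7}
post_a = {("A", "A", "A", "A"): 1.0}

w_within = within_phenotype(post_ab)
w_between = between_phenotypes(post_ab, post_a)
print(w_within, w_between)
```

For an unambiguous phenotype the posterior has a single entry with probability one, and both functions reduce to plain sums of squared distances.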
Maximum-likelihood method
We develop a maximum-likelihood estimator to estimate the F-statistics and solve for the variance components. For convenience, we call this method the likelihood method. This method uses the reverse procedure: the F-statistics are estimated first, and the variance components and other statistics are then solved from them.
To derive the expression of the likelihood for individuals, we first set up some equations. A random distribution can be used to model the differentiation among individuals within a vessel $V_i$. We choose a Dirichlet distribution for each hierarchy, because the standardized variance of each allele frequency under that distribution accords with the corresponding F-statistic. Therefore, no additional weighting procedure for the variance components is required.
Given a vessel $V_i$ ($2 \le i \le M$), the allele frequencies $p_{11}, p_{12}, \cdots, p_{1J}$ within an individual (i.e., within one of the $V_1$) are drawn from the Dirichlet distribution $\mathcal{D}(\alpha_{1i1}, \alpha_{1i2}, \cdots, \alpha_{1iJ})$, where $J$ is the number of alleles within this individual, and
$$\alpha_{1ij} = (1/F_{1i} - 1)\,p_{ij}, \quad j = 1, 2, \cdots, J,$$
in which $F_{1i}$ is the F-statistic among all individuals within $V_i$, and $p_{ij}$ is the frequency of the $j$th allele in $V_i$. Then, the expectation and the variance of the $j$th allele frequency $p_{1j}$ as a random variable are, respectively, $p_{ij}$ and $F_{1i}\,p_{ij}(1 - p_{ij})$, and the standardized variance of $p_{ij}$ as a random variable is exactly $F_{i,i+1}$, which is identical to Wright's definition of F-statistics.
For simplicity, we let $\mathbf{p}_i$ be the vector consisting of the frequencies of all alleles in $V_i$, i.e.
$$\mathbf{p}_i = [p_{i1}, p_{i2}, \cdots, p_{iJ}], \quad i = 1, 2, \cdots, M.$$
Then, for each $i$ with $2 \le i \le M$, the probability density function of $\mathbf{p}_1$ is as follows:
$$f(\mathbf{p}_1|\mathbf{p}_i, F_{1i}) = \frac{\Gamma\big(\sum_{j=1}^{J}\alpha_{1ij}\big)}{\prod_{j=1}^{J}\Gamma(\alpha_{1ij})}\prod_{j=1}^{J} p_{1j}^{\alpha_{1ij} - 1},$$
where $\Gamma(\cdot)$ is the gamma function, and $\sum_{j=1}^{J}\alpha_{1ij} = \alpha_{1i} = 1/F_{1i} - 1$. Assume that the alleles within $V_1$ are independently drawn according to the frequencies in $\mathbf{p}_1$. Then, the allele copy numbers of $V_1$ obey a multinomial distribution, and so the frequency $\Pr(\mathcal{G}|\mathbf{p}_1)$ of a genotype $\mathcal{G}$ conditional on $\mathbf{p}_1$ is
$$\Pr(\mathcal{G}|\mathbf{p}_1) = \binom{|V_1|}{n_1, n_2, \ldots, n_J}\,p_{11}^{n_1}p_{12}^{n_2}\cdots p_{1J}^{n_J},$$
where $n_j$ is the number of copies of the $j$th allele in $\mathcal{G}$, $j = 1, 2, \cdots, J$.
Now, the frequency $\Pr(\mathcal{G}|\mathbf{p}_i, F_{1i})$ of $\mathcal{G}$ conditional on both $\mathbf{p}_i$ and $F_{1i}$ can be obtained as the weighted average of $\Pr(\mathcal{G}|\mathbf{p}_1)$ with $f(\mathbf{p}_1|\mathbf{p}_i, F_{1i})\,\mathrm{d}\mathbf{p}_1$ as the weight, that is,
$$\Pr(\mathcal{G}|\mathbf{p}_i, F_{1i}) = \int_{\Omega}\Pr(\mathcal{G}|\mathbf{p}_1)\,f(\mathbf{p}_1|\mathbf{p}_i, F_{1i})\,\mathrm{d}\mathbf{p}_1, \quad i = 2, 3, \cdots, M,$$
where the integral domain $\Omega$ can be expressed as
$$\Omega = \{(p_{11}, p_{12}, \cdots, p_{1J}) \mid p_{11} + p_{12} + \cdots + p_{1J} = 1,\ p_{1j} \ge 0,\ j = 1, 2, \cdots, J\}.$$
The integral can be converted into the following repeated integral of multiplicity $J - 1$:
$$\begin{aligned}
\Pr(\mathcal{G}|\mathbf{p}_i, F_{1i}) &= \int_0^1\int_0^{1 - p_{11}}\cdots\int_0^{1 - p_{11} - p_{12} - \cdots - p_{1,J-2}}\Pr(\mathcal{G}|\mathbf{p}_1)\,f(\mathbf{p}_1|\mathbf{p}_i, F_{1i})\,\mathrm{d}p_{11}\,\mathrm{d}p_{12}\cdots\mathrm{d}p_{1,J-1}\\
&= \binom{|V_1|}{n_1, n_2, \ldots, n_J}\frac{\Gamma(\alpha_{1i})}{\Gamma(|V_1| + \alpha_{1i})}\prod_{j=1}^{J}\frac{\Gamma(\alpha_{1ij} + n_j)}{\Gamma(\alpha_{1ij})}\\
&= \binom{|V_1|}{n_1, n_2, \ldots, n_J}\prod_{j=1}^{J}\prod_{k=0}^{n_j - 1}(\alpha_{1ij} + k)\Big/\prod_{k=0}^{|V_1| - 1}(\alpha_{1i} + k),
\end{aligned} \tag{7}$$
where $\alpha_{1i} = 1/F_{1i} - 1$. Importantly, $\alpha_{1i} \to +\infty$ if $F_{1i} \to 0^+$, thus the variance $\sigma_{1i}^2 \to 0$ if $F_{1i} \to 0^+$. Since $\mathbf{p}_i$ is unavailable, the estimate $\hat{\mathbf{p}}_i$ is used in place of $\mathbf{p}_i$ in our calculation.
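
The two closed forms in Equation (7) are easy to check numerically against each other. The sketch below implements both, assuming allele counts and allele frequencies are given in the same order, that all frequencies are positive, and that $0 < F < 1$. The function names are hypothetical.

```python
from math import exp, factorial, lgamma


def multinomial_coefficient(counts):
    """Number of orderings of a genotype with the given allele counts."""
    coef = factorial(sum(counts))
    for n in counts:
        coef //= factorial(n)
    return coef


def genotype_prob_product(counts, freqs, f):
    """Product form of Equation (7); requires 0 < f < 1."""
    alpha = 1.0 / f - 1.0
    numer = 1.0
    for n, p in zip(counts, freqs):
        for k in range(n):
            numer *= alpha * p + k
    denom = 1.0
    for k in range(sum(counts)):
        denom *= alpha + k
    return multinomial_coefficient(counts) * numer / denom


def genotype_prob_gamma(counts, freqs, f):
    """Gamma-function form of Equation (7), evaluated on the log scale."""
    alpha = 1.0 / f - 1.0
    log_p = lgamma(alpha) - lgamma(sum(counts) + alpha)
    for n, p in zip(counts, freqs):
        log_p += lgamma(alpha * p + n) - lgamma(alpha * p)
    return multinomial_coefficient(counts) * exp(log_p)


# Diploid example: two alleles at frequency 0.5 and F = 0.5.
pr_het = genotype_prob_product((1, 1), (0.5, 0.5), 0.5)
pr_hom = genotype_prob_product((2, 0), (0.5, 0.5), 0.5)
print(pr_het, pr_hom)
```

In the diploid case these values agree with the textbook expressions $2p(1-p)(1-F)$ for a heterozygote and $p^2 + Fp(1-p)$ for a homozygote. The product form avoids gamma-function evaluations and is the more convenient one for small ploidy levels.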
The frequency $\Pr(\mathcal{P}|\mathbf{p}_i, F_{1i})$ of a phenotype $\mathcal{P}$ conditional on both $\mathbf{p}_i$ and $F_{1i}$ is the sum of the frequencies $\Pr(\mathcal{G}|\mathbf{p}_i, F_{1i})$, where $\mathcal{G}$ is taken from all candidate genotypes determining $\mathcal{P}$, in other words,
$$\Pr(\mathcal{P}|\mathbf{p}_i, F_{1i}) = \sum_{\mathcal{G}} \Pr(\mathcal{G}|\mathbf{p}_i, F_{1i}), \quad i = 2, 3, \cdots, M.$$
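The sum over candidate genotypes can be sketched by enumerating every genotype that contains each observed allele at least once. The helper names below are hypothetical, and the genotype probability used by default is the plain multinomial (the limit without inbreeding), standing in for Equation (7); any other genotype probability function can be passed instead.

```python
from itertools import combinations_with_replacement
from math import factorial

def candidate_genotypes(phenotype, ploidy):
    """Allele-count dicts of all genotypes whose distinct alleles equal the phenotype."""
    alleles = sorted(phenotype)
    spare = ploidy - len(alleles)        # copies left after one copy of each observed allele
    if spare < 0:
        return []
    genotypes = []
    for filler in combinations_with_replacement(alleles, spare):
        counts = {a: 1 for a in alleles}
        for a in filler:
            counts[a] += 1
        genotypes.append(counts)
    return genotypes

def multinomial_prob(counts, freqs):
    """Pr(G | p) without inbreeding; a stand-in for Equation (7)."""
    coef = factorial(sum(counts.values()))
    prob = 1.0
    for allele, n in counts.items():
        coef //= factorial(n)
        prob *= freqs[allele] ** n
    return coef * prob

def phenotype_prob(phenotype, ploidy, freqs, geno_prob=multinomial_prob):
    """Pr(P) as the sum of Pr(G) over all candidate genotypes G determining P."""
    return sum(geno_prob(g, freqs) for g in candidate_genotypes(phenotype, ploidy))

# A tetraploid showing alleles A and B may be AAAB, AABB or ABBB.
print(candidate_genotypes({"A", "B"}, 4))
```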
Now, the global likelihood for individuals at a hierarchy $i$ can be obtained, which is the product of the frequencies $\Pr(\mathcal{P}_{lj}|\mathbf{p}_i, F_{1i})$ over all individuals and at all loci, symbolically
$$\mathcal{L}_i = \prod_{l=1}^{L} \prod_{j=1}^{N} \Pr(\mathcal{P}_{lj}|\mathbf{p}_i, F_{1i}), \quad i = 2, 3, \cdots, M.$$
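In practice this likelihood is easier to handle on the log scale. The sketch below maximizes it over a single $F_{1i}$ under two simplifying assumptions that are not part of the original method: genotypes are known, so the sum over candidate genotypes collapses to one term, and a golden-section search is swapped in for the simplex search because only one parameter is optimized. All function names are hypothetical.

```python
from math import lgamma, sqrt

def log_genotype_prob(counts, freqs, f):
    """Logarithm of Equation (7) for one genotype given as allele-copy counts."""
    alpha = 1.0 / f - 1.0
    ploidy = sum(counts)
    out = lgamma(ploidy + 1) + lgamma(alpha) - lgamma(ploidy + alpha)
    for n, p in zip(counts, freqs):
        if n:                            # n_j = 0 terms add nothing
            out += lgamma(alpha * p + n) - lgamma(alpha * p) - lgamma(n + 1)
    return out

def log_likelihood(genotypes, freqs, f):
    """Log of the product of genotype probabilities over all individuals."""
    return sum(log_genotype_prob(g, freqs, f) for g in genotypes)

def maximize_f(genotypes, freqs, lo=1e-6, hi=1.0 - 1e-6, tol=1e-8):
    """Golden-section search for the F that maximizes the log-likelihood."""
    ratio = (sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if log_likelihood(genotypes, freqs, c) > log_likelihood(genotypes, freqs, d):
            b = d                        # the maximum lies in [a, d]
        else:
            a = c                        # the maximum lies in [c, b]
    return (a + b) / 2.0

# Seven homozygotes and three heterozygotes at a biallelic locus with p = 0.5
sample = [(2, 0)] * 4 + [(0, 2)] * 3 + [(1, 1)] * 3
print(maximize_f(sample, (0.5, 0.5)))
```

For this toy sample the likelihood is proportional to $(1+F)^7(1-F)^3$, whose maximum is at $F = 0.4$, which gives a simple check of the search.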
Because the allele frequencies are already estimated, a downhill simplex algorithm (NELDER AND MEAD 1965) can be used to find the optimal $F_{1i}$ under the IAM model. After that, the variance components can be solved from the F-statistics with an additional constraint as follows:
$$\mathrm{E}(SS_M) = \sum_{i=1}^{M} \sigma_i^2 \left( |V_M| - \sum_{V_{i-1} \subset V_M} \frac{|V_{i-1}|^2}{|V_M|} \right),$$
where $SS_M$ can be obtained from the allele frequencies of the total population under the IAM model, that is,
$$SS_M = |V_M| \sum_{1 \le i < j \le J} p_{Mi}\, p_{Mj}\, d_{A_iA_j}^2,$$
where $d_{A_iA_j}$ is the IAM distance between the alleles $A_i$ and $A_j$.
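As a quick numerical illustration of the last formula, the sketch below computes $SS_M$ from the allele frequencies of the total population. It assumes that the IAM distance between two distinct alleles equals one, which is an assumption of this example; the function name is hypothetical.

```python
def ss_total_iam(n_copies, freqs):
    """SS_M = |V_M| * sum_{i<j} p_Mi * p_Mj * d^2, with d = 1 for distinct alleles."""
    cross = sum(
        freqs[i] * freqs[j]
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return n_copies * cross

# 16 allele copies at a biallelic locus with equal frequencies
print(ss_total_iam(16, [0.5, 0.5]))
```

Under this distance the double sum equals $(1 - \sum_j p_{Mj}^2)/2$, so $SS_M$ is half the expected heterozygosity multiplied by the number of allele copies.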
Differentiation test
In the homoploid/anisoploid method, the dummy haplotypes/alleles are extracted. Then, the differentiation test can be performed by permuting the dummy haplotypes/alleles. However, for the weighting genotypic and the likelihood methods, this cannot be done because neither dummy haplotypes nor dummy alleles are extracted in these two methods.
To solve this problem, we develop an alternative method to test the differences. In this method, datasets with the same structure as the original dataset are randomly generated, where "the same structure" means that they contain the same individuals, populations and groups as well as the same missing data. More specifically, the genotypes of each individual are generated conditional on $\mathbf{p}_i$ and $F_{1i}$ according to Equation (7) under the null hypothesis that there are no differences (i.e., $F_{1i} \to 0^+$). Moreover, the phenotypes can be obtained by removing the duplicated alleles in the generated genotypes.
For each generated dataset, the variance components and the F-statistics are estimated by the same procedures as above to obtain their empirical distributions. As in the permutation test, the probability that a variance component or an F-statistic from the generated datasets is greater than the original value is used as a single-tailed P-value.
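The test described above can be sketched as a small simulation loop. This is an illustrative simplification rather than the authors' implementation: every individual is drawn from one shared set of allele frequencies (the limit $F_{1i} \to 0^+$), missing data are not reproduced, and the statistic is supplied by the caller. The function names and the `statistic` callback are hypothetical.

```python
import random

def simulate_null_dataset(pop_sizes, freqs, ploidy, rng):
    """Genotypes for each population, drawn independently from shared frequencies."""
    alleles = list(range(len(freqs)))
    return [
        [sorted(rng.choices(alleles, weights=freqs, k=ploidy)) for _ in range(size)]
        for size in pop_sizes
    ]

def one_tailed_p(observed, statistic, pop_sizes, freqs, ploidy, n_sim=999, seed=1):
    """Share of null datasets whose statistic is greater than the observed value."""
    rng = random.Random(seed)
    exceed = sum(
        statistic(simulate_null_dataset(pop_sizes, freqs, ploidy, rng)) > observed
        for _ in range(n_sim)
    )
    return exceed / n_sim

def freq_gap(data):
    """Toy statistic: spread of the frequency of allele 0 across populations."""
    per_pop = [sum(g.count(0) for g in pop) / sum(len(g) for g in pop) for pop in data]
    return max(per_pop) - min(per_pop)

print(one_tailed_p(0.3, freq_gap, [10, 10], [0.5, 0.5], ploidy=4))
```

In a full analysis `statistic` would return a variance component or an F-statistic estimated by the same procedure as for the observed data.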
The authors affirm that all data necessary for confirming the conclusions of the article are present within the article, figures, and tables.
Evaluations
Simulated data
A Monte-Carlo simulation is used to assess the accuracy of the four methods mentioned above (three moment methods and one likelihood method) under different conditions: ploidy level, number of hierarchies and population differentiation. We choose three types of hierarchies: $M$ = 3, 4 or 5. If we denote by $n_i$ the number of vessels in the form $V_{i-1}$ in $V_i$, then the ploidy level $n_1$ of each individual (i.e., the number of allele copies in each individual at a locus) is set as $n_1$ = 2, 4 or 6, and the number $n_2$ of individuals sampled in each population ranges from 5 to 50 at intervals of 5. For the higher-hierarchy vessels, we set $n_3 = 4$ and $n_4 = n_5 = 2$. In the following discussion, we will use $n$ to replace the symbol $n_2$. Meanwhile, we set the number of loci per population as 10 and the number of alleles per locus as $J = 6$. We simulate these three types of hierarchies at each of the three ploidy levels in turn.
For the total population (i.e., $V_M$), the allele frequencies $p_{M1}, p_{M2}, \cdots, p_{MJ}$ ($M$ = 3, 4 or 5, $J = 6$) are randomly drawn from the Dirichlet distribution $\mathcal{D}(1, 1, \cdots, 1)$ with all concentration parameters equal to one. The F-statistic $F_{i,i+1}$ among all $V_i$ within $V_{i+1}$ is set as 0.05. To simulate the differentiation, the allele frequencies in $\mathbf{p}_i$ for each $V_i$ are independently generated according to both $\mathbf{p}_{i+1}$ and $F_{i,i+1}$. More specifically, $p_{i1}, p_{i2}, \cdots, p_{iJ}$ are randomly drawn from the Dirichlet distribution $\mathcal{D}(\lambda_{i1}, \lambda_{i2}, \cdots, \lambda_{iJ})$, $i = 1, 2, \cdots, M - 1$, where
$$\lambda_{ij} = (1/F_{i,i+1} - 1)\, p_{i+1,j}, \quad j = 1, 2, \cdots, J,$$
in which $p_{i+1,j}$ is the frequency of the $j$th allele in the upper vessel $V_{i+1}$. Obviously, each $\lambda_{ij}$ is proportional to $p_{i+1,j}$.
The alleles in each individual are randomly drawn according to the allele frequencies $p_{11}, p_{12}, \cdots, p_{1J}$ for this individual (i.e., one of the vessels in the form $V_1$). For polyploids, the duplicated alleles within a genotype $\mathcal{G}$ are removed to convert $\mathcal{G}$ into a phenotype $\mathcal{P}$. The genotype $\mathcal{G}$ or the phenotype $\mathcal{P}$ is randomly set as $\emptyset$ with a probability of 0.05 to simulate the negative amplification.
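The three steps of this paragraph (drawing allele copies, collapsing duplicates, and masking a record as missing) can be sketched as follows. The function name and the use of `None` to stand for the empty set are choices made for this illustration only.

```python
import random

def simulate_record(ind_freqs, ploidy, rng, missing_rate=0.05):
    """Return (genotype, phenotype) for one individual at one locus, or (None, None)."""
    if rng.random() < missing_rate:
        return None, None                # None stands in for the empty-set marker
    alleles = list(range(len(ind_freqs)))
    genotype = sorted(rng.choices(alleles, weights=ind_freqs, k=ploidy))
    phenotype = sorted(set(genotype))    # duplicated alleles removed
    return genotype, phenotype

rng = random.Random(7)
print(simulate_record([0.4, 0.3, 0.2, 0.1], ploidy=4, rng=rng))
```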
For any combination of simulation parameters, 5,000 datasets are generated, and then the AMOVA for every generated dataset is performed by using each of these four methods. The allele frequencies for each population at each locus are independently estimated by using the double-reduction model under RCS with inbreeding. An expectation-maximization algorithm modified from KALINOWSKI AND TAPER (2006) is used to estimate the frequencies of alleles. In this algorithm, the initial value of each allele frequency at a target locus is assigned as $1/J$, and then each frequency is iteratively updated until the sequence of updated values converges. The updated frequency $\hat{p}'_j$ is calculated by
$$\hat{p}'_j = \frac{\sum_{\mathcal{P}} \sum_{\mathcal{G}} \Pr(\mathcal{G}|\mathcal{P}) \Pr(A_j|\mathcal{G})}{\sum_{\mathcal{P}} \sum_{\mathcal{G}} \Pr(\mathcal{G}|\mathcal{P})}, \quad j = 1, 2, \cdots, J,$$
where $\mathcal{P}$ is taken from all phenotypes at this target locus, $\mathcal{G}$ is taken from all possible genotypes determining $\mathcal{P}$, $\Pr(\mathcal{G}|\mathcal{P})$ is the posterior probability of $\mathcal{G}$ determining $\mathcal{P}$, and $\Pr(A_j|\mathcal{G})$ is the frequency of the $j$th allele in $\mathcal{G}$ at this target locus.
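The update rule can be sketched as a short expectation-maximization loop. This example makes a simplifying assumption that differs from the model used in the text: genotype probabilities are multinomial (random mating, no double reduction, no inbreeding). Phenotypes are given as sets of integer allele indices, and all function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import factorial

def candidates(phenotype, ploidy):
    """Allele-count dicts of the genotypes that could produce this phenotype."""
    alleles = sorted(phenotype)
    out = []
    for filler in combinations_with_replacement(alleles, ploidy - len(alleles)):
        counts = {a: 1 for a in alleles}
        for a in filler:
            counts[a] += 1
        out.append(counts)
    return out

def multinomial_prob(counts, freqs):
    """Pr(G | p) under the simplifying multinomial assumption."""
    coef, prob = factorial(sum(counts.values())), 1.0
    for a, n in counts.items():
        coef //= factorial(n)
        prob *= freqs[a] ** n
    return coef * prob

def em_allele_freqs(phenotypes, ploidy, n_alleles, tol=1e-10, max_iter=1000):
    freqs = [1.0 / n_alleles] * n_alleles            # initial value 1/J
    for _ in range(max_iter):
        new = [0.0] * n_alleles
        for ph in phenotypes:
            cands = candidates(ph, ploidy)
            weights = [multinomial_prob(g, freqs) for g in cands]
            total = sum(weights)
            for g, w in zip(cands, weights):         # w / total is Pr(G | P)
                for a, n in g.items():
                    new[a] += (w / total) * n / ploidy   # n / ploidy is Pr(A_j | G)
        new = [x / len(phenotypes) for x in new]     # denominator: number of phenotypes
        if max(abs(x - y) for x, y in zip(new, freqs)) < tol:
            return new
        freqs = new
    return freqs

print(em_allele_freqs([{0, 1}, {0}, {1, 2}, {0, 1, 2}], ploidy=4, n_alleles=3))
```

Because the posterior probabilities of the candidate genotypes of each phenotype sum to one, the denominator of the update equals the number of phenotypes, which is how it is computed here.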
Because the scale of the variance components depends on the allele frequencies and the number of loci, what we truly need to estimate the F-statistics is the ratio of $\sum_{k=1}^{i} \hat{\sigma}_k^2$ to $\sum_{k=1}^{j} \hat{\sigma}_k^2$ according to Equation (5). Therefore, we use the bias and the root-mean-square error (RMSE) of the F-statistics to evaluate the accuracy of their estimates.
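For reference, the two accuracy measures can be computed from replicate estimates as in the sketch below. The function name is hypothetical; the bias is taken as the mean estimate minus the true value, and the RMSE as the square root of the mean squared deviation from the true value.

```python
from math import sqrt

def bias_and_rmse(estimates, true_value):
    """Bias and root-mean-square error of replicate estimates of one F-statistic."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    rmse = sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, rmse

# Four replicate estimates of an F-statistic whose simulated value is 0.05
print(bias_and_rmse([0.03, 0.06, 0.07, 0.04], 0.05))
```

The two measures are linked by RMSE$^2$ = bias$^2$ + variance, so an unbiased estimator can still have a large RMSE when its sampling variance is large.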
Simulated results
The bias and the RMSE of $\hat{F}_{i,i+1}$ ($1 \le i < M$) for diploids under different conditions are shown in Figures 1 and 2, respectively. Figure 1 shows that each bias generally decreases as the sample size $n$ increases, and that this trend does not change as the number $M$ of hierarchies increases. The bias of $\hat{F}_{M-1,M}$ is smaller than that of $\hat{F}_{i,i+1}$ ($i \le M - 2$). For the anisoploid and the weighting genotypic methods, the estimates of F-statistics are unbiased. However, for the homoploid method, they are slightly biased due to the weighting for missing data. For the likelihood method, the bias is largest, reaching 0.05, but it drops to below 0.015 at $n = 50$. As $M$ increases, the bias of $\hat{F}_{M-1,M}$ is within the range of the other F-statistics.
Figure 2 shows that for the three moment methods, the RMSEs of $\hat{F}_{i,i+1}$ are similar: if $M$ is lower (e.g., $M = 3$), the variance decreases quickly as $n$ increases. However, if $M$ is larger, the RMSE of $\hat{F}_{i,i+1}$ is less sensitive to the changes in $n$, and the RMSE of $\hat{F}_{M-1,M}$ grows steadily as $M$ increases. In contrast, for the likelihood method, the RMSE of $\hat{F}_{i,i+1}$ is less affected by $M$, and the RMSE of $\hat{F}_{M-1,M}$ lies among those of $\hat{F}_{i,i+1}$ ($i \le M - 2$).
The bias and the RMSE of $\hat{F}_{i,i+1}$ for tetraploids under different conditions are shown in Figures 3 and 4, respectively. Figure 3 shows that for the three moment methods, the estimates of F-statistics become biased for the polyploid phenotypic data. The bias of $\hat{F}_{1,2}$ is largest, reaching −0.01 at $n = 50$. For the weighting genotypic method, the estimates of F-statistics are also biased, but their biases drop to 0.003 at $n = 50$. For the likelihood method, the bias is larger than that of the weighting genotypic method, reaching 0.02 at $n = 50$. As $M$ increases, the bias of $\hat{F}_{M-1,M}$ also stays close to those of the other F-statistics.
Compared with the situation of diploids, the RMSE in Figure 4 is reduced in scale, while the patterns are similar to those in diploids. For the weighting genotypic method, the RMSE of $\hat{F}_{i,i+1}$ becomes less sensitive to $n$ as $M$ increases, and the RMSE of $\hat{F}_{M-1,M}$ is largest. For the likelihood method, the sensitivity of the RMSE of $\hat{F}_{i,i+1}$ does not vary significantly as $M$ increases.
Empirical data
We use the human dataset of PEMBERTON et al. (2013) to evaluate our generalized framework of AMOVA. This dataset consists of 5795 individuals sampled from 267 worldwide populations (e.g., ethnic groups). These populations are genotyped at 645 autosomal microsatellite loci. The average genotyping rate is 97.02%. In this dataset, the notion of groups needs to be divided into two levels, called groups I and groups II, to generate a nested structure with five levels (individual, population, group I, group II, total population).
A collection of several populations in the same country or area is defined as a group I, and a collection of several populations in the same region (e.g., East Asia or the Middle East) is defined as a group II. For example, the populations of all Chinese ethnic groups are assigned to East Asia, whereas the population of the Uygur ethnic group is originally in Central South Asia. We still stipulate that each population (or each group I) can only belong to one group I (or one group II), and the union of all groups at the same level is the total population.
Empirical results
Because the PEMBERTON et al. (2013) dataset is genotypic and because 2.98% of genotypes are missing, the weighting genotypic method is equivalent to the anisoploid method, and the homoploid method is biased. We only use the anisoploid and the likelihood methods for this dataset, whose results are shown in Table 2. Moreover, the results of the corresponding F-statistics are shown in Table 3.
According to Tables 2 and 3, the results obtained by these two methods are generally similar. The variance components within individuals account for the majority of the variance in both sets of results, and the F-statistics are generally small (below 0.08), implying a medium difference among populations ($F_{ST} \approx 0.06$). For the anisoploid method, the value of the inbreeding coefficient is small ($F_{IS} = 0.0119$), but it is significantly greater than zero, while it is exactly equal to zero for the likelihood method.
Discussion
RMSE
In this paper, we generalize the framework of AMOVA and propose four methods to solve the variance components and the F-statistics.
The comparison of Figures 2 and 4 shows that the RMSE in tetraploids is smaller than that in diploids, implying that the estimates of variance components and F-statistics are more accurate for tetraploids, although there are some biases for the polyploid phenotypic data.
We also see from Figures 2 and 4 that for the three moment methods, the estimated F-statistic $\hat{F}_{M-1,M}$ becomes increasingly inaccurate as $M$ increases. However, for the likelihood method, as $M$ increases, the accuracy of $\hat{F}_{M-1,M}$ is not only unaffected but also the same as that of $\hat{F}_{i,i+1}$ ($i \le M - 2$). Therefore, the likelihood method can be used for datasets with a higher value of $M$.
Biasedness
For the homoploid method, the estimated F-statistic $\hat{F}_{i,i+1}$ is biased for the genotypic dataset with missing data. This bias is caused by the weighting for missing data. The allele frequency of the missing genotypes refers to the allele frequency in the corresponding population, and we assume that $F_{IS} = 0$. Therefore, $F_{1,2}$ is underestimated in Figure 1. For the anisoploid and the weighting genotypic methods, because the missing genotype data are ignored, such a bias is avoided.
For the phenotypic data, all three moment methods become biased. There are two sources of these biases: (i) the extraction of dummy haplotypes; (ii) the estimation of allele frequencies.
The extraction of dummy haplotypes breaks the correlation between alleles within the same individual, which can bias the estimation of $SS_{WI}$. Therefore, the bias of $\hat{F}_{1,2}$ in Figure 3 is largest. This bias can be reduced by increasing the sample size $n$ (e.g., the bias is −0.1 when $n$ increases to 50; see Figure 3). For the weighting genotypic method, this bias can be eliminated.
For the genotypic data, the allele frequencies are estimated by counting the alleles within the corresponding genotypes, so this estimation is unbiased. For the phenotypic data, the allele frequencies are estimated by using an expectation-maximization algorithm modified from KALINOWSKI AND TAPER (2006). Because this algorithm is also a kind of maximum-likelihood method, such estimation is biased, and the bias is passed to the subsequent steps. However, it can be reduced to a negligible level if $n$ is large enough (e.g., the bias is 0.003 at $n = 50$; see Figure 3).
Unbiasedness
Due to the unbiasedness of moment methods, some negative estimates of variance components and F-statistics may arise when the level of differentiation is low or the sample size is small. We select three datasets to illustrate this phenomenon, where each dataset consists of two populations containing identical sets of diploids, which are genotyped at only one biallelic locus. Specifically, in Dataset 1, each population contains four genotypes (1 AA, 1 BB and 2 AB), which are drawn from HWE; in Dataset 2, each population is heterozygote-deficient, containing only two homozygotes (1 AA and 1 BB); in Dataset 3, each population is heterozygote-excessive, containing no homozygotes (2 AB). Because there is only one locus and no missing data, the three moment methods are equivalent. We use the homoploid method as an example, whose results with 9999 permutations are shown in Table 4. The results obtained by using the likelihood method are also shown in this table for comparison.
Table 4 shows that for the moment methods, some estimates of variance components and F-statistics are negative or deviate greatly from the true values. For the likelihood method, such negative values are avoided, and the estimates are guaranteed to lie within the biologically meaningful range.
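To see how negative moment estimates arise for these three datasets, the sketch below solves a balanced three-level decomposition (allele copies within individuals, individuals within populations, populations within the total) by equating each SS to its expectation. It is an illustration under stated assumptions, not the authors' software, and its output is not claimed to reproduce Table 4: the distance between distinct alleles is taken as one, the design must be balanced, and the function names are hypothetical.

```python
def ss_within(copies):
    """SS of one vessel: number of unlike unordered pairs divided by the number of copies."""
    n = len(copies)
    unlike = sum(copies[i] != copies[j] for i in range(n) for j in range(i + 1, n))
    return unlike / n

def balanced_amova(populations):
    """Moment estimates for equally sized populations of genotype strings such as 'AB'."""
    n_pop = len(populations)
    n_ind = len(populations[0])
    ploidy = len(populations[0][0])
    total = ploidy * n_ind * n_pop                       # all allele copies
    ss1 = sum(ss_within(g) for pop in populations for g in pop)
    ss2 = sum(ss_within("".join(pop)) for pop in populations)
    ss3 = ss_within("".join(g for pop in populations for g in pop))
    # Solve E(SS) = sum of coefficient * variance component, level by level.
    var1 = ss1 / (n_pop * n_ind * (ploidy - 1))
    var2 = (ss2 - var1 * n_pop * (ploidy * n_ind - 1)) / (n_pop * (ploidy * n_ind - ploidy))
    var3 = (ss3 - var1 * (total - 1) - var2 * (total - ploidy)) / (total - ploidy * n_ind)
    return {
        "var": (var1, var2, var3),
        "F_IS": var2 / (var1 + var2),
        "F_ST": var3 / (var1 + var2 + var3),
    }

datasets = {
    "Dataset 1": [["AA", "BB", "AB", "AB"]] * 2,
    "Dataset 2": [["AA", "BB"]] * 2,
    "Dataset 3": [["AB", "AB"]] * 2,
}
for name, data in datasets.items():
    print(name, balanced_amova(data))
```

Because the two populations in each dataset are identical, the among-population component cannot be positive here, and with such small samples the among-individual component also swings to its extremes.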
Empirical results
There are some differences in the results of AMOVA on the PEMBERTON et al. (2013) dataset between the moment and the likelihood methods (see Tables 2 and 3). For example, the value of $F_{IS}$ is significantly positive for the moment method, but it is exactly equal to zero for the likelihood method.
The differences come from the dissimilarity between the schemes of these two kinds of methods. For the moment method, the SS within each hierarchy and at each locus is calculated, and the occurrence of some rare genotypes/phenotypes can only slightly change the values of SS. Therefore, for the loci with a similar polymorphism, their influences on the values of SS, MS, Var etc. are also similar.
In contrast, for the likelihood method, the SS is more sensitive to the distribution of genotypes/phenotypes, and the occurrence of some rare genotypes/phenotypes (e.g., homozygotes of rare alleles) can greatly affect the values of SS. The influence of a single rare genotype/phenotype may be equal to that of thousands of common genotypes/phenotypes. For the loci with a similar polymorphism, their influences on the values of SS, MS, Var etc. may differ dramatically.
Applications
The homoploid method is the fastest during the permutation test. For this method, the genetic distance matrix only needs to be calculated once, and it can be permuted very quickly during the permutation test. More specifically, a permutation $s_1 s_2 \cdots s_H$ of the numbers $1, 2, \cdots, H$ is randomly generated, where $H$ is the order of the genetic distance matrix (i.e., the number of alleles). Let $d'_{ij}$ be equal to $d_{s_i s_j}$, where $d'_{ij}$ is the $ij$th element in the permuted genetic distance matrix, and $d_{s_i s_j}$ is the $s_i s_j$th element in the original distance matrix, $i, j = 1, 2, \cdots, H$. This technique can largely reduce the time expense, especially for a large dataset. For the other methods, the genetic distance should be calculated at each locus and in each iteration, so these methods are far slower than the homoploid method. The drawback of the homoploid method is that the genetic distances are biased for the genotypic dataset with missing data or for the polyploid phenotypic data. Therefore, the homoploid method is suitable for high-quality genotypic data (with a high genotyping rate) or a large dataset (e.g., next-generation sequencing data).
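The index-permutation trick described above can be sketched in a few lines: the distance matrix is computed once, and each permutation only re-indexes it. The function name is hypothetical and indices are zero-based.

```python
import random

def permute_distance_matrix(dist, rng):
    """Return the permuted matrix d'[i][j] = d[s[i]][s[j]] and the permutation s."""
    h = len(dist)
    s = list(range(h))
    rng.shuffle(s)                       # random permutation of the row/column labels
    permuted = [[dist[s[i]][s[j]] for j in range(h)] for i in range(h)]
    return permuted, s

dist = [
    [0, 1, 2],
    [1, 0, 3],
    [2, 3, 0],
]
permuted, order = permute_distance_matrix(dist, random.Random(3))
print(order)
print(permuted)
```

Since rows and columns are permuted together, the result stays symmetric with a zero diagonal, so no distance has to be recomputed between permutations.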
Although the anisoploid method is slower than the homoploid method, it is still faster than the other two methods. That is because the whole dataset needs to be regenerated during the test in the other two methods. For the anisoploid method, because the missing data are ignored during the calculation, the genetic distances are unbiased for genotypic data with a low genotyping rate. Therefore, this method is suitable for low-quality genotypic data.
For the weighting genotypic method, because no dummy haplotypes are extracted, the genetic distances are less biased for the phenotypic data. Instead of using a permutation test, this method randomly generates datasets under the hypothesis that there is no differentiation. After that, it calculates the probability that the variance components or the F-statistics at each locus and in each iteration are greater than the observed values. Therefore, this method is suitable for the polyploid phenotypic data.
For the three moment methods, there are two problems: (i) the RMSE of each $\hat{F}_{M-1,M}$ increases as the hierarchy number $M$ increases, and (ii) because of the unbiasedness, some negative variance components or some negative estimates of F-statistics may arise when the differentiation is small. For the likelihood method, these two problems are avoided, and the RMSEs of the estimated F-statistics are insensitive to $M$. In addition, the estimates always lie in the biologically meaningful range. Therefore, this method is suitable for a larger $M$ and/or for datasets for which a part of the results obtained by using the moment methods cannot be explained.
Acknowledgment
KH would like to thank Prof. Kermit Ritland for providing a visiting scholar position at the University of British Columbia. This study was funded by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB31020302), the National Natural Science Foundation of China (31770411, 31730104, 31572278 and 31770425), the Young Elite Scientists Sponsorship Program by CAST (2017QNRC001), and the National Key Programme of Research and Development, Ministry of Science and Technology (2016YFC0503200). DWD is supported by a Shaanxi Province Talents 100 Fellowship.
References 513
Avni, R., M. Nave, O. Barad, K. Baruch, S. O. Twardziok et al., 2017 Wild emmer genome 514
architecture and diversity elucidate wheat evolution and domestication. Science 357: 515
93-97. 516
Burow, M. D., C. E. Simpson, J. L. Starr and A. H. Paterson, 2001 Transmission genetics of 517
chromatin from a synthetic amphidiploid to cultivated peanut (Arachis hypogaea L.): 518
broadening the gene pool of a monophyletic polyploid species. Genetics 159: 823-837. 519
Chambers, J. L., and D. Garant, 2010 Determinants of population genetic structure in eastern 520
chipmunks (Tamias striatus): the role of landscape barriers and sex-biased dispersal. 521
Journal of Heredity 101: 413-422. 522
Clark, L. V., and M. Jasieniuk, 2011 POLYSAT: an R package for polyploid microsatellite 523
analysis. Molecular Ecology Resources 11: 562-566. 524
Cockerham, C. C., 1969 Variance of gene frequencies. Evolution 23: 72-84. 525
Cockerham, C. C., 1973 Analyses of gene frequencies. Genetics 74: 679-700. 526
Excoffier, L., and H. E. Lischer, 2010 Arlequin suite ver 3.5: a new series of programs to 527
perform population genetics analyses under Linux and Windows. Molecular Ecology 528
Resources 10: 564-567. 529
Excoffier, L., P. E. Smouse and J. M. Quattro, 1992 Analysis of molecular variance inferred 530
from metric distances among DNA haplotypes: application to human mitochondrial 531
DNA restriction data. Genetics 131: 479-491. 532
.CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted April 13, 2019. . https://doi.org/10.1101/608117doi: bioRxiv preprint
![Page 29: A generalized framework of AMOVA with any number of ...The analysis of molecular variance (AMOVA) is a widely used . statistical model in 23 the studies of population genetics and](https://reader033.vdocuments.site/reader033/viewer/2022050106/5f44499636d71a6869529413/html5/thumbnails/29.jpg)
29
Geffen, E. L. I., M. J. Anderson and R. K. Wayne, 2004 Climate and habitat barriers to 533
dispersal in the highly mobile grey wolf. Molecular Ecology 13: 2481-2490. 534
Haldane, J. B., 1930 Theoretical genetics of autopolyploids. Journal of Genetics 22: 359-372. 535
Hardy, O. J., and X. Vekemans, 2002 SPAGeDi: a versatile computer program to analyse 536
spatial genetic structure at the individual or population levels. Molecular Ecology Notes 537
2: 618-620. 538
Huang, K., K. Ritland, S. Guo, M. Shattuck and B. Li, 2014 A pairwise relatedness estimator 539
for polyploids. Molecular Ecology Resources 14: 734-744. 540
Huang, K., T. C. Wang, D. W. Dunn, P. Zhang, R. C. Liu et al., 2019 Genotypic frequencies at 541
equilibrium for polysomic inheritance under double-reduction. G3: Genes, Genomes, 542
Genetics: doi: 10.1534/g1533.1119.400132. 543
Kalinowski, S. T., and M. L. Taper, 2006 Maximum likelihood estimation of the frequency of 544
null alleles at microsatellite loci. Conservation Genetics 7: 991-995. 545
Lait, L. A., and T. M. Burg, 2013 When east meets west: population structure of a high-latitude 546
resident species, the boreal chickadee (Poecile hudsonicus). Heredity 111: 321-329. 547
Lande, R., 1976 Natural selection and random genetic drift in phenotypic evolution. Evolution: 548
314-334. 549
Ling, H. Q., B. Ma, X. L. Shi, H. Liu, L. L. Dong et al., 2018 Genome sequence of the progenitor 550
of wheat A subgenome Triticum urartu. Nature 557: 424. 551
Luppi, T. A., E. D. Spivak and C. C. Bas, 2003 The effects of temperature and salinity on larval 552
development of Armases rubripes Rathbun, 1897 (Brachyura, Grapsoidea, Sesarmidae), 553
and the southern limit of its geographical distribution. Estuarine, Coastal and Shelf 554
Science 58: 575-585. 555
Martin, N. H., and J. H. Willis, 2007 Ecological divergence associated with mating system 556
causes nearly complete reproductive isolation between sympatric Mimulus species. 557
Evolution 61: 68-82. 558
Mather, K., 1935 Reductional and equational separation of the chromosomes in bivalents and 559
multivalents. Journal of Genetics 30: 53-78. 560
Meirmans, P. G., and S. Liu, 2018 Analysis of Molecular Variance (AMOVA) for 561
autopolyploids. Frontiers in Ecology and Evolution 6: 66. 562
Meirmans, P. G., and P. H. V. Tienderen, 2004 GENOTYPE and GENODIVE : two programs 563
for the analysis of genetic diversity of asexual organisms. Molecular Ecology Notes 4: 564
792β794. 565
Nelder, J. A., and R. Mead, 1965 A simplex method for function minimization. The computer 566
journal 7: 308-313. 567
Otto, S. P., 2007 The evolutionary consequences of polyploidy. Cell 131: 452-462. 568
Peakall, R., and P. E. Smouse, 2006 GENALEX 6: genetic analysis in Excel. Population genetic 569
software for teaching and research. Molecular Ecology Notes 6: 288-295. 570
Peakall, R., P. E. Smouse and D. Huff, 1995 Evolutionary implications of allozyme and RAPD 571
variation in diploid populations of dioecious buffalograss Buchloe dactyloides. 572
Molecular Ecology 4: 135-148. 573
Pemberton, T. J., M. DeGiorgio and N. A. Rosenberg, 2013 Population structure in a comprehensive genomic data set on human microsatellite variation. G3: Genes, Genomes, Genetics 3: 891-907.
Pritchard, J. K., M. Stephens and P. Donnelly, 2000 Inference of population structure using multilocus genotype data. Genetics 155: 945-959.
Russell, E. M., Y. Yom-Tov and E. Geffen, 2004 Extended parental care and delayed dispersal: northern, tropical, and southern passerines compared. Behavioral Ecology 15: 831-838.
Slatkin, M., 1995 A measure of population subdivision based on microsatellite allele frequencies. Genetics 139: 457-462.
Wright, S., 1943 Isolation by distance. Genetics 28: 114.
Yang, J. Y., S. A. Cushman, J. Yang, M. B. Yang and T. J. Bao, 2014 Effects of climatic gradients on genetic differentiation of Caragana on the Ordos Plateau, China. Landscape Ecology 28: 1729-1741.

Author contributions
KH and BGL designed the project; KH and YLL constructed the model and wrote the draft; KH designed the software; PZ performed the simulations and analyses; and DWD checked the model and helped to write the manuscript.

Tables
Table 1. The layout of AMOVA. The total SS is decomposed into the SS in different sources of variation. Each expected MS is expressed here as a function of variance components.

| Source of variation | d.f. | SS | MS | Expected MS |
|---|---|---|---|---|
| Within individuals | $N$ | $\mathrm{SS}_{\mathrm{WI}}$ | $\dfrac{\mathrm{SS}_{\mathrm{WI}}}{N}$ | $\sigma_{\mathrm{WI}}^2$ |
| Among individuals within populations | $N-P$ | $\mathrm{SS}_{\mathrm{WP}}-\mathrm{SS}_{\mathrm{WI}}$ | $\dfrac{\mathrm{SS}_{\mathrm{WP}}-\mathrm{SS}_{\mathrm{WI}}}{N-P}$ | $2\sigma_{\mathrm{AI/WP}}^2+\sigma_{\mathrm{WI}}^2$ |
| Among populations within groups | $P-G$ | $\mathrm{SS}_{\mathrm{WG}}-\mathrm{SS}_{\mathrm{WP}}$ | $\dfrac{\mathrm{SS}_{\mathrm{WG}}-\mathrm{SS}_{\mathrm{WP}}}{P-G}$ | $n\sigma_{\mathrm{AP/WG}}^2+2\sigma_{\mathrm{AI/WP}}^2+\sigma_{\mathrm{WI}}^2$ |
| Among groups | $G-1$ | $\mathrm{SS}_{\mathrm{TOT}}-\mathrm{SS}_{\mathrm{WG}}$ | $\dfrac{\mathrm{SS}_{\mathrm{TOT}}-\mathrm{SS}_{\mathrm{WG}}}{G-1}$ | $n''\sigma_{\mathrm{AG}}^2+n'\sigma_{\mathrm{AP/WG}}^2+2\sigma_{\mathrm{AI/WP}}^2+\sigma_{\mathrm{WI}}^2$ |

Here, $N$, $P$ or $G$ denotes the number of individuals, populations or groups, respectively, and $\mathrm{SS}_{\mathrm{WI}}$, $\mathrm{SS}_{\mathrm{WP}}$, $\mathrm{SS}_{\mathrm{WG}}$ or $\mathrm{SS}_{\mathrm{TOT}}$ denotes the SS within individuals, within populations, within groups or in the total population, which can be obtained by Equation (1). The coefficients $n$, $n'$ and $n''$ are, respectively, calculated by

$$
n=\frac{2N-2\sum_{g}\sum_{h\in g}N_h^2/N_g}{P-G},\qquad
n'=\frac{2\sum_{g}\sum_{h\in g}N_h^2/N_g-2\sum_{h}N_h^2/N}{G-1}\qquad\text{and}\qquad
n''=\frac{2N-2\sum_{g}N_g^2/N}{G-1},
$$

in which $N_h$ or $N_g$ is the number of individuals in the population $h$ or in the group $g$, respectively, and the summation index in $\sum_g$ (or in $\sum_h$) runs over all groups (or all populations) in the total population.

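The coefficient formulas under Table 1 can be checked numerically. The following is a minimal Python sketch, not the authors' software: the function name `amova_coefficients` and the nested-list input (one list of population sizes per group) are assumptions made for illustration, and the code presumes diploid individuals as in Table 1.

```python
from fractions import Fraction


def amova_coefficients(groups):
    """Return (n, n', n'') for the diploid layout of Table 1.

    groups: hypothetical input format, a list of groups, each group being
    a list of population sizes N_h (numbers of diploid individuals).
    """
    N = sum(sum(g) for g in groups)      # individuals in total
    P = sum(len(g) for g in groups)      # populations in total
    G = len(groups)                      # groups
    # sum over groups of sum_{h in g} N_h^2 / N_g
    within_groups = sum(Fraction(sum(n * n for n in g), sum(g)) for g in groups)
    # sum over all populations of N_h^2 / N
    pops_over_total = Fraction(sum(n * n for g in groups for n in g), N)
    # sum over all groups of N_g^2 / N
    groups_over_total = Fraction(sum(sum(g) ** 2 for g in groups), N)
    n = (2 * N - 2 * within_groups) / (P - G)
    n1 = (2 * within_groups - 2 * pops_over_total) / (G - 1)
    n2 = (2 * N - 2 * groups_over_total) / (G - 1)
    return n, n1, n2


# Balanced example: 2 groups x 4 populations x 10 diploid individuals.
print(amova_coefficients([[10] * 4, [10] * 4]))
```

In the balanced case the coefficients reduce to twice the population size ($n = n' = 2N_h$) and twice the group size ($n'' = 2N_g$), which is a quick sanity check on the formulas.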
Table 2. The degrees of freedom (d.f.), sum of squares (SS), mean squares (MS), estimated variance components (Var) and the variance percentage (%) for the (microsatellite, SLATKIN 1995) dataset

| Method | Source | d.f. | SS | MS | Var | % | P | Perm. Mean | Perm. Var |
|---|---|---|---|---|---|---|---|---|---|
| Anisoploid | WI | 3626266 | 1293746.5 | 0.357 | 0.357 | 92.68 | 1.000 | 0.381 | 0.000 |
| Anisoploid | AI/WP | 3454743 | 1262341.6 | 0.365 | 0.004 | 1.12 | 0.000 | 0.000 | 0.000 |
| Anisoploid | AP/WC1 | 133491 | 87578.6 | 0.656 | 0.007 | 1.82 | 0.000 | 0.000 | 0.000 |
| Anisoploid | AC1/WC2 | 32227 | 36005.9 | 1.117 | 0.003 | 0.77 | 0.000 | 0.000 | 0.000 |
| Anisoploid | AC2 | 5160 | 86911.1 | 16.843 | 0.014 | 3.62 | 0.000 | 0.000 | 0.000 |
| Anisoploid | Total | 7251887 | 2766583.6 | 0.381 | 0.385 | 100.00 | - | - | - |
| Likelihood | WI | 3626266 | 1310274.0 | 0.361 | 0.361 | 94.00 | 1.000 | 0.381 | 0.000 |
| Likelihood | AI/WP | 3454743 | 1248297.8 | 0.361 | 0.000 | 0.00 | 0.624 | 0.000 | 0.000 |
| Likelihood | AP/WC1 | 133491 | 88346.1 | 0.662 | 0.007 | 1.88 | 0.000 | 0.000 | 0.000 |
| Likelihood | AC1/WC2 | 32227 | 45688.1 | 1.418 | 0.005 | 1.26 | 0.000 | 0.000 | 0.000 |
| Likelihood | AC2 | 5160 | 73977.7 | 14.337 | 0.011 | 2.86 | 0.000 | 0.000 | 0.000 |
| Likelihood | Total | 7251887 | 2766583.6 | 0.381 | 0.384 | 100.00 | - | - | - |

*WI: within individuals; AI/WP: among individuals within populations; AP/WC1: among populations within groups I; AC1/WC2: among groups I within groups II; AC2: among groups II.

Table 3. The value, significance, permuted mean and permuted variance of F-statistics

| F-statistics | Anisoploid: Value | Anisoploid: P | Anisoploid: Mean ($\times 10^{-6}$) | Anisoploid: Var ($\times 10^{-8}$) | Likelihood: Value | Likelihood: P | Likelihood: Mean ($\times 10^{-4}$) | Likelihood: Var ($\times 10^{-6}$) |
|---|---|---|---|---|---|---|---|---|
| $F_{IS}$ | 0.0119 | <0.0001 | 0.0425 | 7.3783 | 0.0000 | 0.1649 | 0.0000 | 0.0000 |
| $F_{IC1}$ | 0.0307 | <0.0001 | 1.1174 | 7.2052 | 0.0196 | <0.0001 | 1.7837 | 0.3049 |
| $F_{IC2}$ | 0.0384 | <0.0001 | 1.2765 | 7.1643 | 0.0323 | <0.0001 | 3.9708 | 0.5911 |
| $F_{IT}$ | 0.0732 | <0.0001 | 1.3279 | 7.1507 | 0.0600 | <0.0001 | 6.1550 | 1.0658 |
| $F_{SC1}$ | 0.0190 | <0.0001 | 1.0729 | 0.2298 | 0.0196 | <0.0001 | 1.7837 | 0.3049 |
| $F_{SC2}$ | 0.0268 | <0.0001 | 1.2320 | 0.1936 | 0.0323 | <0.0001 | 3.9708 | 0.5911 |
| $F_{ST}$ | 0.0620 | <0.0001 | 1.2835 | 0.1751 | 0.0600 | <0.0001 | 6.1550 | 1.0658 |
| $F_{C1C2}$ | 0.0080 | <0.0001 | 0.1584 | 0.0812 | 0.0130 | <0.0001 | 2.1884 | 0.0948 |
| $F_{C1T}$ | 0.0438 | <0.0001 | 0.2100 | 0.0516 | 0.0412 | <0.0001 | 4.3740 | 0.3788 |
| $F_{C2T}$ | 0.0362 | <0.0001 | 0.0513 | 0.0156 | 0.0286 | <0.0001 | 2.1870 | 0.0947 |

Table 4. Results of AMOVA by using the moment and likelihood methods

| Estimator | Dataset | Source | d.f. | SS | MS | Var | % | Perm. Mean | Perm. Var | P | F-statistics | Value | Perm. Mean | Perm. Var | P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MOM | 1 | WI | 8 | 2.000 | 0.250 | 0.250 | 100.0 | 0.267 | 0.008 | 0.302 | $F_{IS}$ | 0.143 | -0.023 | 0.143 | 0.169 |
| MOM | 1 | AI/WP | 6 | 2.000 | 0.333 | 0.042 | 16.7 | -0.001 | 0.010 | 0.168 | $F_{IT}$ | 0.000 | -0.016 | 0.140 | 0.507 |
| MOM | 1 | AP | 1 | 0.000 | 0.000 | -0.042 | -16.7 | 0.000 | 0.003 | 0.763 | $F_{ST}$ | -0.167 | -0.012 | 0.028 | 0.763 |
| MOM | 1 | Total | 15 | 4.000 | 0.267 | 0.250 | 100.0 | - | - | - | | | | | |
| MOM | 2 | WI | 4 | 0.000 | 0.000 | 0.000 | 0.0 | 0.287 | 0.019 | 0.913 | $F_{IS}$ | 1.000 | -0.074 | 0.341 | 0.000 |
| MOM | 2 | AI/WP | 2 | 2.000 | 1.000 | 0.500 | 200.0 | -0.003 | 0.032 | 0.000 | $F_{IT}$ | 1.000 | -0.058 | 0.337 | 0.000 |
| MOM | 2 | AP | 1 | 0.000 | 0.000 | -0.250 | -100.0 | 0.001 | 0.016 | 0.943 | $F_{ST}$ | -1.000 | -0.048 | 0.158 | 0.943 |
| MOM | 2 | Total | 7 | 2.000 | 0.286 | 0.250 | 100.0 | - | - | - | | | | | |
| MOM | 3 | WI | 4 | 2.000 | 0.500 | 0.500 | 200.0 | 0.286 | 0.019 | 0.000 | $F_{IS}$ | -1.000 | -0.067 | 0.345 | 0.766 |
| MOM | 3 | AI/WP | 2 | 0.000 | 0.000 | -0.250 | -100.0 | 0.001 | 0.033 | 0.766 | $F_{IT}$ | -1.000 | -0.054 | 0.339 | 0.766 |
| MOM | 3 | AP | 1 | 0.000 | 0.000 | 0.000 | 0.0 | -0.001 | 0.016 | 0.476 | $F_{ST}$ | 0.000 | -0.057 | 0.162 | 0.476 |
| MOM | 3 | Total | 7 | 2.000 | 0.286 | 0.250 | 100.0 | - | - | - | | | | | |
| ML | 1 | WI | 8 | 2.133 | 0.267 | 0.267 | 100.0 | 0.239 | 0.002 | 0.644 | $F_{IS}$ | 0.000 | 0.091 | 0.032 | 0.304 |
| ML | 1 | AI/WP | 6 | 1.600 | 0.267 | 0.000 | 0.0 | 0.024 | 0.002 | 0.304 | $F_{IT}$ | 0.000 | 0.116 | 0.038 | 0.358 |
| ML | 1 | AP | 1 | 0.267 | 0.267 | 0.000 | 0.0 | 0.009 | 0.001 | 0.511 | $F_{ST}$ | 0.000 | 0.031 | 0.005 | 0.376 |
| ML | 1 | Total | 15 | 4.000 | 0.267 | 0.267 | 100.0 | - | - | - | | | | | |
| ML | 2 | WI | 4 | 0.000 | 0.000 | 0.000 | 0.0 | 0.247 | 0.006 | 0.949 | $F_{IS}$ | 1.000 | 0.113 | 0.068 | 0.000 |
| ML | 2 | AI/WP | 2 | 1.333 | 0.667 | 0.333 | 100.0 | 0.033 | 0.007 | 0.000 | $F_{IT}$ | 1.000 | 0.151 | 0.078 | 0.000 |
| ML | 2 | AP | 1 | 0.667 | 0.667 | 0.000 | 0.0 | 0.018 | 0.003 | 0.569 | $F_{ST}$ | 0.000 | 0.049 | 0.019 | 0.321 |
| ML | 2 | Total | 7 | 2.000 | 0.286 | 0.333 | 100.0 | - | - | - | | | | | |
| ML | 3 | WI | 4 | 1.143 | 0.286 | 0.286 | 100.0 | 0.247 | 0.006 | 0.000 | $F_{IS}$ | 0.000 | 0.113 | 0.068 | 0.256 |
| ML | 3 | AI/WP | 2 | 0.571 | 0.286 | 0.000 | 0.0 | 0.033 | 0.007 | 0.256 | $F_{IT}$ | 0.000 | 0.151 | 0.078 | 0.444 |
| ML | 3 | AP | 1 | 0.286 | 0.286 | 0.000 | 0.0 | 0.018 | 0.003 | 0.569 | $F_{ST}$ | 0.000 | 0.049 | 0.019 | 0.321 |
| ML | 3 | Total | 7 | 2.000 | 0.286 | 0.286 | 100.0 | - | - | - | | | | | |

* Dataset 1: pop1 = pop2 = $\{AA, BB, AB, AB\}$; Dataset 2: pop1 = pop2 = $\{AA, BB\}$; Dataset 3: pop1 = pop2 = $\{AB, AB\}$.

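The method-of-moments (MOM) rows of Table 4 can be reproduced by hand for the toy datasets. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it scores each allele copy with the indicator of allele $A$, uses a three-level layout (alleles, individuals, populations) analogous to Table 1 with the group level dropped, and assumes diploids. The names `sum_of_squares` and `moment_amova` are hypothetical.

```python
from fractions import Fraction


def sum_of_squares(blocks):
    """Sum of squared deviations of every value from the mean of its own block."""
    total = Fraction(0)
    for block in blocks:
        mean = sum(block) / len(block)
        total += sum((value - mean) ** 2 for value in block)
    return total


def moment_amova(pops, allele="A"):
    """Moment estimates for diploid genotypes given as strings such as 'AB'."""
    y = [[[Fraction(int(a == allele)) for a in geno] for geno in pop] for pop in pops]
    individuals = [ind for pop in y for ind in pop]
    populations = [[a for ind in pop for a in ind] for pop in y]
    everyone = [[a for pop in populations for a in pop]]
    ss_wi, ss_wp, ss_tot = map(sum_of_squares, (individuals, populations, everyone))
    N, P = len(individuals), len(pops)
    ms_wi = ss_wi / N
    ms_ai = (ss_wp - ss_wi) / (N - P)
    ms_ap = (ss_tot - ss_wp) / (P - 1)
    # Assumed three-level analogue of the coefficient n in Table 1.
    n = (2 * N - 2 * Fraction(sum(len(pop) ** 2 for pop in pops), N)) / (P - 1)
    var_wi = ms_wi
    var_ai = (ms_ai - ms_wi) / 2
    var_ap = (ms_ap - ms_ai) / n
    total = var_wi + var_ai + var_ap
    f_stats = {
        "IS": var_ai / (var_wi + var_ai),
        "IT": 1 - var_wi / total,
        "ST": var_ap / total,
    }
    return (var_wi, var_ai, var_ap), f_stats


dataset1 = [["AA", "BB", "AB", "AB"]] * 2
print(moment_amova(dataset1))
```

For Dataset 1 the variance components come out as $1/4$, $1/24$ and $-1/24$, which round to the 0.250, 0.042 and -0.042 reported for MOM, and the F-statistics are $1/7$, $0$ and $-1/6$.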
Figure legends
Figure 1. The bias of the estimated F-statistic $\hat{F}_{i,i+1}$ for diploids as a function of the sample size $N$ sampled from each population under different conditions ($1 \le i < M$). For each of the four methods listed at the top, the results are shown in the column where this method is located. Each row shows the results of a structure, where the structure "2xNx4x2" means that $M = 4$, and there are two groups, each group containing four populations and each population consisting of $N$ diploids. The other two structures are interpreted analogously. Each solid, dashed, dash-dotted or dotted line denotes the bias of $\hat{F}_{1,2}$, $\hat{F}_{2,3}$, $\hat{F}_{3,4}$ or $\hat{F}_{4,5}$, respectively, corresponding to the value of $i$.
Figure 2. The RMSE of the estimated F-statistic $\hat{F}_{i,i+1}$ for diploids under different conditions ($1 \le i < M$). The meanings of columns and rows are as indicated in Figure 1. Each solid, dashed, dash-dotted or dotted line denotes the RMSE of $\hat{F}_{1,2}$, $\hat{F}_{2,3}$, $\hat{F}_{3,4}$ or $\hat{F}_{4,5}$, respectively, corresponding to the value of $i$.
Figure 3. The bias of the estimated F-statistic $\hat{F}_{i,i+1}$ for tetraploids under different conditions. The meanings of columns, rows and lines are as indicated in Figure 1.
Figure 4. The RMSE of the estimated F-statistic $\hat{F}_{i,i+1}$ for tetraploids under different conditions. The meanings of columns, rows and lines are as indicated in Figure 2.

Figures

Figure 1

Figure 2

Figure 3

Figure 4

Supplementary materials
Linear model
Let $A$ be an allele randomly taken from the total population, and let $p$ be the mean frequency of $A$ in the total population. Our discussion focuses on the biases of the frequency $p$ associated with an allele $j$. Our linear model is developed from COCKERHAM (1969; 1973) and is described by the following function:

$$
y_{ghij} = p + a_g + b_{gh} + c_{ghi} + d_{ghij},
$$

where $j$ is arbitrary, and the indices $g$, $h$, $i$, $j$ are nested, that is, $j \in i \in h \in g$; $a_g$ is the bias of the frequency $p$ in the group $g$ relative to the total population, $b_{gh}$ is the bias of $p$ in the population $h$ relative to the group $g$, $c_{ghi}$ is the bias of $p$ in the individual $i$ relative to the population $h$, and $d_{ghij}$ is the bias of $p$ in the allele $j$ relative to the individual $i$. Because of the nested relation, $g$ and $h$ are uniquely determined once $i$ is given. We stipulate that the condition $\mathrm{E}(y_{ghij}) = p$ is satisfied in this model.
Because $\mathrm{E}(y_{ghij}) = p$ and the allele frequencies obey a binomial distribution, we have $\operatorname{var}(y_{ghij}) = p(1-p)$, that is, $\sigma_{\mathrm{TOT}}^2 = p(1-p)$.
According to COCKERHAM (1969; 1973), the formulas for $\operatorname{cov}(y_{ghij}, y_{g'h'i'j'})$ under various situations are

$$
\operatorname{cov}(y_{ghij}, y_{g'h'i'j'}) =
\begin{cases}
\operatorname{var}(y_{ghij}) = p(1-p) & \text{if } g = g',\ h = h',\ i = i',\ j = j';\\
\operatorname{cov}_{\mathrm{AA/WI}} = F_{IT}\,p(1-p) & \text{if } g = g',\ h = h',\ i = i',\ j \ne j';\\
\operatorname{cov}_{\mathrm{AI/WP}} = F_{ST}\,p(1-p) & \text{if } g = g',\ h = h',\ i \ne i';\\
\operatorname{cov}_{\mathrm{AP/WG}} = F_{CT}\,p(1-p) & \text{if } g = g',\ h \ne h';\\
\operatorname{cov}_{\mathrm{AG}} = 0 & \text{if } g \ne g'.
\end{cases}
$$

For the final situation, because the alleles among groups are assumed to be independent, the value of the corresponding F-statistic is zero, and so $\operatorname{cov}_{\mathrm{AG}} = 0 \cdot p(1-p) = 0$. Moreover, the formulas for $\mathrm{E}(y_{ghij}\,y_{g'h'i'j'})$ under various situations are

$$
\mathrm{E}(y_{ghij}\,y_{g'h'i'j'}) =
\begin{cases}
p(1-p) + p^2 & \text{if } g = g',\ h = h',\ i = i',\ j = j';\\
F_{IT}\,p(1-p) + p^2 & \text{if } g = g',\ h = h',\ i = i',\ j \ne j';\\
F_{ST}\,p(1-p) + p^2 & \text{if } g = g',\ h = h',\ i \ne i';\\
F_{CT}\,p(1-p) + p^2 & \text{if } g = g',\ h \ne h';\\
0 + p^2 & \text{if } g \ne g'.
\end{cases}
$$

In fact, for the first situation, $\mathrm{E}(y_{ghij}\,y_{g'h'i'j'})$ becomes $\mathrm{E}(y_{ghij}^2)$, so

$$
\mathrm{E}(y_{ghij}\,y_{g'h'i'j'}) = \operatorname{var}(y_{ghij}) + [\mathrm{E}(y_{ghij})]^2 = p(1-p) + p^2.
$$

For the second situation, $\mathrm{E}(y_{ghij}\,y_{g'h'i'j'})$ becomes $\mathrm{E}(y_{ghij}\,y_{ghij'})$. Because $F_{IT}$ is the probability $\Pr(y_{ghij} \equiv y_{ghij'})$ that two distinct alleles within the same individual in the total population are IBD, we obtain

$$
\begin{aligned}
\mathrm{E}(y_{ghij}\,y_{g'h'i'j'}) &= \Pr(y_{ghij}\,y_{ghij'} = 1)\\
&= \Pr(y_{ghij} = 1,\ y_{ghij'} = 1)\\
&= \Pr(y_{ghij} \equiv y_{ghij'})\Pr(y_{ghij} = 1)\\
&\quad + [1 - \Pr(y_{ghij} \equiv y_{ghij'})]\Pr(y_{ghij} = 1)\Pr(y_{ghij'} = 1)\\
&= F_{IT}\,p + (1 - F_{IT})\,p^2\\
&= F_{IT}\,p(1-p) + p^2.
\end{aligned}
$$

For the remaining situations, the derivations are similar and are omitted.
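The IBD argument for the second situation can be verified by writing out the joint distribution of two allele indicators. The following Python sketch is only an illustration: it assumes two copies that are IBD with probability $F$ and otherwise drawn independently with frequency $p$, and the function names are hypothetical.

```python
from fractions import Fraction


def joint_distribution(F, p):
    """Joint law of two allele indicators that are IBD with probability F."""
    q = 1 - p
    return {
        (1, 1): F * p + (1 - F) * p * p,
        (1, 0): (1 - F) * p * q,
        (0, 1): (1 - F) * q * p,
        (0, 0): F * q + (1 - F) * q * q,
    }


def covariance(joint):
    """Covariance of the two indicators under the given joint law."""
    e_first = sum(prob * a for (a, b), prob in joint.items())
    e_second = sum(prob * b for (a, b), prob in joint.items())
    e_product = sum(prob * a * b for (a, b), prob in joint.items())
    return e_product - e_first * e_second


F, p = Fraction(1, 4), Fraction(3, 10)
joint = joint_distribution(F, p)
print(joint[(1, 1)], covariance(joint))
```

With exact fractions, the probability of the pair $(1,1)$ equals $Fp(1-p) + p^2$ and the covariance equals $Fp(1-p)$, which is the pattern used for every level of the hierarchy with the appropriate F-statistic.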
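The F-statistics and the variance components are related as follows:

$$
\begin{aligned}
F_{IT} &= 1 - \frac{\sigma_{\mathrm{WI}}^2}{\sigma_{\mathrm{TOT}}^2},\\
F_{ST} &= 1 - \frac{\sigma_{\mathrm{WI}}^2 + \sigma_{\mathrm{AI/WP}}^2}{\sigma_{\mathrm{TOT}}^2},\\
F_{CT} &= 1 - \frac{\sigma_{\mathrm{WI}}^2 + \sigma_{\mathrm{AI/WP}}^2 + \sigma_{\mathrm{AP/WG}}^2}{\sigma_{\mathrm{TOT}}^2},\\
\sigma_{\mathrm{TOT}}^2 &= \sigma_{\mathrm{WI}}^2 + \sigma_{\mathrm{AI/WP}}^2 + \sigma_{\mathrm{AP/WG}}^2 + \sigma_{\mathrm{AG}}^2.
\end{aligned}
$$

Because $\sigma_{\mathrm{TOT}}^2 = p(1-p)$, we obtain

$$
\begin{aligned}
\sigma_{\mathrm{WI}}^2 &= (1 - F_{IT})\,p(1-p),\\
\sigma_{\mathrm{AI/WP}}^2 &= (F_{IT} - F_{ST})\,p(1-p),\\
\sigma_{\mathrm{AP/WG}}^2 &= (F_{ST} - F_{CT})\,p(1-p),\\
\sigma_{\mathrm{AG}}^2 &= F_{CT}\,p(1-p).
\end{aligned}
\tag{8}
$$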
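Equation (8) and the relations preceding it are inverse maps between the F-statistics and the variance components. A minimal Python sketch of the round trip is given below; the function names and the dictionary keys are assumptions made for illustration only.

```python
from fractions import Fraction


def variance_components(f_it, f_st, f_ct, p):
    """Equation (8): variance components from F-statistics and allele frequency p."""
    total = p * (1 - p)
    return {
        "WI": (1 - f_it) * total,
        "AI/WP": (f_it - f_st) * total,
        "AP/WG": (f_st - f_ct) * total,
        "AG": f_ct * total,
    }


def f_statistics(vc):
    """Inverse map: F_IT, F_ST and F_CT from the four variance components."""
    total = sum(vc.values())
    f_it = 1 - vc["WI"] / total
    f_st = 1 - (vc["WI"] + vc["AI/WP"]) / total
    f_ct = 1 - (vc["WI"] + vc["AI/WP"] + vc["AP/WG"]) / total
    return f_it, f_st, f_ct


vc = variance_components(Fraction(1, 10), Fraction(1, 20), Fraction(1, 50), Fraction(3, 10))
print(vc, f_statistics(vc))
```

The four components always sum to $p(1-p)$, so the total variance does not depend on the F-statistics, only its partition does.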
We will use the symbol $\bar{y}_{ghi}$ ($\bar{y}_{gh}$, $\bar{y}_{g}$ or $\bar{y}_{t}$) to denote the average of the values of $y_{ghij}$ when $j$ is taken from all alleles within the individual $i$ (the population $h$, the group $g$ or the total population). Then,

$$
\begin{aligned}
\bar{y}_{ghi} &= \frac{1}{v_i}\sum_{j\in i} y_{ghij},\qquad
\bar{y}_{gh} = \frac{1}{v_h}\sum_{j\in h} y_{ghij} = \frac{1}{v_h}\sum_{i\in h}\sum_{j\in i} y_{ghij},\\
\bar{y}_{g} &= \frac{1}{v_g}\sum_{j\in g} y_{ghij}
= \frac{1}{v_g}\sum_{i\in g}\sum_{j\in i} y_{ghij}
= \frac{1}{v_g}\sum_{h\in g}\sum_{j\in h} y_{ghij}
= \frac{1}{v_g}\sum_{h\in g}\sum_{i\in h}\sum_{j\in i} y_{ghij},\\
\bar{y}_{t} &= \frac{1}{v_t}\sum_{j} y_{ghij}
= \frac{1}{v_t}\sum_{g}\sum_{h\in g}\sum_{i\in h}\sum_{j\in i} y_{ghij},
\end{aligned}
$$

where $v_i$, $v_h$, $v_g$ or $v_t$ is the number of alleles within the individual $i$, the population $h$, the group $g$ or the total population, respectively.
Derivation for the formula of expected $\mathrm{SS}_{\mathrm{WI}}$
The expectations of $\bar{y}_{ghi}^2$ and $y_{ghij}\bar{y}_{ghi}$ are calculated as follows:
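
$$
\begin{aligned}
\mathrm{E}(\bar{y}_{ghi}^2) &= \operatorname{var}(\bar{y}_{ghi}) + [\mathrm{E}(\bar{y}_{ghi})]^2
= \operatorname{var}\Big(\frac{1}{v_i}\sum_{j\in i} y_{ghij}\Big) + p^2\\
&= \frac{1}{v_i^2}\Big[\sum_{j\in i}\operatorname{var}(y_{ghij}) + \sum_{\substack{j,j'\in i\\ j\ne j'}}\operatorname{cov}(y_{ghij}, y_{ghij'})\Big] + p^2\\
&= \frac{1}{v_i^2}\big[v_i\,p(1-p) + (v_i^2 - v_i)\,F_{IT}\,p(1-p)\big] + p^2\\
&= \frac{1}{v_i}\,p(1-p) + \frac{v_i - 1}{v_i}\,F_{IT}\,p(1-p) + p^2,\\[4pt]
\mathrm{E}(y_{ghij}\bar{y}_{ghi}) &= \mathrm{E}\Big(\frac{1}{v_i}\sum_{j'\in i} y_{ghij}\,y_{ghij'}\Big)\\
&= \frac{1}{v_i}\Big[\mathrm{E}(y_{ghij}^2) + \sum_{\substack{j'\in i\\ j'\ne j}}\mathrm{E}(y_{ghij}\,y_{ghij'})\Big]\\
&= \frac{1}{v_i}\big[p(1-p) + p^2\big] + \frac{v_i - 1}{v_i}\big[F_{IT}\,p(1-p) + p^2\big]\\
&= \frac{1}{v_i}\,p(1-p) + \frac{v_i - 1}{v_i}\,F_{IT}\,p(1-p) + p^2.
\end{aligned}
$$

Comparing the two results, we have $\mathrm{E}(\bar{y}_{ghi}^2) = \mathrm{E}(y_{ghij}\bar{y}_{ghi})$, so that $\mathrm{E}[(y_{ghij} - \bar{y}_{ghi})^2] = \mathrm{E}(y_{ghij}^2) - 2\mathrm{E}(y_{ghij}\bar{y}_{ghi}) + \mathrm{E}(\bar{y}_{ghi}^2) = \mathrm{E}(y_{ghij}^2) - \mathrm{E}(y_{ghij}\bar{y}_{ghi})$. In addition,

$$
\sigma_{\mathrm{WI}}^2 = (1 - F_{IT})\,p(1-p).
$$

Using these facts, we obtain the next result:

$$
\mathrm{E}\big[(y_{ghij} - \bar{y}_{ghi})^2\big] = \Big(1 - \frac{1}{v_i}\Big)\sigma_{\mathrm{WI}}^2.
$$

Moreover, we have

$$
\sum_{j}\Big(1 - \frac{1}{v_i}\Big) = H - \sum_{i}\sum_{j\in i}\frac{1}{v_i} = H - \sum_{i}\frac{v_i}{v_i} = H - N,
$$

where $H$ and $N$ are the numbers of alleles and individuals in the total population, respectively. Now, we can derive the first formula of expected SS in Equation (2):

$$
\mathrm{E}(\mathrm{SS}_{\mathrm{WI}}) = \mathrm{E}\Big[\sum_{j}(y_{ghij} - \bar{y}_{ghi})^2\Big]
= \sum_{j}\mathrm{E}\big[(y_{ghij} - \bar{y}_{ghi})^2\big]
= \sum_{j}\Big(1 - \frac{1}{v_i}\Big)\sigma_{\mathrm{WI}}^2
= \sigma_{\mathrm{WI}}^2\,(H - N).
$$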
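The factor $H - N$ can be checked by brute force in a special case. The sketch below is an illustration under an explicit assumption: the allele copies within an individual are independent, which corresponds to $F_{IT} = 0$ and hence $\sigma_{\mathrm{WI}}^2 = p(1-p)$. It enumerates every allele configuration of an individual with $v$ copies; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_within_individual_ss(v, p):
    """Exact E[sum_j (y_j - ybar)^2] for v independent allele copies (F_IT = 0)."""
    total = Fraction(0)
    for alleles in product((0, 1), repeat=v):
        k = sum(alleles)                      # copies of the focal allele
        prob = p ** k * (1 - p) ** (v - k)    # probability of this configuration
        mean = Fraction(k, v)                 # individual mean
        total += prob * sum((a - mean) ** 2 for a in alleles)
    return total


p = Fraction(3, 10)
ploidies = [2, 4, 3]   # a mixed-ploidy sample: H = 9 alleles, N = 3 individuals
print(sum(expected_within_individual_ss(v, p) for v in ploidies))
```

Each individual contributes $(v_i - 1)\,p(1-p)$, so the total is $(H - N)\,p(1-p)$, in agreement with the formula for $\mathrm{E}(\mathrm{SS}_{\mathrm{WI}})$ in this special case.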
Derivation for the formula of expected $\mathrm{SS}_{\mathrm{WP}}$
The expectations of $\bar{y}_{gh}^2$ and $y_{ghij}\bar{y}_{gh}$ are calculated as follows:
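
$$
\begin{aligned}
\mathrm{E}(\bar{y}_{gh}^2) &= \operatorname{var}(\bar{y}_{gh}) + [\mathrm{E}(\bar{y}_{gh})]^2
= \operatorname{var}\Big(\frac{1}{v_h}\sum_{j\in h} y_{ghij}\Big) + p^2\\
&= \frac{1}{v_h^2}\Big[\sum_{j\in h}\operatorname{var}(y_{ghij})
+ \sum_{i\in h}\sum_{\substack{j,j'\in i\\ j\ne j'}}\operatorname{cov}(y_{ghij}, y_{ghij'})
+ \sum_{\substack{i,i'\in h\\ i\ne i'}}\sum_{\substack{j\in i\\ j'\in i'}}\operatorname{cov}(y_{ghij}, y_{ghi'j'})\Big] + p^2\\
&= \frac{1}{v_h}\,p(1-p) + \frac{\sum_{i'\in h} v_{i'}^2 - v_h}{v_h^2}\,F_{IT}\,p(1-p)
+ \frac{v_h^2 - \sum_{i'\in h} v_{i'}^2}{v_h^2}\,F_{ST}\,p(1-p) + p^2,\\[4pt]
\mathrm{E}(y_{ghij}\bar{y}_{gh}) &= \mathrm{E}\Big(\frac{1}{v_h}\sum_{i'\in h}\sum_{j'\in i'} y_{ghij}\,y_{ghi'j'}\Big)\\
&= \frac{1}{v_h}\Big[\mathrm{E}(y_{ghij}^2) + \sum_{\substack{j'\in i\\ j'\ne j}}\mathrm{E}(y_{ghij}\,y_{ghij'})
+ \sum_{\substack{i'\in h\\ i'\ne i}}\sum_{j'\in i'}\mathrm{E}(y_{ghij}\,y_{ghi'j'})\Big]\\
&= \frac{1}{v_h}\,p(1-p) + \frac{v_i - 1}{v_h}\,F_{IT}\,p(1-p) + \frac{v_h - v_i}{v_h}\,F_{ST}\,p(1-p) + p^2.
\end{aligned}
$$

Now, by using Equation (8), it is not difficult to calculate that

$$
\mathrm{E}\big[(y_{ghij} - \bar{y}_{gh})^2\big] = \sigma_{\mathrm{WI}}^2\Big(1 - \frac{1}{v_h}\Big)
+ \sigma_{\mathrm{AI/WP}}^2\Big(1 + \frac{\sum_{i'\in h} v_{i'}^2}{v_h^2} - \frac{2v_i}{v_h}\Big).
$$

Moreover, because $P$ is the number of populations in the total population, we have

$$
\begin{aligned}
\sum_{j}\Big(1 - \frac{1}{v_h}\Big) &= H - \sum_{h}\sum_{j\in h}\frac{1}{v_h} = H - \sum_{h} 1 = H - P,\\
\sum_{j}\Big(1 + \frac{\sum_{i'\in h} v_{i'}^2}{v_h^2} - \frac{2v_i}{v_h}\Big)
&= H + \sum_{h}\frac{v_h\sum_{i'\in h} v_{i'}^2}{v_h^2} - \sum_{h}\sum_{i\in h}\frac{2v_i^2}{v_h}
= H - \sum_{h}\sum_{i\in h}\frac{v_i^2}{v_h}.
\end{aligned}
$$

Now, the second formula of expected SS in Equation (2) can be derived as follows:

$$
\begin{aligned}
\mathrm{E}(\mathrm{SS}_{\mathrm{WP}}) &= \mathrm{E}\Big[\sum_{j}(y_{ghij} - \bar{y}_{gh})^2\Big]
= \sum_{j}\mathrm{E}\big[(y_{ghij} - \bar{y}_{gh})^2\big]\\
&= \sum_{j}\Big[\sigma_{\mathrm{WI}}^2\Big(1 - \frac{1}{v_h}\Big)
+ \sigma_{\mathrm{AI/WP}}^2\Big(1 + \frac{\sum_{i'\in h} v_{i'}^2}{v_h^2} - \frac{2v_i}{v_h}\Big)\Big]\\
&= \sigma_{\mathrm{WI}}^2\,(H - P) + \sigma_{\mathrm{AI/WP}}^2\Big(H - \sum_{h}\sum_{i\in h}\frac{v_i^2}{v_h}\Big).
\end{aligned}
$$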
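Derivation for the formula of expected $\mathrm{SS}_{\mathrm{WG}}$
The expectations of $\bar{y}_{g}^2$ and $y_{ghij}\bar{y}_{g}$ are calculated as follows:

$$
\begin{aligned}
\mathrm{E}(\bar{y}_{g}^2) &= \operatorname{var}(\bar{y}_{g}) + [\mathrm{E}(\bar{y}_{g})]^2
= \operatorname{var}\Big(\frac{1}{v_g}\sum_{j\in g} y_{ghij}\Big) + p^2\\
&= \frac{1}{v_g^2}\Big[\sum_{j\in g}\operatorname{var}(y_{ghij})
+ \sum_{i\in g}\sum_{\substack{j,j'\in i\\ j\ne j'}}\operatorname{cov}(y_{ghij}, y_{ghij'})
+ \sum_{h\in g}\sum_{\substack{i,i'\in h\\ i\ne i'}}\sum_{\substack{j\in i\\ j'\in i'}}\operatorname{cov}(y_{ghij}, y_{ghi'j'})\\
&\qquad\quad + \sum_{\substack{h,h'\in g\\ h\ne h'}}\sum_{\substack{j\in h\\ j'\in h'}}\operatorname{cov}(y_{ghij}, y_{gh'i'j'})\Big] + p^2\\
&= \frac{1}{v_g}\,p(1-p) + \frac{\sum_{i'\in g} v_{i'}^2 - v_g}{v_g^2}\,F_{IT}\,p(1-p)\\
&\quad + \frac{\sum_{h'\in g} v_{h'}^2 - \sum_{i'\in g} v_{i'}^2}{v_g^2}\,F_{ST}\,p(1-p)
+ \frac{v_g^2 - \sum_{h'\in g} v_{h'}^2}{v_g^2}\,F_{CT}\,p(1-p) + p^2,\\[4pt]
\mathrm{E}(y_{ghij}\bar{y}_{g}) &= \mathrm{E}\Big(\frac{1}{v_g}\sum_{h'\in g}\sum_{i'\in h'}\sum_{j'\in i'} y_{ghij}\,y_{gh'i'j'}\Big)\\
&= \frac{1}{v_g}\Big[\mathrm{E}(y_{ghij}^2) + \sum_{\substack{j'\in i\\ j'\ne j}}\mathrm{E}(y_{ghij}\,y_{ghij'})
+ \sum_{\substack{i'\in h\\ i'\ne i}}\sum_{j'\in i'}\mathrm{E}(y_{ghij}\,y_{ghi'j'})
+ \sum_{\substack{h'\in g\\ h'\ne h}}\sum_{j'\in h'}\mathrm{E}(y_{ghij}\,y_{gh'i'j'})\Big]\\
&= \frac{1}{v_g}\,p(1-p) + \frac{v_i - 1}{v_g}\,F_{IT}\,p(1-p) + \frac{v_h - v_i}{v_g}\,F_{ST}\,p(1-p)
+ \frac{v_g - v_h}{v_g}\,F_{CT}\,p(1-p) + p^2.
\end{aligned}
$$

Now, by using Equation (8), it is easy to calculate that

$$
\begin{aligned}
\mathrm{E}\big[(y_{ghij} - \bar{y}_{g})^2\big] &= \sigma_{\mathrm{WI}}^2\Big(1 - \frac{1}{v_g}\Big)
+ \sigma_{\mathrm{AI/WP}}^2\Big(1 + \frac{\sum_{i'\in g} v_{i'}^2}{v_g^2} - \frac{2v_i}{v_g}\Big)\\
&\quad + \sigma_{\mathrm{AP/WG}}^2\Big(1 + \frac{\sum_{h'\in g} v_{h'}^2}{v_g^2} - \frac{2v_h}{v_g}\Big).
\end{aligned}
$$

Moreover, because $G$ is the number of all groups, if we let $x = i$ or $x = h$, then

$$
\sum_{j}\Big(1 - \frac{1}{v_g}\Big) = H - \sum_{g}\sum_{j\in g}\frac{1}{v_g} = H - \sum_{g} 1 = H - G,
$$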
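
$$
\sum_{j}\Big(1 + \frac{\sum_{x'\in g} v_{x'}^2}{v_g^2} - \frac{2v_x}{v_g}\Big)
= H + \sum_{g}\frac{v_g\sum_{x'\in g} v_{x'}^2}{v_g^2} - 2\sum_{g}\sum_{x\in g}\frac{v_x^2}{v_g}
= H - \sum_{g}\sum_{x\in g}\frac{v_x^2}{v_g}.
$$

Now, the third formula of expected SS in Equation (2) can be derived as follows:

$$
\begin{aligned}
\mathrm{E}(\mathrm{SS}_{\mathrm{WG}}) &= \mathrm{E}\Big[\sum_{j}(y_{ghij} - \bar{y}_{g})^2\Big]
= \sum_{j}\mathrm{E}\big[(y_{ghij} - \bar{y}_{g})^2\big]\\
&= \sum_{j}\Big[\sigma_{\mathrm{WI}}^2\Big(1 - \frac{1}{v_g}\Big)
+ \sigma_{\mathrm{AI/WP}}^2\Big(1 + \frac{\sum_{i'\in g} v_{i'}^2}{v_g^2} - \frac{2v_i}{v_g}\Big)
+ \sigma_{\mathrm{AP/WG}}^2\Big(1 + \frac{\sum_{h'\in g} v_{h'}^2}{v_g^2} - \frac{2v_h}{v_g}\Big)\Big]\\
&= \sigma_{\mathrm{WI}}^2\,(H - G) + \sigma_{\mathrm{AI/WP}}^2\Big(H - \sum_{g}\sum_{i\in g}\frac{v_i^2}{v_g}\Big)
+ \sigma_{\mathrm{AP/WG}}^2\Big(H - \sum_{g}\sum_{h\in g}\frac{v_h^2}{v_g}\Big).
\end{aligned}
$$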
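Derivation for the formula of expected $\mathrm{SS}_{\mathrm{TOT}}$
The expectations of $\bar{y}_{t}^2$ and $y_{ghij}\bar{y}_{t}$ are calculated as follows:

$$
\begin{aligned}
\mathrm{E}(\bar{y}_{t}^2) &= \operatorname{var}(\bar{y}_{t}) + [\mathrm{E}(\bar{y}_{t})]^2
= \operatorname{var}\Big(\frac{1}{v_t}\sum_{j} y_{ghij}\Big) + p^2\\
&= \frac{1}{v_t^2}\Big[\sum_{j}\operatorname{var}(y_{ghij})
+ \sum_{i}\sum_{\substack{j,j'\in i\\ j\ne j'}}\operatorname{cov}(y_{ghij}, y_{ghij'})
+ \sum_{h}\sum_{\substack{i,i'\in h\\ i\ne i'}}\sum_{\substack{j\in i\\ j'\in i'}}\operatorname{cov}(y_{ghij}, y_{ghi'j'})\\
&\qquad\quad + \sum_{g}\sum_{\substack{h,h'\in g\\ h\ne h'}}\sum_{\substack{j\in h\\ j'\in h'}}\operatorname{cov}(y_{ghij}, y_{gh'i'j'})
+ \sum_{\substack{g,g'\\ g\ne g'}}\sum_{\substack{j\in g\\ j'\in g'}}\operatorname{cov}(y_{ghij}, y_{g'h'i'j'})\Big] + p^2\\
&= \frac{1}{v_t}\,p(1-p) + \frac{\sum_{i'} v_{i'}^2 - v_t}{v_t^2}\,F_{IT}\,p(1-p)
+ \frac{\sum_{h'} v_{h'}^2 - \sum_{i'} v_{i'}^2}{v_t^2}\,F_{ST}\,p(1-p)\\
&\quad + \frac{\sum_{g'} v_{g'}^2 - \sum_{h'} v_{h'}^2}{v_t^2}\,F_{CT}\,p(1-p)
+ \frac{v_t^2 - \sum_{g'} v_{g'}^2}{v_t^2}\times 0 + p^2,
\end{aligned}
$$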
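
$$
\begin{aligned}
\mathrm{E}(y_{ghij}\bar{y}_{t}) &= \mathrm{E}\Big(\frac{1}{v_t}\sum_{g'}\sum_{h'\in g'}\sum_{i'\in h'}\sum_{j'\in i'} y_{ghij}\,y_{g'h'i'j'}\Big)\\
&= \frac{1}{v_t}\Big[\mathrm{E}(y_{ghij}^2) + \sum_{\substack{j'\in i\\ j'\ne j}}\mathrm{E}(y_{ghij}\,y_{ghij'})
+ \sum_{\substack{i'\in h\\ i'\ne i}}\sum_{j'\in i'}\mathrm{E}(y_{ghij}\,y_{ghi'j'})\\
&\qquad\quad + \sum_{\substack{h'\in g\\ h'\ne h}}\sum_{j'\in h'}\mathrm{E}(y_{ghij}\,y_{gh'i'j'})
+ \sum_{\substack{g'\\ g'\ne g}}\sum_{j'\in g'}\mathrm{E}(y_{ghij}\,y_{g'h'i'j'})\Big]\\
&= \frac{1}{v_t}\,p(1-p) + \frac{v_i - 1}{v_t}\,F_{IT}\,p(1-p) + \frac{v_h - v_i}{v_t}\,F_{ST}\,p(1-p)\\
&\quad + \frac{v_g - v_h}{v_t}\,F_{CT}\,p(1-p) + \frac{v_t - v_g}{v_t}\times 0 + p^2.
\end{aligned}
$$

Now, according to Equation (8), we can calculate that

$$
\begin{aligned}
\mathrm{E}\big[(y_{ghij} - \bar{y}_{t})^2\big] &= \sigma_{\mathrm{WI}}^2\Big(1 - \frac{1}{v_t}\Big)
+ \sigma_{\mathrm{AI/WP}}^2\Big(1 + \frac{\sum_{i'} v_{i'}^2}{v_t^2} - \frac{2v_i}{v_t}\Big)\\
&\quad + \sigma_{\mathrm{AP/WG}}^2\Big(1 + \frac{\sum_{h'} v_{h'}^2}{v_t^2} - \frac{2v_h}{v_t}\Big)
+ \sigma_{\mathrm{AG}}^2\Big(1 + \frac{\sum_{g'} v_{g'}^2}{v_t^2} - \frac{2v_g}{v_t}\Big).
\end{aligned}
$$

Moreover, we have $\sum_{j}\big(1 - \frac{1}{v_t}\big) = H - 1$ and

$$
\sum_{j}\Big(1 + \frac{\sum_{x'} v_{x'}^2}{v_t^2} - \frac{2v_x}{v_t}\Big)
= H + \frac{\sum_{x'} v_{x'}^2}{v_t} - 2\sum_{x}\frac{v_x^2}{v_t}
= H - \sum_{x}\frac{v_x^2}{v_t},\qquad x = i, h, g.
$$

Now, let us derive the final formula in Equation (2):

$$
\begin{aligned}
\mathrm{E}(\mathrm{SS}_{\mathrm{TOT}}) &= \mathrm{E}\Big[\sum_{j}(y_{ghij} - \bar{y}_{t})^2\Big]
= \sum_{j}\mathrm{E}\big[(y_{ghij} - \bar{y}_{t})^2\big]\\
&= \sum_{j}\Big[\sigma_{\mathrm{WI}}^2\Big(1 - \frac{1}{v_t}\Big)
+ \sigma_{\mathrm{AI/WP}}^2\Big(1 + \frac{\sum_{i'} v_{i'}^2}{v_t^2} - \frac{2v_i}{v_t}\Big)\\
&\qquad\quad + \sigma_{\mathrm{AP/WG}}^2\Big(1 + \frac{\sum_{h'} v_{h'}^2}{v_t^2} - \frac{2v_h}{v_t}\Big)
+ \sigma_{\mathrm{AG}}^2\Big(1 + \frac{\sum_{g'} v_{g'}^2}{v_t^2} - \frac{2v_g}{v_t}\Big)\Big]\\
&= \sigma_{\mathrm{WI}}^2\,(H - 1) + \sigma_{\mathrm{AI/WP}}^2\Big(H - \sum_{i}\frac{v_i^2}{v_t}\Big)
+ \sigma_{\mathrm{AP/WG}}^2\Big(H - \sum_{h}\frac{v_h^2}{v_t}\Big)
+ \sigma_{\mathrm{AG}}^2\Big(H - \sum_{g}\frac{v_g^2}{v_t}\Big).
\end{aligned}
$$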
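The four expected-SS formulas derived above can be collected into a coefficient matrix and compared with the diploid layout of Table 1. The following Python sketch is an illustration, not the authors' software: the input format (groups containing populations containing individual ploidies $v_i$) and both function names are assumptions.

```python
from fractions import Fraction


def expected_ss_coefficients(groups):
    """Coefficients of (WI, AI/WP, AP/WG, AG) variance components in
    E(SS_WI), E(SS_WP), E(SS_WG) and E(SS_TOT).

    groups: hypothetical nested list, groups -> populations -> ploidies v_i.
    """
    def sq(values):
        return sum(x * x for x in values)

    v_i = [v for g in groups for h in g for v in h]
    v_h = [[sum(h) for h in g] for g in groups]
    v_g = [sum(row) for row in v_h]
    H = sum(v_g)                                   # total allele count (= v_t)
    N, P, G = len(v_i), sum(len(g) for g in groups), len(groups)
    wp_ai = H - sum(Fraction(sq(h), sum(h)) for g in groups for h in g)
    wg_ai = H - sum(Fraction(sq([v for h in g for v in h]), vg)
                    for g, vg in zip(groups, v_g))
    wg_ap = H - sum(Fraction(sq(row), vg) for row, vg in zip(v_h, v_g))
    tot_ai = H - Fraction(sq(v_i), H)
    tot_ap = H - Fraction(sq([x for row in v_h for x in row]), H)
    tot_ag = H - Fraction(sq(v_g), H)
    return [
        [H - N, 0, 0, 0],
        [H - P, wp_ai, 0, 0],
        [H - G, wg_ai, wg_ap, 0],
        [H - 1, tot_ai, tot_ap, tot_ag],
    ]


def expected_ms_coefficients(rows, dfs):
    """Difference successive expected SS rows and divide by the degrees of freedom."""
    out = [[Fraction(c) / dfs[0] for c in rows[0]]]
    for k in range(1, len(rows)):
        out.append([Fraction(a - b) / dfs[k] for a, b in zip(rows[k], rows[k - 1])])
    return out


# Balanced diploid example: 2 groups x 4 populations x 10 individuals.
groups = [[[2] * 10 for _ in range(4)] for _ in range(2)]
rows = expected_ss_coefficients(groups)
print(expected_ms_coefficients(rows, [80, 72, 6, 1]))   # d.f.: N, N-P, P-G, G-1
```

For this balanced diploid structure the expected MS rows are $(1,0,0,0)$, $(1,2,0,0)$, $(1,2,20,0)$ and $(1,2,20,80)$, matching the Expected MS column of Table 1 with $n = n' = 20$ and $n'' = 80$.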