An empirical verification of population assignment methods by marking and parentage data: hatchery and wild steelhead (Oncorhynchus mykiss) in Forks Creek, Washington, USA.

L. HAUSER, T. R. SEAMONS, M. DAUER, K.A. NAISH, T.P. QUINN

School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98105-5020, USA

Keywords: Assignment tests, hatchery releases, parental identification, salmonids, microsatellites

Corresponding Author:
Lorenz Hauser
School of Aquatic and Fishery Sciences
University of Washington
Seattle, WA 98105-5020
USA
e-mail: [email protected]
Phone: (206) 685 3270
Fax: (206) 685 6651

Running Title: Empirical comparison of assignment methods


Abstract

Assignment tests are increasingly applied in ecology and conservation, though empirical comparisons of methods are still rare or are restricted to a few of the available approaches. Furthermore, the performance of assignment tests in cases with low population differentiation, violations of Hardy-Weinberg equilibrium and unbalanced sampling designs has not been verified. The release of adult hatchery steelhead to spawn in Forks Creek in 1996 and 1997 provided an opportunity to compare the power of different assignment methods to distinguish their offspring from those of sympatric wild steelhead. We compared standard assignment methods requiring baseline samples (frequency, distance and Bayesian) and clustering approaches with and without baseline information, using six freely available computer programs. Assignments were verified by parentage data obtained for a subset of returning offspring. All methods provided similar assignment success, despite low differentiation between wild and hatchery fish (FST = 0.02). Bayesian approaches with baseline data performed best, while the results of clustering methods were variable and depended on the samples included in the analysis and the availability of baseline information. Removal of a locus with null alleles and equalizing sample sizes had little effect on assignments. Our results demonstrate the robustness of most assignment tests to low differentiation and violations of assumptions, as well as their utility for ecological studies that require distinguishing different groups.


Introduction

The application of molecular genetic markers to problems in ecology and evolution has revolutionized our understanding of the living world. Identification of isolated populations, estimation of genetic differentiation and inbreeding, and reconstruction of phylogenetic relationships have dominated such applications for decades. With the development of variable markers (e.g. microsatellites) and more powerful analytical methods, however, applications have expanded from population genetic models under equilibrium expectations to applications that are more relevant on ecological time scales (Hansen et al., 2001a; Manel et al., 2005). Furthermore, these advances have shifted the focus from populations to individuals, and it is now possible to identify the genetic origin of specific organisms, with applications in the estimation of current migration rates (Wilson & Rannala, 2003; Paetkau et al., 2004), identification of immigrants (Rannala & Mountain, 1997), forensic identification of the origin of animal remains (Wasser et al., 2004), and detection of hybridization (Randi et al., 2001). There is even some evidence that departure of assignment success from random expectations may be a more sensitive test for population differentiation than traditional tests based on allele frequencies and FST values (Waples & Gaggiotti, 2005). In addition, recently developed statistical methods remove the requirement for known allele frequencies in source populations, thus allowing separation of mixed samples into contributing constituents (Pritchard et al., 2000; Wilson & Rannala, 2003). It is thus not surprising that the number of molecular genetic studies applying assignment tests is increasing rapidly.

Despite this wide use of assignment tests, their performance in “real world” examples is still somewhat uncertain. Simulation studies confirm intuitive expectations that assignment success increases with (i) the number of loci, (ii) the number of alleles, (iii) levels of genetic differentiation (Cornuet et al., 1999; Guinand et al., 2004), and (iv) sample sizes from contributing populations (Cornuet et al., 1999). However, other studies suggest that there is an upper limit of useful allelic diversity (Bernatchez & Duchesne, 2000), and that subsets of loci can be identified that have similar assigning power to the entire set of loci (Bernatchez & Duchesne, 2000; Banks et al., 2003). Levels of differentiation required for reasonable assignment success are usually cited as FST values of 0.05 – 0.1 (Cornuet & Luikart, 1996; Manel et al., 2002; Berry et al., 2004), depending on the mutation model and the assignment method used. Required sample sizes decrease with increasing differentiation (Cornuet et al., 1999), although the effect of unequal sample sizes is largely unknown.

Assignment tests can be separated into methods that require allele frequency data of ‘pure’ contributing populations, and those that do not. The former category can be further divided into Bayesian, frequency and distance based methods (see Cornuet et al., 1999 for a review). Briefly, frequency based methods (Paetkau et al., 1995) assign an individual to the population to which it has the highest likelihood of belonging, given its multilocus genotype and the allele frequencies of contributing populations. Bayesian tests (Rannala & Mountain, 1997) are similar to frequency based methods, though likelihoods are computed within a Bayesian framework. Distance based methods can be based on a variety of genetic distance measures and assign individuals to the population to which they have the smallest genetic distance. In their comparison, Cornuet et al. (1999) concluded that Bayesian and frequency methods are more powerful than distance based methods, though they suggest that distance based methods may be less sensitive to violations of population genetic expectations such as Hardy-Weinberg and linkage equilibrium.
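The frequency based criterion can be illustrated with a minimal Python sketch, assuming Hardy-Weinberg equilibrium within each baseline population and independence among loci. The dictionary layout, the function names and the 0.01 floor for alleles absent from a baseline are hypothetical illustrations, not the code of any program used in this study.

```python
import math

def genotype_log_likelihood(genotype, freqs, floor=0.01):
    """Log-likelihood of a multilocus genotype in one baseline population.

    genotype: dict locus -> (allele1, allele2)
    freqs:    dict locus -> dict allele -> frequency
    floor:    hypothetical frequency used for alleles absent from the baseline
    """
    total = 0.0
    for locus, (a, b) in genotype.items():
        p = freqs[locus].get(a, floor)
        q = freqs[locus].get(b, floor)
        # Hardy-Weinberg genotype probability: p^2 for homozygotes, 2pq for heterozygotes
        total += math.log(p * p if a == b else 2 * p * q)
    return total

def assign(genotype, baselines):
    """Return the most likely population and the log-likelihoods for all populations."""
    scores = {pop: genotype_log_likelihood(genotype, f) for pop, f in baselines.items()}
    return max(scores, key=scores.get), scores

baselines = {
    "wild": {"L1": {100: 0.7, 104: 0.3}, "L2": {200: 0.5, 210: 0.5}},
    "hatchery": {"L1": {100: 0.2, 104: 0.8}, "L2": {200: 0.9, 210: 0.1}},
}
fish = {"L1": (100, 100), "L2": (200, 210)}
best, scores = assign(fish, baselines)
print(best)  # prints wild (likelihood 0.49 * 0.5 = 0.245 versus 0.04 * 0.18 = 0.0072)
```

Working on the log scale keeps products of many small genotype probabilities from underflowing when many loci are used.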

The second category, methods that do not require baseline data, has attracted much interest from ecologists, because collection of baseline data usually requires considerable sampling effort, the estimation of allele frequencies always incurs sampling error that affects the assignment (Guinand et al., 2004), and the effect of unsampled populations is difficult to assess (Manel et al., 2005). One of these methods is a Bayesian clustering method, which separates individuals in a mixture into groups that are in linkage and Hardy-Weinberg equilibrium (Pritchard et al., 2000). An added benefit of the method is that the number of contributing populations can be estimated by comparing the likelihood of different numbers of populations given the data. An alternative method is a Bayesian partition method using an analytical integration strategy and stochastic search, which is much faster than Bayesian clustering methods based on Markov chain Monte Carlo (MCMC) methods (Corander et al., 2004).

Because these methods have the advantage of not requiring baseline data, there has been much interest in their power relative to more traditional approaches. In a study reanalyzing ten published data sets from a wide range of taxonomic groups (mammals, fish and insects), Manel et al. (2002) showed that the Bayesian clustering method performed even better than the partially Bayesian approach of Rannala and Mountain (1997), which requires a baseline dataset. However, by only accepting assignments when either all alternative populations could be excluded (Bayesian baseline) or when assignment probability was higher than 99.9%, their approach was more stringent than that generally used for ecological work. Furthermore, in their study, FST values of all data sets were fairly high (FST = 0.1 – 0.4), and in one dataset, assignment success dropped rapidly once FST between pairs of populations fell below 0.05. In a similar study on skinks, and using a less stringent approach, Berry et al. (2004) found that both clustering and Bayesian assignment with baseline reached close to 100% assignment accuracy at FST values above 0.06 to 0.08. Although these studies established that the power of Bayesian clustering may be equal to or higher than that of traditional approaches, the performance of these tests when differentiation is low is unknown.

An inherent problem of assignment tests is their verification. In contrast to simulation studies, the true population membership of individuals is not known, and therefore the genetic assignment has to be verified in some way. Most studies use the leave-one-out method, though problems with estimation of baseline allele frequencies (Guinand et al., 2004), and the potential for genetic drift between the collection of baseline and mixed samples, may inflate the self-assignment success compared to true assignment in mixed samples. With methods not requiring baseline samples, such verification is only possible if some known samples exist, thus eliminating the prime advantage of those methods. It is thus very advantageous if information on true population membership can be obtained from other sources. Most commonly, such information stems from the locality where individuals were caught (Manel et al., 2002; Berry et al., 2004), though migration between contributing populations and inclusion of immigrants in baseline samples may induce significant error. Other methods of verification include the analysis of growth patterns from scales (Bartron et al., 2004) or comparison with mark-recapture data (Berry et al., 2004). Another, as yet unexplored, option is the comparison with parentage data presented here.


Assignment tests have proved particularly relevant to conservation- or harvest-based applications in the management of fish populations. Specifically, these tests may be used to evaluate the effectiveness of artificial rearing programs that either support threatened fish populations or enhance fishery opportunities (Hansen et al., 2001b; Hansen et al., 2001c). Several authors have warned of the potential for negative genetic impacts of hatchery-origin fish on wild populations (Hindar et al., 1991; Waples, 1991; Utter & Epifanio, 2002). Fishery enhancement hatcheries releasing anadromous steelhead (Oncorhynchus mykiss) in the Pacific Northwest have deliberately sought to segregate fish that are intended for harvest from wild populations (Mackey et al., 2001 and references therein). A few hatchery broodstocks have been selected for early return timing, and, released in many Washington rivers, these steelhead can be taken by fisheries at a time when few wild fish are available. In this paper, we took advantage of an unusual opportunity presented when a hatchery began to propagate steelhead of such a non-local strain in a river that had not previously had a steelhead hatchery. All first-generation hatchery-produced smolts (yearling migrants) were distinguishable from wild fish by removal of the adipose fin, but the offspring of naturally-spawning hatchery fish could not be distinguished from wild fish except by genetic techniques.

The aims of our study were twofold: first, to assign unmarked returning steelhead to either wild or hatchery ancestry, and to compare six different methods requiring baselines and two clustering methods with and without the use of baseline data. Assignment success could be verified by identification of parents (known to be of wild or hatchery origin) of some of the returning offspring using genetic parentage methods. Second, we investigated the effect of null alleles and differences in sample size on population assignment, because most methods assume Hardy-Weinberg equilibrium within populations and may be affected by biases in sample sizes.


Materials and Methods

Study population

Forks Creek, a tributary of the Willapa River in southwest Washington, USA, has a small wild population of steelhead that generally enters from March through May and spawns from April through June (Mackey et al., 2001). A hatchery, situated just above the creek’s confluence with the Willapa River, has been operated by the Washington Department of Fish and Wildlife to produce coho and chinook salmon since 1895. A weir across Forks Creek blocks migrating adult salmon and steelhead and provides hatchery staff with access to returning adults. This weir functions well except under very high water conditions. There had been occasional releases of hatchery-produced steelhead in the Willapa River but apparently not in Forks Creek. Beginning in the winter of 1995-96 (here referred to as brood year (BY) 1996), the creek received the first adult returns from hatchery-produced steelhead released as smolts in spring of 1994. These fish originated from the Chambers Creek hatchery population but had been propagated at the Bogachiel Hatchery (farther north along the coast of Washington), mixed with unknown proportions of Bogachiel River steelhead. All fish released from the hatchery were marked by removal of their adipose fin (‘clipped’), and were thus identifiable as returning adults. The Chambers Creek population is widely released in Washington, and has been artificially selected for early return and spawning timing (generally December through February) to minimize fishery conflicts and interbreeding with wild fish, and to facilitate culture operations. There were additional plants from the Bogachiel Hatchery in 1996-97 (BY 1997, Table 1), but since then the Forks Creek hatchery has produced its own fish without external contribution.

In the first two years (BY 1996 and 1997), hatchery adults were released above the weir once hatchery requirements were met. This practice was then discontinued, and only unmarked (‘unclipped’) fish were transferred across the weir in subsequent years. Forks Creek therefore received a pulse of hatchery influence over two years, prompting a long-term and ongoing study on the effects of hatcheries on wild steelhead populations (Mackey et al., 2001; McLean et al., 2005).

Sampling

Hatchery fish, identified by a clipped adipose fin, were caught at the weir starting in November each year. These fish were used for hatchery spawning and killed (samples 96ck – 01ck, for BY 1996 – 2001 respectively, Table 1). As mentioned above, some clipped fish were released above the weir in 1996 and 1997 (samples 96cr and 97cr). Wild (unclipped) fish (samples 96u – 01u) returned later and were also collected at the weir, between April and June. These fish were generally not used for spawning at the hatchery. Because flooding sometimes occurred, some fish passed the weir without being sampled, and were only caught on their return downstream journey or not at all. All fish caught at the weir were measured for length, weighed, and a fin clip was taken for genetic analysis. Some wild fish were caught both on their upstream and their return downstream journey; in these cases, only the upstream sample was used for analysis of run timing.

Molecular Analyses

DNA extraction, microsatellite PCR and genotyping followed protocols outlined in Seamons et al. (2004). Briefly, DNA was extracted from caudal fin clippings using DNeasy columns (Qiagen). Eight microsatellite loci were amplified (Oki23, Spidle, unpublished, GenBank acc. # AF272822; Omy1001UW, Omy1011UW, Omy1191UW, Omy1212UW (Spies et al., 2005); Omy77 (Morris et al., 1996); One108 (Olsen et al., 2000b); Ssa85 (O'Reilly et al., 1996)). PCR amplifications were carried out in 10 µl volumes using annealing temperatures of 55 °C (Oki23, Omy77, One108, Omy1001UW, Omy1011UW), 60 °C (Ssa85) and 65 °C (Omy1191UW, Omy1212UW), and a MgCl2 concentration of 2 (Ssa85) or 1 (all others) µM. Amplified products were size separated on a 96-capillary automated sequencer (MegaBACE 1000, Amersham Biosciences). Allele size was estimated using Genetic Profiler v2.0 (Amersham Biosciences).

Statistical analysis

Fish collected in 1996 – 1998 were used as baseline, and divided into potential source populations based on their adipose fin (clipped, unclipped), brood year (BY 1996, 1997, 1998) and whether they were killed or released above the weir. We therefore had eight baseline populations: 96u, 97u, 98u, 96cr, 97cr, 96ck, 97ck and 98ck, where u stands for ‘unclipped’, c for ‘clipped’, r for ‘released’ and k for ‘killed’. Five unclipped fish in 1996 and three unclipped fish in 1997 were excluded from the baseline, because they returned very early and in preliminary trials were consistently assigned to the hatchery population. Because brood years were genetically differentiated and varied in their deviation from equilibrium expectations (see Results), they were considered independent baseline populations. Although killed and released clipped fish came from the same population, they were treated separately for two reasons. First, splitting hatchery samples reduced the bias in sample size between potential source populations and, second, only released clipped fish could be the parents of 2000 and 2001 unclipped adults. Furthermore, the released fish came primarily from the later portion of the run, and may thus represent a biased sample of the population. However, to investigate the effect of differences in sample sizes between baseline populations, we also tested baselines where clipped released and clipped killed fish were combined, and where all populations were split into similarly-sized samples of 25-33 fish (the size of the smallest wild sample). Both unclipped and clipped fish from BY 1999 – 2001 were used as unknown samples: clipped fish were likely from the hatchery, but unclipped fish could be offspring of either wild or released hatchery fish. For the final analysis, only assignment to any wild or any hatchery population was considered, by pooling individuals assigned to 96u, 97u and 98u as ‘wild’ and to 96cr, 97cr, 96ck, 97ck and 98ck as ‘hatchery’.

FIS values were calculated and deviations from Hardy-Weinberg equilibrium were tested using GENEPOP v3.4 (Raymond & Rousset, 1995). FIS values, observed and expected heterozygosity and allelic diversity were calculated in FSTAT (Goudet, 1995); the same program was used to estimate and test (1000 permutations) FST values between clipped and unclipped fish and between brood years. As an alternative measure of differentiation, Shriver et al. (1997) indices were calculated as the sum of allele frequency differences between populations, averaged over loci. Euclidean Cavalli-Sforza distances (Cavalli-Sforza & Edwards, 1967) were estimated using GENETIX (Belkhir et al., 2004) and used for graphic analysis in multidimensional scaling plots. An AMOVA (Analysis of Molecular Variance) was used to partition differentiation into components due to variation among brood years within stock components (hatchery or wild) and between stock components (ARLEQUIN v3.0, Excoffier et al., 2005). Linkage disequilibrium was estimated by rD, a standardized measure of the index of association, IA (Brown et al., 1980), which was calculated and tested with 100 permutations, excluding individuals with missing data, using the program MULTILOCUS (Agapow & Burt, 2001).
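The Shriver et al. (1997) index as described above can be sketched in Python. This is a minimal illustration assuming absolute frequency differences; some formulations of Shriver's δ halve this sum so that it ranges from 0 to 1, and the text does not say which form was used. The function name and data layout are hypothetical.

```python
def shriver_index(freqs_x, freqs_y):
    """Sum of absolute allele-frequency differences at each locus, averaged over loci.

    freqs_x, freqs_y: dict locus -> dict allele -> frequency (same loci in both)
    Alleles missing from one population are treated as having frequency 0.
    """
    per_locus = []
    for locus in freqs_x:
        alleles = set(freqs_x[locus]) | set(freqs_y[locus])
        diff = sum(abs(freqs_x[locus].get(a, 0.0) - freqs_y[locus].get(a, 0.0)) for a in alleles)
        per_locus.append(diff)
    return sum(per_locus) / len(per_locus)

wild = {"L1": {100: 0.7, 104: 0.3}, "L2": {200: 0.5, 210: 0.5}}
hatchery = {"L1": {100: 0.2, 104: 0.8}, "L2": {200: 0.9, 210: 0.1}}
print(shriver_index(wild, hatchery))  # approximately 0.9, i.e. (1.0 + 0.8) / 2
```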

Parental identification

Parentage analysis was carried out based on exclusion. WHICHPARENT (Eichert & Hedgecock, Bodega Bay Marine Lab, California, unpublished, available at http://www.bml.ucdavis.edu/whichparents.html) was used to identify candidate parents that matched offspring at one allele at 6 of 8 loci, to allow for genotyping errors and non-Mendelian inheritance of alleles (e.g., null alleles). The genotypes of adults identified by WHICHPARENT as potential parents were then directly compared to offspring genotypes. After checking both candidate-parent and adult-offspring genotypes for errors, only adults that matched one allele at all loci were assigned as parents. Because of the possibility of repeat spawning, the lack of age data for returning offspring and incomplete sampling of both parents and offspring in some years, adults from all years prior to the return year of adult offspring were considered potential parents. No assignment was possible for some offspring, probably because of incomplete sampling of parents or, in the case of especially large adult offspring, because they were born before this study was initiated.
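The two-stage exclusion logic (a relaxed screen at 6 of 8 loci followed by a strict all-loci check) can be sketched as follows. This is a hypothetical illustration of the rule as stated, not WHICHPARENT's code; the function names and genotype layout are assumptions, and the example uses three loci with a relaxed threshold of 2 to keep it short.

```python
def loci_sharing_allele(offspring, adult):
    """Number of loci at which an adult shares at least one allele with an offspring.

    Genotypes are dicts locus -> (allele1, allele2); loci missing in the adult are skipped.
    """
    return sum(
        1 for locus, alleles in offspring.items()
        if locus in adult and set(alleles) & set(adult[locus])
    )

def candidate_parents(offspring, adults, min_loci=6):
    """Relaxed screen: adults sharing an allele at >= min_loci loci (6 of 8 in the text)."""
    return [aid for aid, g in adults.items() if loci_sharing_allele(offspring, g) >= min_loci]

def is_compatible_parent(offspring, adult):
    """Strict check: the adult shares an allele at every locus scored in both fish."""
    scored = sum(1 for locus in offspring if locus in adult)
    return loci_sharing_allele(offspring, adult) == scored

offspring = {"L1": (100, 104), "L2": (200, 200), "L3": (300, 310)}
adults = {
    "A": {"L1": (104, 108), "L2": (200, 206), "L3": (310, 310)},  # shares at 3 loci
    "B": {"L1": (100, 100), "L2": (202, 204), "L3": (300, 302)},  # shares at 2 loci
    "C": {"L1": (108, 112), "L2": (202, 204), "L3": (302, 306)},  # shares at 0 loci
}
print(candidate_parents(offspring, adults, min_loci=2))                 # ['A', 'B']
print([a for a in adults if is_compatible_parent(offspring, adults[a])])  # ['A']
```

The relaxed screen tolerates a mismatch caused by a null allele or scoring error; the strict check is applied only after genotypes have been rechecked.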


In some cases only a single parent was assigned to an offspring. If the offspring was a hatchery fish (i.e., clipped adipose fin), its parentage was accepted only when it was known that an adult of the opposite sex that was spawned on the same day as the assigned parent had no genotype available. If the offspring was a wild-spawned fish (i.e., intact adipose fin), single-parent parentage was accepted if the parent was either an unclipped fish, or a clipped fish that had been released above the weir (i.e., naturally spawning). The probability of randomly obtaining false single-parent matches was calculated over all loci using CERVUS (Marshall et al., 1998); the maximum probability of a false match was estimated from the frequency of the most common allele following Seamons et al. (2004).
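One way to express a worst-case false-match probability is sketched below. It assumes Hardy-Weinberg equilibrium, independence among loci, and an offspring that carries the most common allele at every locus; under these assumptions an unrelated adult carries a given allele of frequency p with probability 1 - (1 - p)^2. This is our illustration of the idea, not necessarily the exact formula of Seamons et al. (2004) or CERVUS, and the function name is hypothetical.

```python
def max_false_match_probability(locus_freqs):
    """Worst-case chance that a random unrelated adult shares an allele with an offspring at every locus.

    locus_freqs: list of dicts allele -> frequency, one dict per locus.
    Assumes Hardy-Weinberg equilibrium, independent loci, and that the offspring
    carries the most common allele at each locus.
    """
    prob = 1.0
    for freqs in locus_freqs:
        p = max(freqs.values())
        # P(random adult carries at least one copy of an allele at frequency p)
        prob *= 1.0 - (1.0 - p) ** 2
    return prob

print(max_false_match_probability([{100: 0.5, 104: 0.5}, {200: 0.4, 210: 0.35, 220: 0.25}]))
# approximately 0.48 (0.75 * 0.64)
```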

Assignment tests

Individual fish were assigned to brood years of hatchery or wild fish using five methods (Bayesian, frequency based, and distance based assignment with a baseline, and a clustering and a Bayesian detection method not requiring a baseline) and six computer programs (WHICHRUN (Banks & Eichert, 2000), GENECLASS2 (Piry et al., 2004), GMA (Kalinowski, S.T. 2003. Genetic Mixture Analysis 1.0. Department of Ecology, Montana State University, Bozeman MT 59717. Available for download from http://www.montana.edu/kalinowski), BAYES (Pella & Masuda, 2001), STRUCTURE (Pritchard et al., 2000), and BAPS 3.1 (Corander et al., 2004)). Individuals were either assigned to the most probable population (‘standard assignment’), or an estimator of confidence was used to assign only some individuals and exclude individuals of uncertain origin (‘confident assignment’). For the methods requiring baselines, we carried out both self-assignment (i.e. assignment of 1996-1998 individuals to 1996-1998 samples), which was checked against adipose condition (clipped vs. unclipped), and assignment of mixed samples (1999-2001), which was verified by adipose condition and parentage using Pivot Tables in Microsoft Excel. For STRUCTURE and BAPS, we performed one analysis using baseline information (the USEPOPINFO option in STRUCTURE, and trained clustering in BAPS) and one analysis with all samples (1996-2001) but without baseline information. In addition, we submitted all samples and all unclipped samples from 1999-2001 to separate analyses using both STRUCTURE and BAPS without the use of a baseline.

Bayesian assignments followed the methods of Rannala & Mountain (1997) and were carried out in GMA and GENECLASS2. The two programs follow similar approaches, with two major differences. First, during self-assignment, GENECLASS2 leaves the individual to be assigned out of the baseline (leave-one-out procedure), whereas GMA does not. Second, the prior probabilities of an individual belonging to a population are equal in GENECLASS2, whereas GMA uses a Mixed Stock Analysis (Pella et al., 1996) to estimate prior probabilities. GMA is therefore expected to overestimate correct assignments in self-assignment tests, but may perform better in assigning individuals from mixtures. GMA provides probabilities for each individual of belonging to each population; for the ‘confident assignment’ we therefore used only assignments with a probability of more than 0.95. These probabilities can also be summed over reporting units, in this case hatchery (96cr, 96ck, 97cr, 97ck, 98ck) or wild (96u, 97u, 98u). GENECLASS2 provides similar probabilities as scores, computed as the likelihood of each population divided by the sum of all likelihoods. Here we summed scores for wild and hatchery samples, and used only assignments with a score of more than 0.95 for the confident assignment.
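Summing normalized scores over reporting units and applying the 0.95 threshold can be sketched in Python. This minimal version assumes equal priors (as stated for GENECLASS2) and takes per-population log-likelihoods as input; it mirrors the described scoring, not either program's code, and the function names are hypothetical. The sample-to-group mapping is the one given in the text.

```python
import math

# Pooling of baseline samples into reporting groups, as listed in the text.
REPORTING_GROUP = {
    "96u": "wild", "97u": "wild", "98u": "wild",
    "96cr": "hatchery", "97cr": "hatchery",
    "96ck": "hatchery", "97ck": "hatchery", "98ck": "hatchery",
}

def group_scores(log_likelihoods):
    """Normalize per-population log-likelihoods to scores summing to 1, then sum by group."""
    top = max(log_likelihoods.values())  # subtract the maximum to avoid underflow
    weights = {pop: math.exp(ll - top) for pop, ll in log_likelihoods.items()}
    total = sum(weights.values())
    groups = {}
    for pop, w in weights.items():
        group = REPORTING_GROUP[pop]
        groups[group] = groups.get(group, 0.0) + w / total
    return groups

def confident_call(log_likelihoods, threshold=0.95):
    """Return 'wild' or 'hatchery' if its summed score exceeds threshold, else None."""
    groups = group_scores(log_likelihoods)
    best = max(groups, key=groups.get)
    return best if groups[best] > threshold else None

example = {"96u": math.log(0.90), "97u": math.log(0.08), "96ck": math.log(0.02)}
print(group_scores(example))    # wild about 0.98, hatchery about 0.02
print(confident_call(example))  # wild
```

Note that a fish can be confidently ‘wild’ even when no single wild sample exceeds 0.95, because scores are pooled across brood years first.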


A slightly different approach, albeit also based on Bayesian statistics, is used by BAYES (Pella & Masuda, 2001). Gibbs sampling of Markov chains is used to generate samples of the posterior probability distributions of stock proportions and allele frequencies in the contributing stocks. Mixture individuals are stochastically assigned to the baseline samples with probabilities equal to the current posterior source probabilities. Finally, each mixture individual is assigned to the source sample whose average posterior source probability is a maximum; where source probabilities exceeded 95%, assignment was considered ‘confident’. Three Markov chains with a length of 500 iterations were run, one starting from equal priors, another from 90% hatchery contribution and another from 90% wild contribution. Chain length was sufficient as determined by a Raftery & Lewis (1996) diagnostic. Convergence of the three chains was tested by a Gelman & Rubin (1992) statistic using sample groups (hatchery vs. wild) rather than individual sample estimates. BAYES excludes all individuals in the mixture with alleles that are not found in any baseline population; here, 14 fish from 1999-2001 were excluded. Individual assignments were obtained as the sum of iterations after an initial burn-in of 250 steps. For self-assignment, the 1996-1998 samples were used both as baseline and mixture file; that is, as with GMA, no leave-one-out procedure was employed.
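The Gelman & Rubin (1992) convergence check compares between-chain and within-chain variance of the same quantity across chains started from different points. A minimal Python sketch of the basic potential scale reduction factor (without the degrees-of-freedom correction) is shown below for a scalar such as the hatchery proportion; BAYES's own implementation may differ in detail, and the example chains are made up.

```python
def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one scalar quantity.

    chains: list of m >= 2 equal-length lists of post-burn-in draws, e.g. the
    hatchery proportion recorded at each iteration of each chain.
    Values near 1 suggest the chains have converged to the same distribution.
    """
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance (scaled by chain length) and mean within-chain variance
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)
    ) / m
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

mixed = [[0.1, 0.3, 0.2, 0.4], [0.2, 0.4, 0.1, 0.3], [0.3, 0.1, 0.4, 0.2]]
stuck = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]
print(gelman_rubin(mixed))  # about 0.87: chains share the same mean
print(gelman_rubin(stuck))  # far above 1: chains disagree
```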

Frequency based assignments are based on the likelihood methods of Paetkau et al. (1995) and were carried out with both WHICHRUN and GENECLASS2. The primary difference between the two programs is the treatment of alleles that are found in the mixture but not in the baseline samples. WHICHRUN adds such alleles at a frequency of 1/(2N+1) to each sample (where N is the sample size), while GENECLASS2 allows the user to specify the frequency at which such alleles are added to the baseline (here 0.01). To exclude fish of uncertain origin in the results from both programs, we transformed the likelihoods into a probability (Piry et al., 2004), and only included individuals where the summed probability of either hatchery or wild samples exceeded 0.95.
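The two rules for alleles absent from a baseline can be contrasted in a short Python sketch. It returns the observed frequency for alleles present in the baseline and substitutes either 1/(2N+1) or a fixed 0.01 for absent ones; whether either program rescales the observed frequencies afterwards is not stated here, so this sketch deliberately leaves them unchanged. The function name and rule labels are hypothetical.

```python
from collections import Counter

def allele_frequency(baseline_alleles, allele, rule="whichrun", fixed=0.01):
    """Frequency of `allele` in one baseline sample at one locus.

    baseline_alleles: list of the 2N allele copies carried by the N baseline fish.
    For an allele absent from the baseline, rule="whichrun" returns 1/(2N+1) as
    described for WHICHRUN; any other rule returns `fixed` (0.01, as set in GENECLASS2).
    """
    counts = Counter(baseline_alleles)
    if counts[allele] > 0:
        return counts[allele] / len(baseline_alleles)
    n_fish = len(baseline_alleles) // 2
    if rule == "whichrun":
        return 1 / (2 * n_fish + 1)
    return fixed

sample = [100, 100, 104, 104, 104, 108]  # three fish at one locus
print(allele_frequency(sample, 104))                # 0.5
print(allele_frequency(sample, 112))                # 1/7, about 0.143
print(allele_frequency(sample, 112, rule="fixed"))  # 0.01
```

With small baseline samples the 1/(2N+1) rule assigns a larger frequency to unseen alleles than a fixed 0.01, so the choice matters most for the small wild samples.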

Although distance based methods, which assign individuals to the population to which they have the smallest genetic distance, are thought to be inferior to Bayesian and frequency methods (Cornuet et al., 1999), we used Cavalli-Sforza & Edwards (1967) distances to assign individuals to populations, again using GENECLASS2. For the confident assignment, we calculated the ratio of the minimum distance to the average distance to the alternative group, that is, all hatchery samples for fish assigned to the wild, or all wild samples for fish assigned to the hatchery. Only individuals where this ratio was less than 0.95 were confidently assigned.
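The distance-ratio criterion can be sketched in Python using one common form of the Cavalli-Sforza & Edwards chord distance, treating a fish's genotype as an allele-frequency profile and averaging per-locus distances. GENECLASS2's exact normalization may differ, but a constant scaling factor cancels in the ratio. Function names and the example baselines are hypothetical.

```python
import math

def chord_distance(genotype, freqs):
    """Cavalli-Sforza & Edwards chord distance between one fish and one population.

    The genotype is treated as a frequency profile (1.0 for a homozygote,
    0.5/0.5 for a heterozygote); per-locus distances are averaged.
    genotype: dict locus -> (a1, a2); freqs: dict locus -> dict allele -> frequency.
    """
    per_locus = []
    for locus, (a, b) in genotype.items():
        profile = {a: 1.0} if a == b else {a: 0.5, b: 0.5}
        overlap = sum(math.sqrt(f * freqs[locus].get(al, 0.0)) for al, f in profile.items())
        per_locus.append((2 / math.pi) * math.sqrt(2 * max(0.0, 1 - overlap)))
    return sum(per_locus) / len(per_locus)

def distance_call(genotype, baselines, groups, cutoff=0.95):
    """Assign to the nearest sample; confident if the minimum distance divided by the
    mean distance to the other group's samples is below cutoff."""
    dist = {pop: chord_distance(genotype, f) for pop, f in baselines.items()}
    nearest = min(dist, key=dist.get)
    own = groups[nearest]
    alternative = [d for pop, d in dist.items() if groups[pop] != own]
    ratio = dist[nearest] / (sum(alternative) / len(alternative))
    return own, ratio, ratio < cutoff

baselines = {"96u": {"L1": {100: 0.8, 104: 0.2}}, "96ck": {"L1": {100: 0.1, 104: 0.9}}}
groups = {"96u": "wild", "96ck": "hatchery"}
print(distance_call({"L1": (100, 100)}, baselines, groups))  # ('wild', ratio about 0.39, True)
```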

A novel method of clustering genotypes was used to separate hatchery and wild fish in each year without using a baseline (Pritchard et al., 2000). The method is based on the assumption of Hardy-Weinberg and linkage equilibrium in each population, and clustering occurs iteratively through Gibbs sampling in a Markov chain Monte Carlo (MCMC) procedure, by separating individuals in the mixed sample until this assumption is fulfilled. STRUCTURE also offers the possibility of estimating the likelihood of different numbers of contributing populations. As our steelhead were either hatchery or wild, we fixed that number to two populations. Although hybridization between the two stocks was likely limited, we used an admixture model in our analyses. Confident assignment was possible for individuals with a higher than 95% probability of belonging to either the wild or the hatchery population.


In contrast to STRUCTURE, BAPS does not use MCMC, but a stochastic optimization to infer the posterior mode of the genetic structure (Corander et al., in press). The algorithm is much faster than MCMC, but so far direct comparisons between the two approaches are limited. Here, we used both the trained clustering and the ‘untrained’ clustering of individuals approach, each followed by an estimation of admixture proportions. Both approaches used a maximum of two contributing populations and a minimum contribution of five individuals from each population in the mixture.

Effect of sample sizes and null alleles

In 1996 and 1997, about ten times more hatchery fish (clipped adipose fin) than wild fish were sampled. Sample sizes can significantly affect assignment success with all approaches described above (Cornuet et al., 1999), though relatively little is known about the effect of differing sample sizes among populations contributing to a mixture. To equalize sample sizes, we split clipped hatchery fish in 1996 and 1997 into those that were released above the weir (cr: clipped released) and those that were killed in the hatchery (ck: clipped killed). We compared these results with assignments obtained using a baseline in which these samples were combined (cr&ck) and a baseline in which all samples were split into samples of similar size, 25-33 individuals each (split). Parental identification was used to verify assignments in all cases.
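One way to produce a ‘split’ baseline is to shuffle each large sample and cut it into the fewest near-equal parts that do not exceed the upper size limit. This helper is hypothetical, since the text does not state how individuals were allocated to subsamples, and it enforces only the upper limit of 33: some sample sizes (for example 34-49 fish) would yield parts smaller than 25.

```python
import numpy as np

def split_sample(ids, max_size=33, seed=0):
    """Randomly split one baseline sample into the fewest near-equal subsamples
    of at most `max_size` individuals (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_parts = -(-len(ids) // max_size)          # ceiling division
    shuffled = rng.permutation(ids)
    return [part.tolist() for part in np.array_split(shuffled, n_parts)]
```

For a sample of 250 hatchery fish this gives eight subsamples of 31 or 32 individuals.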

One of the loci (Omy77) had a null allele that was known both from heterozygote deficiencies and from parentage analyses. As most methods used here assume Hardy-Weinberg equilibrium within contributing populations, we re-analyzed the data without Omy77. If the null allele posed a problem, we expected assignment success to improve after removal of this locus. Again, parental identification was used to verify these assignments.

Results

Genetic diversity and Hardy-Weinberg and linkage equilibrium

Genetic diversity was high, with average heterozygosities between 86 and 90% (Table 1) and between 10 and 30 alleles per locus. Significant deviations from both Hardy-Weinberg and linkage equilibrium were found in both hatchery and wild samples, though the much smaller sample sizes of wild fish complicated the interpretation of tests. One of the loci, Omy77, consistently showed a deficiency of heterozygotes (Table S1) and was also known from parentage tests to contain a null allele (Seamons et al., 2004). There was no clear trend in FIS values between hatchery and wild fish, though there were more significant test results in the larger samples of hatchery fish. Measures of multilocus linkage disequilibrium (r̄d), on the other hand, appeared larger in hatchery than in wild fish, and were more often significantly different from zero. Temporally, there was a slight increase in linkage disequilibrium in the unclipped fish, as would be expected if they consisted of a mixture of hatchery and wild fish.
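A heterozygote deficiency such as that at Omy77 appears as a positive FIS. The sketch below uses the simple estimator FIS = 1 − Ho/He with Nei's (1978) small-sample correction for He; programs such as FSTAT use Weir & Cockerham's estimator instead, so values will differ slightly. The function name is hypothetical, and monomorphic loci (He = 0) are not handled.

```python
from collections import Counter

def locus_fis(genotypes):
    """Observed heterozygosity, expected heterozygosity and FIS = 1 - Ho/He
    for one locus in one sample.

    genotypes: list of (allele1, allele2) tuples.
    He uses Nei's small-sample correction: 2n / (2n - 1) * (1 - sum p^2).
    """
    n = len(genotypes)
    ho = sum(a != b for a, b in genotypes) / n
    counts = Counter(a for g in genotypes for a in g)
    p2 = sum((c / (2 * n)) ** 2 for c in counts.values())
    he = 2 * n / (2 * n - 1) * (1 - p2)
    return ho, he, 1 - ho / he
```

A null allele makes some heterozygotes look like homozygotes, lowering Ho relative to He and pushing FIS above zero.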

The average genetic differentiation between clipped and unclipped fish in the baseline (1996-1998) was small but highly significant (average FST = 0.015, Table 2), with most of the differentiation occurring between hatchery and wild fish rather than among samples within each component (AMOVA: FCT = 0.022***, FSC = 0.0015***). Differences between brood years were very small, but also significant for larger samples (FST = 0.002-0.005, Table 2). Nevertheless, as expected, clipped and unclipped fish from 1996-1998 fell into two distinct groups (Figure 1). This distinction was confirmed by the Shriver index, which averaged 0.415 between clipped and unclipped fish within years, but only 0.239 between years. Unclipped fish from 2000 and 2001 also contained offspring of hatchery fish and were therefore intermediate (Figure 1).
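An allele-frequency differential of the kind proposed by Shriver et al. (1997) can be computed as sketched below. The multi-allelic form δ = ½ Σ_u |x_u − y_u|, averaged over loci, is an assumption here; the exact variant behind the values in Table 2 is not stated in this section, and the function name is hypothetical.

```python
def shriver_delta(freqs_x, freqs_y):
    """Mean over loci of 0.5 * sum_u |x_u - y_u| (assumed multi-allelic delta).

    freqs_x, freqs_y: lists (one per locus) of dicts mapping allele -> frequency.
    Ranges from 0 (identical frequencies) to 1 (no shared alleles).
    """
    per_locus = []
    for fx, fy in zip(freqs_x, freqs_y):
        alleles = set(fx) | set(fy)
        per_locus.append(0.5 * sum(abs(fx.get(a, 0.0) - fy.get(a, 0.0)) for a in alleles))
    return sum(per_locus) / len(per_locus)
```

Unlike FST, this index does not depend on within-population diversity, which is one reason it may track assignment power more closely when heterozygosity is as high as here.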

Parental identification

The probability of a random match between an offspring and a single parent was 1.44 × 10⁻⁴ (CERVUS; exclusion probability over all loci 0.999856, range at individual loci 0.501-0.791). The upper limit of the probability of a random match (most frequent genotype at each locus) was 0.0164 (standard deviation: 0.0013). Therefore, our data, together with the stringent parental identification criteria (see Methods), resulted in highly conservative parental assignment.
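The multilocus exclusion probability and the random-match probability quoted above are related by simple arithmetic if loci are assumed independent: the random-match probability is the product over loci of (1 − per-locus exclusion), and the multilocus exclusion probability is one minus that product, so 1 − 0.999856 = 1.44 × 10⁻⁴. The function names below are hypothetical, and CERVUS's own per-locus calculation is not reproduced.

```python
import math

def random_match(per_locus):
    """Probability that an unrelated individual is excluded at no locus,
    assuming independent loci: prod_l (1 - P_l)."""
    return math.prod(1.0 - p for p in per_locus)

def combined_exclusion(per_locus):
    """Multilocus exclusion probability: 1 - prod_l (1 - P_l)."""
    return 1.0 - random_match(per_locus)
```

For example, eight loci that each exclude a random individual with probability 0.5 would leave a random-match probability of 0.5⁸ = 1/256.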

Parental identification was very successful for clipped hatchery fish of 2000 and 2001 (Table 3), and at least one parent could be identified for over 90% of the 2001 fish. However, fewer parents of unclipped wild fish could be identified. Furthermore, most parents identified for unclipped fish were clipped parents spawned in the hatchery (41 individuals in 2000, and 9 in 2001), suggesting that these fish escaped from the hatchery before their adipose fin could be clipped or that the clipping was not done properly. These fish were excluded from the unclipped category in subsequent analyses because they were evidently the offspring of hatchery fish. Most identifications of unclipped parents for unclipped offspring involved a single parent only, which, together with the generally low parental identification success in unclipped fish, suggested that many of the wild fish were not sampled. However, parental identification success was sufficient to provide an efficient means of verifying population assignments.

Comparison of population assignment methods

Comparison with adipose condition

The highest self-assignment success for 1996-1998 with respect to adipose fin condition was achieved by STRUCTURE using baseline information and by GMA and BAYES (Table 4). These estimates are likely to be inflated, however, because STRUCTURE only slightly modifies the prior assignment of individuals in the defined baseline populations, and because GMA and BAYES do not allow a leave-one-out procedure. Of the remaining methods, Bayesian and frequency assignment using GENECLASS2 performed best, as did the BAPS approach without baseline information. This latter result was surprising, as STRUCTURE without baseline information performed poorly for hatchery fish. WHICHRUN and the distance method in GENECLASS2 were inferior to Bayesian approaches, as expected.
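Leave-one-out self-assignment, which avoids the inflation mentioned above, removes each baseline individual from its own sample before allele frequencies are re-estimated and the individual is assigned. The sketch uses a simple frequency (likelihood) method with a fixed frequency of 0.01 for alleles absent from a sample; both the method details and the function names are assumptions, not the implementation of any of the programs compared here. Each sample must contain at least two individuals.

```python
import math
from collections import Counter

MISSING_FREQ = 0.01  # assumed frequency for alleles absent from a sample

def allele_freqs(inds):
    """Per-locus allele frequencies; each individual is a list of (a1, a2) tuples."""
    freqs = []
    for l in range(len(inds[0])):
        c = Counter(a for ind in inds for a in ind[l])
        total = sum(c.values())
        freqs.append({a: k / total for a, k in c.items()})
    return freqs

def log_genotype_prob(ind, freqs):
    """Log-probability of a genotype under Hardy-Weinberg and linkage equilibrium."""
    lp = 0.0
    for (a1, a2), f in zip(ind, freqs):
        p1, p2 = f.get(a1, MISSING_FREQ), f.get(a2, MISSING_FREQ)
        lp += math.log(p1 * p1) if a1 == a2 else math.log(2 * p1 * p2)
    return lp

def leave_one_out_self_assignment(baseline):
    """baseline: dict population -> list of individuals (at least two each).
    Returns the proportion of individuals assigned back to their own population."""
    correct = total = 0
    freqs = {pop: allele_freqs(inds) for pop, inds in baseline.items()}
    for pop, inds in baseline.items():
        for i, ind in enumerate(inds):
            scores = {}
            for other, f in freqs.items():
                if other == pop:
                    f = allele_freqs(inds[:i] + inds[i + 1:])  # focal fish left out
                scores[other] = log_genotype_prob(ind, f)
            correct += max(scores, key=scores.get) == pop
            total += 1
    return correct / total
```

Without the leave-one-out step, an individual's own alleles always contribute to its population's frequencies, which favours self-assignment, particularly in small samples and for rare alleles.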

Despite the problems with the validity of self-assignment in some approaches, self-assignment in 1996-1998 was a good predictor of 1999-2001 assignment success for hatchery fish (r² across methods = 0.84***) but not for wild fish (r² across methods = -0.01, ns). Assignment success of hatchery fish to the hatchery was generally above 90% for all Bayesian methods, BAPS and the distance approach (GENECLASS2), while frequency methods (WHICHRUN, GENECLASS2) and STRUCTURE, both with and without baseline, performed worse. Assignment success in wild fish was difficult to evaluate because of the unknown number of hatchery fish in the sample.

Confident assignment, that is, including only individuals whose probability (or proportion of the genome) of belonging to either the hatchery or the wild population exceeded 95%, usually, but not always, improved assignment success. The proportion of the sample that could not be assigned using this very stringent criterion varied widely among methods, ranging from around 10% for Bayesian methods to a third or even the entire sample for clustering methods. Interestingly, however, confident assignment success for unclipped fish was over 90% for almost all methods in 2001, while it barely exceeded 60% in 2000.
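The bookkeeping behind standard and confident assignment can be sketched as follows: per-population log-likelihoods are converted to posterior probabilities, assuming equal priors for hatchery and wild, and an individual counts as confidently assigned only if its highest posterior exceeds 0.95. For clustering methods the estimated membership proportion would play the role of the posterior. The function names and the input format are hypothetical.

```python
import math

def posterior_probs(log_likelihoods):
    """Convert per-population log-likelihoods into posterior probabilities,
    assuming equal prior probabilities for all populations."""
    m = max(log_likelihoods.values())
    w = {pop: math.exp(ll - m) for pop, ll in log_likelihoods.items()}
    total = sum(w.values())
    return {pop: v / total for pop, v in w.items()}

def confident_assignment(records, threshold=0.95):
    """records: list of (true_group, {group: log_likelihood}).
    Returns (standard success, confident success, proportion not assigned)."""
    std_correct = conf_correct = conf_n = 0
    for truth, lls in records:
        post = posterior_probs(lls)
        best = max(post, key=post.get)
        std_correct += best == truth
        if post[best] > threshold:
            conf_n += 1
            conf_correct += best == truth
    n = len(records)
    conf_success = conf_correct / conf_n if conf_n else float("nan")
    return std_correct / n, conf_success, 1 - conf_n / n
```

The third value is the ‘cost’ of confident assignment; a method can reach high confident success simply by leaving many ambiguous individuals unassigned, so both numbers need to be reported together.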

Comparison with parentage data

All methods requiring baseline data assigned individuals to the correct group (that is, fish with clipped parents to the hatchery and fish with unclipped parents to the wild population) with more than 90% accuracy (Table 5). Again, there was relatively little difference between the methods, though the Bayesian methods had the highest standard assignment success. Self-assignment to the baseline was a good predictor of assignment success (as verified by parental identification) in hatchery fish (r² across methods = 0.87***), but not in wild fish (r² across methods = 0.00, ns).

Most methods achieved almost 100% assignment success for confident assignment, though the proportion of the sample excluded varied between 5% and 36%.

The two clustering methods, STRUCTURE and BAPS, showed variable results: while BAPS had high assignment success in the entire data set, whether or not baseline data were used, STRUCTURE's assignment success was high with a baseline but very poor for hatchery fish without a baseline. Furthermore, confident assignment was not possible with STRUCTURE without a baseline, as the entire sample was excluded. Nevertheless, with a reduced sample set (unclipped fish of 1999-2001 without a baseline), the assignment success of STRUCTURE was as high as that of methods requiring a baseline. BAPS, on the other hand, assigned almost all fish to just one cluster under these circumstances.

Effect of sample size bias

The effect of changes in sample sizes was remarkably small (Table 6). Assignment success after combining cr and ck samples usually changed by less than 1%, with the notable exception of WHICHRUN, for which assignment success of wild fish improved by almost 8%, while assignment success of hatchery fish declined by almost 2%. The percentage of individuals excluded under confident assignment followed a similar trend. Generally, increasing the hatchery sample size by combining samples resulted in higher assignment success for the smaller wild sample and lower assignment success for the larger hatchery sample, with the exception of BAYES, for which both assignments deteriorated. Splitting all samples into samples of similar size had very little effect on the assignment of wild fish, but decreased assignment success for hatchery fish, with reduced standard and confident assignment and an increased proportion of the sample that could not be confidently assigned.

Effect of null alleles

Removing the locus with the null allele (Omy77) resulted in a slight deterioration of assignment success across all methods, which appeared to be more pronounced for hatchery fish and for confident assignment measures (Table 7). Only GMA showed a 5% improvement in standard assignment success of wild fish. For clustering methods, the decrease in assignment success of hatchery fish was more pronounced when the baseline was not used. Without a baseline, removal of the null-allele locus slightly reduced the proportion of fish that could not be assigned confidently, while with a baseline, that proportion increased. Generally, however, the null allele appeared to have very little effect on all methods.

Discussion

Our results clearly demonstrated the feasibility of assignment tests in cases where contributing populations show (i) little genetic differentiation, (ii) significant deviations from Hardy-Weinberg and linkage equilibrium, (iii) significant differentiation among brood years, suggesting considerable genetic drift, and (iv) large inequality in sample sizes. Population assignments could be verified against adipose fin condition (clipped vs. unclipped) and parentage data and showed remarkable consistency among methods, despite violation of some of the assumptions. However, we found that self-assignment is not always a good predictor of the power of a method to assign individuals in a mixed sample.

Comparison of methods

The level of genetic differentiation between wild and hatchery-derived steelhead (FST = 0.02) is at the lower end of the range of previously reported successful assignment tests. Both empirical (Berry et al., 2004; Mank & Avise, 2004) and simulation (Cornuet et al., 1999; Paetkau et al., 2004) studies report a rapid drop in assignment success once FST falls below 0.05. Applying the Bayesian clustering technique (STRUCTURE) to microsatellite data from three North Atlantic eel populations (FST = 0.013), Mank & Avise (2004) achieved about 60-80% assignment success using baseline information. In contrast, with only a slightly larger FST, our assignment success almost always exceeded 90%. However, the power of assignment tests is difficult to predict exactly from FST values (Paetkau et al., 2004; Manel et al., 2005), and the index of Shriver et al. (1997) may be a better indicator (Table 2). These indices were fairly high (average between unclipped and clipped fish 0.435, Table 2) and comparable to those of other assignment studies with similar FST values (Olsen et al., 2000a).

One of the most surprising results of the study was the consistency of assignment success among the different methods, which spanned a wide array of approaches, including Bayesian methods, frequency and distance methods and clustering approaches (Table 2). In contrast, many comparisons in the literature, on both simulated and empirical data, describe larger differences between approaches, especially at FST values as low as those described here. For example, Cornuet et al. (1999) described twice the assignment power for Bayesian methods compared with distance methods at an FST of 0.05 and 10 loci. More recently, Koljonen et al. (2005) compared the Bayesian method in GENECLASS2 with the new Bayesian algorithm applied in BAYES, and found 99% correct assignment with the latter method but only 75% with the former. In a comparison of the Bayesian method in GENECLASS2 with the clustering method of STRUCTURE, Manel et al. (2002) found consistently higher assignment success with the latter method, with margins of 2-43%. The most likely explanations for this discrepancy are a certain level of publication bias and the generally high assignment success in our study, which left little room for variation among methods. Furthermore, our study had only two contributing populations (hatchery and wild), whereas most assignment tests are carried out on mixtures of many populations. Nevertheless, of the methods that require baselines, the Bayesian methods appear to perform best. Given the problems with self-assignment tests in GMA and the exclusion of mixture individuals with unknown alleles in BAYES, GENECLASS2 may be the program of choice for our own dataset. GENECLASS2 also allows the exclusion of individuals from populations (Piry et al., 2004), a method that was not used here but that may be useful for some applications.

The two clustering methods, STRUCTURE and BAPS, showed considerable variation in assignment success. Both programs have attracted much interest because they theoretically allow assignment without knowledge of baseline populations. Indeed, both programs fulfilled this promise, albeit each only for one specific subset of samples. STRUCTURE performed very well with the smallest dataset, which included only unclipped fish from 1999-2001, but failed to identify hatchery fish in all other sample sets without a baseline (Tables 4 & 5). In contrast, BAPS achieved assignment success similar to that of methods with baselines when all data were included, but grouped all but a few individuals into a single cluster with smaller datasets (Tables 4 & 5). Interestingly, this tendency of BAPS to identify few large clusters was also observed in a modeling study (Latch et al., in press). The reason for the discrepancy between the results of the two programs is unclear, but it shows that both methods are sensitive to the size of the dataset analyzed. Both programs performed very well when a baseline was used, though this approach does not exploit the prime advantage of clustering methods.

Self-assignment of baseline samples using the leave-one-out method is often used as a predictor of the power of a dataset to assign unknown individuals (Guinand et al., 2004). In the present study, assignment success in the mixed sample (1999-2001) was indeed strongly correlated across methods with self-assignment success for hatchery fish, but not for wild fish. This difference is understandable when assignment success is checked against adipose fin condition, as many unclipped fish in 1999-2001 may have been of hatchery ancestry. However, the same pattern was apparent when assignment tests were verified against parentage data. Two reasons may account for this difference: first, differentiation between years was larger in wild than in hatchery fish, leading to lower predictability of assignment success in years 1999-2000; and second, the smaller sample size of wild fish may have prevented accurate estimation of allele frequencies in baseline populations.

‘Confident’ assignment (>95% probability) revealed greater differences between methods, primarily in the percentage of individuals that could not be assigned. Generally, assignment success increased to over 90% when only individuals with more than 95% probability of belonging to a specific population were considered. As a notable exception, assignment success decreased in the 2000 unclipped fish, probably because the sample contained many undetected fish of hatchery origin. The ‘cost’ of confident assignment, in terms of the proportion of ambiguous individuals that could not be assigned, varied widely: two of the Bayesian estimators (GMA, GENECLASS2) removed only 5-20% of fish from the assignment, while other methods failed to assign almost half the sample, and STRUCTURE without a baseline did not assign any individual with confidence. Interestingly, the number of ambiguous individuals was almost always higher in wild fish, both for self-assignment to the baseline and for the assignment of unknowns. Again, this may be an effect of smaller sample sizes, but it may also indicate the presence of hybrids in the wild steelhead population.

The importance of ‘confident’ assignment depends very much on the application. For example, conservation managers may want to use in breeding programs only individuals that they are certain originate from a specific population, while other applications, such as the identification of hatchery escapees or stock assignment in mixed fisheries, may require assignment of as many individuals as possible, even with less certainty. However, some indication of the confidence of the assignment may be useful for most applications. An alternative approach to such confidence estimation is the simulation-based procedure of Cornuet et al. (1999), which allows exclusion of individuals from specific populations.
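The exclusion idea of Cornuet et al. (1999) can be sketched roughly as follows: genotypes are simulated from a population's allele frequencies, and the population is excluded as the origin of an individual if very few simulated genotypes are as improbable as the observed one. This simplified version simulates from the observed frequencies under Hardy-Weinberg and linkage equilibrium and uses a fixed frequency of 0.01 for absent alleles; the original procedure and its refinements (e.g. Paetkau et al., 2004) differ in detail, and the function names are hypothetical.

```python
import math
import random

MISSING_FREQ = 0.01  # assumed frequency for alleles absent from the population sample

def log_genotype_prob(ind, freqs):
    """Log-probability of a genotype under Hardy-Weinberg and linkage equilibrium."""
    lp = 0.0
    for (a1, a2), f in zip(ind, freqs):
        p1, p2 = f.get(a1, MISSING_FREQ), f.get(a2, MISSING_FREQ)
        lp += math.log(p1 * p1) if a1 == a2 else math.log(2 * p1 * p2)
    return lp

def exclusion_p_value(ind, freqs, n_sim=10000, seed=1):
    """Share of genotypes simulated from `freqs` whose probability is no higher
    than that of the focal individual. A small value (e.g. < 0.01) suggests the
    population can be excluded as the individual's origin."""
    rng = random.Random(seed)
    observed = log_genotype_prob(ind, freqs)
    alleles = [list(f.keys()) for f in freqs]
    weights = [list(f.values()) for f in freqs]
    as_extreme = 0
    for _ in range(n_sim):
        # draw two alleles per locus independently (Hardy-Weinberg, no linkage)
        sim = [tuple(rng.choices(a, weights=w, k=2)) for a, w in zip(alleles, weights)]
        as_extreme += log_genotype_prob(sim, freqs) <= observed
    return as_extreme / n_sim
```

Unlike a posterior probability, this test is applied to each population separately, so an individual can be excluded from both hatchery and wild baselines, which would flag it as a possible hybrid or stray.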

Test of assumptions

Most assignment methods, with the exception of distance-based methods, assume Hardy-Weinberg and linkage equilibrium. These assumptions were clearly not fulfilled in our data, as one of the loci had a known null allele (Omy77, Appendix I). Nevertheless, although one would expect the violation of these assumptions to matter more for some methods than for others, changes in assignment success after removal of that locus were minimal. In most cases, assignment success declined, as would be expected when one of eight loci is removed. That removal caused a decline rather than an improvement suggested that the noise introduced by the null allele had little effect. Interestingly, the two largest changes were observed in BAPS, which assumes Hardy-Weinberg equilibrium, and in the distance method of GENECLASS2, which does not. The largest improvement after removing the null allele was observed with GMA, suggesting that GMA may be more sensitive to null alleles than other methods. However, the 4.5% improvement was explained by a single additional fish that was correctly assigned, so these results have to be viewed with caution. As a general conclusion, a moderate violation of the assumption of absence of null alleles appeared to have little effect, as suggested previously (Cornuet et al., 1999). Nevertheless, given the ubiquity of null alleles in microsatellite studies, simulation studies further investigating their effect on assignment tests are required.

An implicit assumption of assignment tests is that allele frequencies in contributing populations have been sampled without error (Guinand et al., 2004). However, sampling error, genetic drift in the population between the collection of baseline and mixture samples, and the inclusion of migrants from different source populations usually mean that baseline samples are not completely representative of the ‘pure’ contributing populations in the mixture. In Forks Creek steelhead, there was small but significant genetic differentiation among brood years 1996, 1997 and 1998 in both the hatchery and the wild population (Table 2, Figure 1), suggesting small effective population size and considerable genetic drift, and hence the potential for genetic differentiation between parents in 1996/1997 and their offspring in 2000/2001. Furthermore, the detection of unclipped hatchery fish in 2000 and 2001 from parentage data suggests that such hatchery fish may also have been included in the ‘wild’ baseline sample. Small sample sizes of wild fish may have further exacerbated errors in the estimation of baseline allele frequencies. Sampling issues like these may have been responsible for (i) the higher proportion of unassigned wild individuals under ‘confident’ assignment, (ii) the lack of predictive power of self-assignment for the assignment of wild fish and (iii) the generally better assignment success for hatchery fish. Our results therefore highlight the importance of considering sampling issues, and we suggest carrying out sensitivity analyses to ascertain the effects of sampling bias on individual assignments.
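One possible form of such a sensitivity analysis is a bootstrap over baseline individuals: each baseline sample is resampled with replacement, allele frequencies are re-estimated, and every mixture individual is re-assigned. The proportion of replicates reproducing the original assignment indicates how strongly that assignment depends on which fish happened to be sampled. The sketch uses a simple frequency method with a fixed frequency of 0.01 for absent alleles; all names, the number of replicates and the method itself are assumptions for illustration.

```python
import math
import random
from collections import Counter

MISSING_FREQ = 0.01  # assumed frequency for alleles absent from a baseline sample

def allele_freqs(inds):
    """Per-locus allele frequencies; each individual is a list of (a1, a2) tuples."""
    freqs = []
    for l in range(len(inds[0])):
        c = Counter(a for ind in inds for a in ind[l])
        total = sum(c.values())
        freqs.append({a: k / total for a, k in c.items()})
    return freqs

def log_prob(ind, freqs):
    """Log-probability of a genotype under Hardy-Weinberg and linkage equilibrium."""
    lp = 0.0
    for (a1, a2), f in zip(ind, freqs):
        p1, p2 = f.get(a1, MISSING_FREQ), f.get(a2, MISSING_FREQ)
        lp += math.log(p1 * p1) if a1 == a2 else math.log(2 * p1 * p2)
    return lp

def assign(ind, baseline_freqs):
    """Population with the highest genotype likelihood."""
    return max(baseline_freqs, key=lambda pop: log_prob(ind, baseline_freqs[pop]))

def bootstrap_stability(mixture, baseline, n_boot=200, seed=2):
    """Re-assign mixture individuals to baselines resampled with replacement.

    Returns the assignments made with the full baseline and, for each individual,
    the proportion of bootstrap replicates giving the same assignment.
    """
    rng = random.Random(seed)
    full = {pop: allele_freqs(inds) for pop, inds in baseline.items()}
    reference = [assign(ind, full) for ind in mixture]
    agree = [0] * len(mixture)
    for _ in range(n_boot):
        boot = {pop: allele_freqs(rng.choices(inds, k=len(inds)))
                for pop, inds in baseline.items()}
        for j, ind in enumerate(mixture):
            agree[j] += assign(ind, boot) == reference[j]
    return reference, [a / n_boot for a in agree]
```

With small wild samples, rare alleles frequently drop out of bootstrap replicates, so lower stability would be expected for wild than for hatchery assignments if sampling error matters.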

Assuming an error-free baseline ignores problems associated with rare alleles that, simply because of sampling error, may be present in individuals of the mixed sample but not in one or more of the baseline samples. The simplest ways of dealing with this problem are adding the allele to each baseline at a constant frequency (GENECLASS2) or at a sample-size-dependent frequency (WHICHRUN). This step makes assignments dependent on sample sizes in the baseline populations (see Paetkau et al., 2004 for a review). In preliminary trials with WHICHRUN, almost all fish were assigned to sample 98c (N = 5, data not shown). However, after exclusion of that sample, the next smallest sample (99u; N = 16) posed no such problems, suggesting that a bias may only be expected with extremely small sample sizes. Nevertheless, the method used by GENECLASS2, which adds the missing allele at a constant, predefined frequency (here, 0.01) to all samples, may be preferable. Better still is the Bayesian approach (Rannala & Mountain, 1997) implemented in GMA and GENECLASS2, which allows not only assignment of individuals to populations but also estimation of the frequency of alleles missing from baseline samples. BAYES used the very rigorous but somewhat unsatisfying approach of excluding all individuals with alleles absent from all samples in the baseline.
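The three ways of handling an allele absent from a baseline sample can be compared directly. In the sketch below, the constant 0.01 follows the value given above for GENECLASS2; the 1/(n + 1) rule is only a hypothetical example of a sample-size-dependent frequency and is not WHICHRUN's exact rule; the Bayesian version follows the Rannala & Mountain (1997) predictive genotype probability with a Dirichlet prior of 1/k per allele, where k is the number of alleles at the locus. Function names are hypothetical.

```python
def constant_freq(counts, allele, missing=0.01):
    """Sample frequency, with a fixed frequency for alleles absent from the sample."""
    n = sum(counts.values())
    return counts[allele] / n if allele in counts else missing

def size_dependent_freq(counts, allele):
    """Hypothetical sample-size-dependent rule: an absent allele gets 1 / (n + 1),
    where n is the number of gene copies sampled."""
    n = sum(counts.values())
    return counts[allele] / n if allele in counts else 1.0 / (n + 1)

def rm_genotype_prob(a1, a2, counts, k):
    """Rannala & Mountain (1997) probability of a single-locus genotype given allele
    counts in a baseline sample, with a Dirichlet(1/k, ..., 1/k) prior on allele
    frequencies. Absent alleles simply have count 0."""
    n = sum(counts.values())          # number of gene copies in the sample
    c1 = counts.get(a1, 0) + 1.0 / k
    c2 = counts.get(a2, 0) + 1.0 / k
    denom = (n + 1) * (n + 2)         # the prior parameters sum to 1
    if a1 == a2:
        return c1 * (c1 + 1) / denom
    return 2 * c1 * c2 / denom
```

In the Bayesian version the effective frequency of an unsampled allele, (1/k)/(n + 1), shrinks automatically as the baseline sample grows, whereas a constant frequency is the same for a sample of 5 fish and one of 500.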

This difference among methods in dealing with alleles missing from baseline samples may be one of the reasons for changes in assignment success after combining all hatchery fish within a year (i.e. cr and ck) and after splitting hatchery and wild fish into samples of 25-33 individuals. For example, WHICHRUN tended to assign more individuals to the wild population when the sample size of hatchery baselines was higher (i.e. clipped released and clipped killed combined). However, WHICHRUN also assigned fewer individuals to the hatchery when the baseline was split into samples of similar size. As expected, frequency and distance approaches appeared to be more sensitive to sample size biases than Bayesian methods. Generally, however, both lumping and splitting samples mostly resulted in a small deterioration of assignment success. This suggests that biases in sample sizes may indeed be problematic, but that splitting into many small samples reduces assignment success because allele frequencies in each sample are estimated less accurately, and because more alleles need to be added to baseline samples at arbitrary frequencies. For empirical studies, it may be worthwhile to investigate the sensitivity of results to sample size effects using a procedure similar to the one described here.

References

Agapow PM, Burt A (2001) Indices of multilocus linkage disequilibrium. Molecular Ecology Notes 1, 101-102.

Banks MA, Eichert W (2000) WHICHRUN (version 3.2): a computer program for population assignment of individuals based on multilocus genotype data. Journal of Heredity 91, 87-89.

Banks MA, Eichert W, Olsen JB (2003) Which genetic loci have greater population assignment power? Bioinformatics 19, 1436-1438.

Bartron ML, Swank DR, Rutherford ES, Scribner KT (2004) Methodological bias in estimates of strain composition and straying of hatchery-produced steelhead in Lake Michigan tributaries. North American Journal of Fisheries Management 24, 1288-1299.

Belkhir K, Borsa P, Chikhi L, Raufaste N, Bonhomme F (2004) Genetix 4.05, logiciel sous Windows™ pour la génétique des populations. Laboratoire Génome, Populations, Interactions, CNRS UMR 5000, Université de Montpellier II, Montpellier (France).

Bernatchez L, Duchesne P (2000) Individual-based genotype analysis in studies of parentage and population assignment: how many loci, how many alleles? Canadian Journal of Fisheries and Aquatic Sciences 57, 1-12.

Berry O, Tocher MD, Sarre SD (2004) Can assignment tests measure dispersal? Molecular Ecology 13, 551-561.

Brown AHD, Feldman MW, Nevo E (1980) Multilocus structure of natural populations of Hordeum spontaneum. Genetics 96, 523-536.

Cavalli-Sforza LL, Edwards AW (1967) Phylogenetic analysis: models and estimation procedures. Evolution 21, 550-570.

Corander J, Marttinen P, Mäntyniemi S (in press) Bayesian identification of stock mixtures from molecular marker data. Fishery Bulletin.

Corander J, Waldmann P, Marttinen P, Sillanpää MJ (2004) BAPS 2: enhanced possibilities for the analysis of genetic population structure. Bioinformatics 20, 2363-2369.

Cornuet JM, Luikart G (1996) Description and power analysis of two tests for detecting recent population bottlenecks from allele frequency data. Genetics 144, 2001-2014.

Cornuet JM, Piry S, Luikart G, Estoup A, Solignac M (1999) New methods employing multilocus genotypes to select or exclude populations as origins of individuals. Genetics 153, 1989-2000.

Excoffier L, Laval G, Schneider S (2005) Arlequin (version 3.0): an integrated software package for population genetics data analysis. Evolutionary Bioinformatics Online 2005, 47-50.

Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Statistical Science 7, 457-511.

Goudet J (1995) FSTAT (version 1.2): a computer program to calculate F-statistics. Journal of Heredity 86, 485-486.

Guinand B, Scribner KT, Topchy A, Page KS, Punch W, Burnham-Curtis MK (2004) Sampling issues affecting accuracy of likelihood-based classification using genetical data. Environmental Biology of Fishes 69, 245-259.

Hansen MM, Kenchington E, Nielsen EE (2001a) Assigning individual fish to populations using microsatellite DNA markers. Fish and Fisheries 2, 93-112.

Hansen MM, Nielsen EE, Bekkevold D, Mensberg KLD (2001b) Admixture analysis and stocking impact assessment in brown trout (Salmo trutta), estimated with incomplete baseline data. Canadian Journal of Fisheries and Aquatic Sciences 58, 1853-1860.

Hansen MM, Ruzzante DE, Nielsen EE, Mensberg KLD (2001c) Brown trout (Salmo trutta) stocking impact assessment using microsatellite DNA markers. Ecological Applications 11, 148-160.

Hindar K, Ryman N, Utter F (1991) Genetic effects of cultured fish on natural fish populations. Canadian Journal of Fisheries and Aquatic Sciences 48, 945-957.

Koljonen ML, Pella JJ, Masuda M (2005) Classical individual assignments versus mixture modeling to estimate stock proportions in Atlantic salmon (Salmo salar) catches from DNA microsatellite data. Canadian Journal of Fisheries and Aquatic Sciences 62, 2143-2158.

Latch EK, Dharmarajan G, Glaubitz JC, Rhodes OE (in press) relative performance of Bayesian 675 clustering software for inferring population substructure and inidividual assignment at low 676 levels of population differentiation. Conservation Genetics. 677

Mackey G, McLean JE, Quinn TP (2001) Comparisons of run timing, spatial distribution, and 678 length of wild and newly established hatchery populations of steelhead in Forks Creek, 679 Washington. North American Journal Of Fisheries Management 21, 717-724. 680

Manel S, Berthier P, Luikart G (2002) Detecting wildlife poaching: Identifying the origin of 681 individuals with Bayesian assignment tests and multilocus genotypes. Conservation 682 Biology 16, 650-659. 683

Manel S, Gaggiotti OE, Waples RS (2005) Assignment methods: matching biological questions 684 techniques with appropriate. Trends in Ecology & Evolution 20, 136-142. 685

Mank JE, Avise JC (2004) Individual organisms as units of analysis: Bayesian-clustering alternatives in population genetics. Genetical Research 84, 135-143.

Marshall TC, Slate J, Kruuk LEB, Pemberton JM (1998) Statistical confidence for likelihood-based paternity inference in natural populations. Molecular Ecology 7, 639-655.

McLean JE, Bentzen P, Quinn TP (2005) Nonrandom, size- and timing-biased breeding in a hatchery population of steelhead trout. Conservation Biology 19, 446-454.

Morris DB, Richard KR, Wright JM (1996) Microsatellites from rainbow trout (Oncorhynchus mykiss) and their use for genetic study of salmonids. Canadian Journal of Fisheries and Aquatic Sciences 53, 120-126.

O'Reilly PT, Hamilton LC, McConnell SK, Wright JM (1996) Rapid analysis of genetic variation in Atlantic salmon (Salmo salar) by PCR multiplexing of dinucleotide and tetranucleotide microsatellites. Canadian Journal of Fisheries and Aquatic Sciences 53, 2292-2298.

Olsen JB, Bentzen P, Banks MA, Shaklee JB, Young S (2000a) Microsatellites reveal population identity of individual pink salmon to allow supportive breeding of a population at risk of extinction. Transactions of the American Fisheries Society 129, 232-242.

Olsen JB, Wilson SL, Kretschmer EJ, Jones KC, Seeb JE (2000b) Characterization of 14 tetranucleotide microsatellite loci derived from sockeye salmon. Molecular Ecology 9, 2185-2187.

Paetkau D, Calvert W, Stirling I, Strobeck C (1995) Microsatellite analysis of population structure in Canadian polar bears. Molecular Ecology 4, 347-354.

Paetkau D, Slade R, Burden M, Estoup A (2004) Genetic assignment methods for the direct, real-time estimation of migration rate: a simulation-based exploration of accuracy and power. Molecular Ecology 13, 55-65.

Pella J, Masuda M (2001) Bayesian methods for analysis of stock mixtures from genetic characters. Fishery Bulletin 99, 151-167.

Pella J, Masuda M, Nelson S (1996) Search algorithms for computing stock composition of a mixture from traits of individuals by maximum likelihood. US Department of Commerce, NOAA/NMFS.

Piry S, Alapetite A, Cornuet JM, Paetkau D, Baudouin L, Estoup A (2004) GENECLASS2: A software for genetic assignment and first-generation migrant detection. Journal of Heredity 95, 536-539.

Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945-959.

Raftery AE, Lewis SM (1996) Implementing MCMC. In: Markov Chain Monte Carlo in Practice (eds. Gilks WR, Richardson S, Spiegelhalter DJ), pp. 115-130. Chapman & Hall, London.


Randi E, Pierpaoli M, Beaumont M, Ragni B, Sforzi A (2001) Genetic identification of wild and domestic cats (Felis silvestris) and their hybrids using Bayesian clustering methods. Molecular Biology and Evolution 18, 1679-1693.

Rannala B, Mountain JL (1997) Detecting immigration by using multilocus genotypes. Proceedings of the National Academy of Sciences of the United States of America 94, 9197-9201.

Raymond M, Rousset F (1995) Genepop (version 1.2): population genetics software for exact tests and ecumenicism. Journal of Heredity 86, 248-249.

Seamons TR, Bentzen P, Quinn TP (2004) The mating system of steelhead, Oncorhynchus mykiss, inferred by molecular analysis of parents and progeny. Environmental Biology of Fishes 69, 333-344.

Shriver MD, Smith MW, Jin L, Marcini A, Akey JM, Deka R, Ferrell RE (1997) Ethnic-affiliation estimation by use of population-specific DNA markers. American Journal of Human Genetics 60, 957-964.

Spies IB, Brasier DJ, O'Reilly PT, Seamons TR, Bentzen P (2005) Development and characterization of novel tetra-, tri-, and dinucleotide microsatellite markers in rainbow trout (Oncorhynchus mykiss). Molecular Ecology Notes 5, 278-281.

Utter F, Epifanio J (2002) Marine aquaculture: genetic potentialities and pitfalls. Reviews in Fish Biology and Fisheries 12, 59-77.

Waples RS (1991) Genetic interactions between hatchery and wild salmonids: lessons from the Pacific Northwest. Canadian Journal of Fisheries and Aquatic Sciences 48, 124-133.

Waples RS, Gaggiotti OE (2005) What is a population? Molecular Ecology, in press.

Wasser SK, Shedlock AM, Comstock K, Ostrander EA, Mutayoba B, Stephens M (2004) Assigning African elephant DNA to geographic region of origin: applications to the ivory trade. Proceedings of the National Academy of Sciences of the United States of America 101, 14847-14852.

Wilson GA, Rannala B (2003) Bayesian inference of recent migration rates using multilocus genotypes. Genetics 163, 1177-1191.


Acknowledgements

We are grateful to Lyndsay Newton for help in the laboratory. We thank the staff of the Forks Creek Hatchery (especially George Britter, Rob Allen, Dave Shores and Kevin Flowers), former graduate students Gregory Mackey and Jennifer McLean, and others (especially Brodie Smith, Michael Hendry, Jeramie Peterson and Chris Boatright) for assistance with sampling. We gratefully acknowledge funding from the Bonneville Power Administration, the Weyerhaeuser Foundation, the Washington State Hatchery Scientific Reform Group, Long Live the Kings, and the National Science Foundation.

Author Information Box

This paper is part of a long-term project investigating the survival and reproductive success of hatchery steelhead released into Forks Creek, Washington. The project is a collaboration among Thomas Quinn (salmon ecology), Kerry Naish (evolutionary genomics) and Lorenz Hauser (population genetics). Todd Seamons is the postdoctoral research associate on the project; he previously completed his PhD thesis on reproductive success in wild steelhead in Snow Creek, Washington. Michael Dauer is a graduate student working with Kerry Naish.


Figure Legend

Figure 1: Multidimensional scaling plot of Cavalli-Sforza & Edwards (1967) distances among unclipped (▲), clipped killed (●) and clipped released (□) fish. R² = 0.97; stress = 0.122. The clipped released fish were the hatchery parents that spawned in the wild. Unclipped fish of 2000 and 2001 represent mixtures of wild fish and fish of hatchery parentage born in the wild.
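The legend does not state which variant or software was used to compute the Cavalli-Sforza & Edwards (1967) distances. As a minimal sketch, assuming the widely used multilocus form D = (2/(πr)) Σ_j √(2(1 − Σ_i √(x_ij y_ij))), where x_ij and y_ij are the frequencies of allele i at locus j in two samples and r is the number of loci, one entry of the distance matrix underlying such a plot could be computed as below. The function name, the input layout (one allele-to-frequency dictionary per locus) and the toy frequencies are illustrative assumptions, not data from this study.

```python
import math
from typing import Dict, List

# Assumed input layout: one dict per locus, allele label -> frequency (sums to 1 per locus).
AlleleFreqs = List[Dict[str, float]]


def cse_chord_distance(pop_x: AlleleFreqs, pop_y: AlleleFreqs) -> float:
    """Cavalli-Sforza & Edwards chord distance averaged over loci (assumed form).

    D = 2 / (pi * r) * sum_j sqrt(2 * (1 - sum_i sqrt(x_ij * y_ij)))
    """
    if not pop_x or len(pop_x) != len(pop_y):
        raise ValueError("both samples need the same, non-zero number of loci")
    total = 0.0
    for freqs_x, freqs_y in zip(pop_x, pop_y):
        alleles = set(freqs_x) | set(freqs_y)
        # Overlap term: 1 for identical allele frequencies, 0 when no alleles are shared.
        overlap = sum(math.sqrt(freqs_x.get(a, 0.0) * freqs_y.get(a, 0.0)) for a in alleles)
        # max() guards against tiny negative values caused by floating-point rounding.
        total += math.sqrt(max(0.0, 2.0 * (1.0 - overlap)))
    return 2.0 * total / (math.pi * len(pop_x))


# Hypothetical two-locus samples (allele labels are made-up fragment sizes).
sample_a = [{"150": 0.6, "154": 0.4}, {"201": 0.5, "205": 0.5}]
sample_b = [{"150": 0.5, "154": 0.5}, {"201": 0.7, "209": 0.3}]
sample_c = [{"162": 1.0}, {"213": 1.0}]

print(f"{cse_chord_distance(sample_a, sample_b):.3f}")
print(f"{cse_chord_distance(sample_a, sample_c):.3f}")  # no shared alleles: prints 0.900
```

Computing this for every pair of samples gives a symmetric matrix that can be passed to any standard multidimensional scaling routine; the upper bound 2√2/π ≈ 0.900 is reached only when two samples share no alleles at any locus.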


Table 1. Sample sizes of the three groups (clipped released, clipped killed, unclipped) in each brood year. Average expected heterozygosity (He), FIS values and a measure of linkage disequilibrium (rD) are also given.

Brood year  Statistic  Clipped released  Clipped killed  Unclipped  Total
1996        N          164               46              22         232
            He         0.89              0.89            0.88
            FIS        0.015             0.043**         0.045
            rD         0.009**           0.012           0.004
1997        N          185               124             23         332
            He         0.88              0.88            0.90
            FIS        0.038***          0.008           0.022
            rD         0.017**           0.032**         0.014
1998        N          5                 44              66         115
            He         0.90              0.89            0.89
            FIS        -0.003            0.031           0.020
            rD         0.078             0.015**         0.013*
1999        N                            127             16         143
            He                           0.89            0.86
            FIS                          0.006           0.004
            rD                           0.042**         -0.069
2000        N          2                 125             86         213
            He         -                 0.88            0.90
            FIS        -                 0.015           0.028**
            rD         -                 0.044**         0.032**
2001        N                            125             85         203
            He                           0.88            0.89
            FIS                          0.023*          -0.003
            rD                           0.079**         0.018**
Total       N          356               584             298        1247


Table 2. Pairwise FST values (below diagonal) and allele frequency differentials (Shriver et al. 1997; above diagonal) between Forks Creek steelhead samples. Sample codes give the brood year followed by ck (clipped killed), cr (clipped released) or u (unclipped). Significant FST values (P<0.05) are in bold; values that were no longer significant after the sequential Bonferroni procedure are in bold italics. Allele frequency differentials are shown only between the samples used as the baseline for assignments.

      96ck   96cr   96u    97ck   97cr   97u    98ck   98u    99ck   99u    00ck   00u    01ck   01u
96ck  -      0.166  0.420  0.237  0.208  0.432  0.263  0.398
96cr  -0.002 -      0.402  0.190  0.173  0.406  0.229  0.358
96u   0.022  0.022  -      0.441  0.436  0.328  0.451  0.295
97ck  0.004  0.005  0.027  -      0.128  0.422  0.224  0.406
97cr  0.003  0.004  0.027  0.000  -      0.434  0.209  0.395
97u   0.021  0.019  0.002  0.025  0.026  -      0.426  0.278
98ck  0.003  0.003  0.023  0.002  0.001  0.018  -      0.397
98u   0.023  0.02   0.002  0.027  0.027  0.001  0.019  -
99ck  -0.004 0.003  0.022  0.008  0.007  0.021  0.004  0.021  -
99u   0.031  0.027  0.007  0.036  0.035  0.011  0.03   0.008  0.028  -
00ck  0.006  0.008  0.029  -0.001 0.002  0.028  0.004  0.03   0.009  0.036  -
00u   0.004  0.004  0.010  0.003  0.005  0.009  0.003  0.011  0.006  0.017  0.006  -
01ck  0.009  0.01   0.032  0.007  0.007  0.025  0.003  0.027  0.013  0.04   0.01   0.01   -
01u   0.013  0.013  0.001  0.019  0.02   0.003  0.013  0.002  0.013  0.011  0.021  0.007  0.022  -
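Shriver et al. (1997) defined the allele frequency differential δ for a two-allele marker as the absolute difference in allele frequency between two populations. The caption does not say how δ was extended to the multi-allelic microsatellites used here or combined across loci. The sketch below assumes the common extension δ = ½ Σ_i |p_iX − p_iY|, which reduces to |p_X − p_Y| for two alleles, and a simple mean across loci; the function names and frequencies are hypothetical.

```python
from typing import Dict, List


def locus_delta(freqs_x: Dict[str, float], freqs_y: Dict[str, float]) -> float:
    """Allele frequency differential at one locus: half the summed absolute differences.

    For a two-allele locus this equals |p_x - p_y|, the diallelic definition.
    """
    alleles = set(freqs_x) | set(freqs_y)
    return 0.5 * sum(abs(freqs_x.get(a, 0.0) - freqs_y.get(a, 0.0)) for a in alleles)


def mean_delta(pop_x: List[Dict[str, float]], pop_y: List[Dict[str, float]]) -> float:
    """Average of per-locus deltas (an assumed way of combining loci)."""
    if not pop_x or len(pop_x) != len(pop_y):
        raise ValueError("both samples need the same, non-zero number of loci")
    return sum(locus_delta(x, y) for x, y in zip(pop_x, pop_y)) / len(pop_x)


# Hypothetical frequencies, for illustration only.
hatchery = [{"A": 0.3, "B": 0.7}, {"150": 0.5, "154": 0.5}]
wild = [{"A": 0.7, "B": 0.3}, {"150": 0.5, "158": 0.5}]
print(f"{mean_delta(hatchery, wild):.2f}")  # prints 0.45
```

Unlike FST, δ ignores within-sample variance and ranges from 0 (identical frequencies) to 1 (no shared alleles), which is one reason the two measures in Table 2 differ so much in magnitude.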


Table 3. Parental identification success for clipped and unclipped offspring returning in BY 1999-2001. Values are the percentages of clipped or unclipped offspring assigned to clipped and unclipped parents, with the total percentage assigned, for offspring matched to both parents or to one parent only. Hatchery escapees, i.e. unclipped offspring with clipped parents, are shown in italics.

Offspring           Clipped                         Unclipped
Parent              Clipped  Unclipped  Total       Clipped  Unclipped  Total
Both parents
  1999              22.1     0          22.1        0        0          0
  2000              73.2     0          73.2        45.4     0          45.4
  2001              80.0     0          80.0        8.2      1.1        9.3
One parent
  1999              0.7      0          0.7         0        0          0
  2000              4.8      0          4.8         2.3      4.7        7.0
  2001              12.0     0          12.0        2.4      20.1       22.5


Table 4. Percentage of correct assignment of individual fish to the hatchery (clipped; clipped released and clipped killed fish combined) and wild (unclipped) populations. Self-assignment was carried out for BY 1996-1999 using the leave-one-out procedure (except for GMA). Fish from BY 1999-2001 were assigned to hatchery or wild using baseline data from BY 1996-1998. Note that clipped fish from 1999-2001 should all be hatchery fish, whereas unclipped fish are likely to include an unknown proportion of hatchery fish. Hatchery fish identified by parentage assignment (i.e. unclipped fish with clipped parents: 41 individuals in 2000 and 9 in 2001) were counted as hatchery fish in the analysis; the N row shows these adjustments (e.g. 125+41 clipped and 86-41 unclipped in 2000). Est.: standard assignment, assigning every fish to one of the two populations; Conf.: percentage correct assignment of all assigned fish after exclusion of ambiguous individuals; Not ass.: percentage of unassigned, ambiguous fish in the total sample.

                                       1996          1997          1998          Baseline avg  1999          2000          2001          Mixture avg
Method                        Metric   clip   uncl   clip   uncl   clip   uncl   clip   uncl   clip   uncl   clip   uncl   clip   uncl   clip   uncl
N                                      210    22     209    23     49     66                   127    16     125+41 86-41  125+9  85-9
Bayesian
  GMA                         Est.     99     100    99     96     100    95     99     97     94     75     98     58     96     72     96     68
                              Conf.    99     100    100    100    100    95     100    98     98     83     99     47     99     94     99     75
                              Not ass. 8      14     3      4      5      9      5      9      9      25     4      20     5      38     6      28
  GeneClass2                  Est.     95     95     97     87     93     91     95     91     93     75     97     58     96     79     95     71
                              Conf.    99     100    100    94     95     92     98     95     97     85     99     55     99     96     98     79
                              Not ass. 11     36     6      22     9      20     9      26     8      19     4      11     8      34     7      21
  Bayes                       Est.     96     100    98     96     91     94     95     97     90     77     98     58     95     89     94     75
                              Conf.    99     100    99     100    100    96     99     99     92     100    99     55     97     96     96     84
                              Not ass. 19     18     9      17     14     20     14     18     16     46     10     28     16     35     14     36
Frequency
  WhichRun                    Est.     85     100    92     87     84     95     87     94     82     88     91     60     86     87     86     78
                              Conf.    95     100    98     100    92     96     95     99     89     100    95     66     97     96     94     87
                              Not ass. 44     36     29     26     41     17     38     26     22     38     23     36     31     38     25     37
  GeneClass2                  Est.     91     95     97     83     89     92     92     90     89     81     93     60     94     79     92     73
                              Conf.    99     100    100    93     97     93     99     95     94     91     99     52     99     95     97     79
                              Not ass. 23     50     15     35     20     35     19     40     17     31     13     31     17     45     16     36
Distance
  GeneClass2                  Est.     93     91     96     83     84     92     91     89     91     81     95     58     94     84     93     74
                              Conf.    96     100    100    100    97     95     98     98     96     85     98     56     99     95     98     79
                              Not ass. 27     55     17     39     27     38     24     44     21     19     15     40     21     50     19     36
Clustering
  Structure, with baseline    Est.     100    100    100    100    100    100    100    100    87     81     92     60     89     91     89     77
                              Conf.    100    100    100    100    100    100    100    100    98     100    99     59     100    91     99     83
                              Not ass. 32     14     17     26     30     27     26     22     68     69     54     62     58     71     60     67
  Structure, without baseline Est.     64     100    82     100    68     97     71     99     69     100    81     71     75     95     75     89
                              Conf.    -      -      100    100    -      100    -      -      -      -      -      -      -      100    -      -
                              Not ass. 100    100    99     91     100    94     100    95     100    100    100    100    100    97     100    99
  BAPS, with baseline         Est.     93     100    97     100    91     95     94     98     90     81     97     58     96     82     94     74
                              Conf.    98     100    100    100    100    98     99     99     92     92     97     55     97     95     95     81
                              Not ass. 39     27     29     30     32     32     33     30     27     19     20     31     25     43     24     31
  BAPS, without baseline      Est.     88     100    95     87     86     92     90     93     87     88     93     64     90     92     90     81
                              Conf.    95     100    98     100    85     98     93     99     90     93     97     60     95     96     94     83
                              Not ass. 41     14     22     22     25     29     29     21     35     13     21     33     36     30     31     25
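The three summary statistics defined in the caption can be made explicit with a short sketch. It assumes that, for every fish, one has its known origin (from clipping or parentage), the population with the highest assignment score, and a flag recording whether the assignment passed the ambiguity criterion of the program concerned. The record type and function name are hypothetical, and the ambiguity criterion itself (which differs among programs) is not modelled here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class FishAssignment:
    known_origin: str   # "hatchery" or "wild", from clip status or parentage
    best_origin: str    # population with the highest assignment score
    confident: bool     # False if the fish was flagged as ambiguous


def summarize(fish: Sequence[FishAssignment], origin: str) -> Dict[str, Optional[float]]:
    """Est., Conf. and Not ass. (in percent) for fish of one known origin."""
    group = [f for f in fish if f.known_origin == origin]
    if not group:
        raise ValueError(f"no fish of known origin {origin!r}")
    assigned = [f for f in group if f.confident]
    n_correct = sum(f.best_origin == origin for f in group)
    n_correct_assigned = sum(f.best_origin == origin for f in assigned)
    return {
        # every fish is forced into one of the two populations
        "Est.": 100.0 * n_correct / len(group),
        # correct among non-ambiguous fish only; None when nothing was assigned ("-")
        "Conf.": 100.0 * n_correct_assigned / len(assigned) if assigned else None,
        # share of ambiguous fish in the whole group
        "Not ass.": 100.0 * (len(group) - len(assigned)) / len(group),
    }


example = [
    FishAssignment("hatchery", "hatchery", True),
    FishAssignment("hatchery", "hatchery", True),
    FishAssignment("hatchery", "wild", False),
    FishAssignment("hatchery", "hatchery", False),
]
print(summarize(example, "hatchery"))  # {'Est.': 75.0, 'Conf.': 100.0, 'Not ass.': 50.0}
```

Because Conf. is computed only over confidently assigned fish, it can be high even when Not ass. is large, which is the trade-off visible for Structure without baseline, where almost no fish passed the criterion and Conf. is mostly undefined.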


Table 5: Percentage of fish assigned to hatchery and wild (rows) in comparison with parentage data (columns; clip.: clipped hatchery parents; unclip.: unclipped wild parents) in 1999-2001. For the two clustering approaches, results are also shown for analyses of all samples with and without baseline information, and for analyses of the 1999-2001 mixed samples only, using either all fish or unclipped fish only. Est.: standard assignment, assigning every fish to one of the two populations; Conf.: percentage correct assignment of all assigned fish after exclusion of ambiguous individuals; Not ass.: percentage of unassigned, ambiguous fish in the total sample.

Parent clip. unclip. Parent clip. unclip. N 295 22 N 295 22 Bayesian

GeneClass2 GMA Est. 98 95 Est. 99 91 Conf. 100 100 Conf. 100 95 Not ass. 5 14 Not ass. 4 9 Bayesian Distance

Bayes GeneClass2 Est. 98 95 Est. 96 91 Conf. 99 100 Conf. 99 100 Not ass. 13 19 Not ass. 16 36 Frequency

GeneClass2 WhichRun Est. 95 95 Est. 90 95 Conf. 100 100 Conf. 98 100 Not ass. 14 23 Not ass. 26 18 Clustering Structure All fish

With baseline information Without baseline information Est. 93 95 Est. 78 95 Conf. 100 100 Conf. - 100 Not ass. 58 55 Not ass. 100 95 Clustering BAPS All fish

With baseline information Without baseline information Est. 97 91 Est. 94 95 Conf. 99 94 Conf. 99 100 Not ass. 23 23 Not ass. 31 23

Clustering with mixed sample only All fish Unclipped fish only

N 295 22 N 50 22 Structure Est. 53 95 Est. 94 91 Conf. 53 100 Conf. 100 100 Not ass. 30 5 Not ass. 20 41 BAPS Est. 8 100 Est. 2 100 Conf. 7 100 Conf. 2 100 Not ass. 33 9 Not ass. 38 14


Table 6: Effect of sample size bias on the percentage of correct assignments, as determined by parentage. Values are differences from the original percentages of correct assignment obtained using clipped released (cr) and clipped killed (ck) fish separately (Table 5). Est.: standard assignment, assigning every fish to one of the two populations; Conf.: percentage correct assignment of all assigned fish after exclusion of ambiguous individuals; Not ass.: percentage of unassigned, ambiguous fish in the total sample.

cr & ck

combined Samples split into

N=25-33 Parent Clip. Unclip. Clip. Unclip. Bayesian

GMA Est. 0.7 0.3 0.0 0.0 Conf. 0.0 0.0 5.0 -0.3 Not ass. 0.0 -0.3 18.2 4.5

Bayes Est. -0.7 -4.8 1.0 0.0 Conf. 0.0 0.0 0.4 -5.6 Not ass. -2.0 0.0 -6.1 -4.8 GeneClass2 Est. -0.3 0.7 -4.5 0.0 Conf. 0.0 0.0 -6.3 0.0 Not ass. 0.7 -1.4 13.6 -4.5 Frequency WhichRun Est. -1.7 7.8 -9.1 0.0 Conf. -3.5 1.8 -7.7 0.0 Not ass. 10.8 -13.9 22.7 -4.5 GeneClass2 Est. -1.4 3.4 -9.1 0.0 Conf. -0.8 0.4 -7.1 0.0 Not ass. 4.1 -5.4 13.6 -4.5 Distance GeneClass2 Est. -0.3 2.4 -13.6 0.0 Conf. 0.0 -0.3 -6.3 0.0 Not ass. 2.0 -8.1 -9.1 0.0


Table 7: Effect of exclusion of a locus with a null allele (Omy77) on assignment success compared to assignment with all loci (Table 5). The difference to the original percentage correct assignment with all loci is shown. Est.: Standard assignment assigning every fish to one of the two populations; Conf.: Percentage correct assignment of all assigned fish after exclusion of ambiguous individuals; Not ass.: Percentage of not assigned, ambiguous fish in the total sample.

Parent clip. unclip. Parent clip. unclip. N 295 22 N 295 22 Bayesian

GeneClass2 GMA Est. -0.7 0.0 Est. -0.3 4.5 Conf. 0.0 -4.8 Conf. 0.0 -0.6 Not ass. 1.0 -9.1 Not ass. 0.7 9.1 Bayesian Distance

Bayes GeneClass2 Est. -1.7 0.0 Est. 0.0 -4.5 Conf. 0.8 -6.7 Conf. 0.4 -6.3 Not ass. 2.7 9.5 Not ass. 0.0 -9.1 Frequency

GeneClass2 WhichRun Est. -0.7 0.0 Est. 1.4 0.0 Conf. 0.0 -5.3 Conf. -0.2 -5.3 Not ass. 2.7 -9.1 Not ass. 6.1 -4.5 Clustering Structure All fish

With baseline Without baseline Est. -0.3 0.0 Est. -2.7 0.0 Conf. 0.0 0.0 Conf. - 0.0 Not ass. -1.4 9.1 Not ass. 0.0 -9.1 Clustering BAPS All fish

With baseline Without baseline Est. -2.0 0.0 Est. -6.8 0.0 Conf. -0.5 5.6 Conf. -2.5 -5.0 Not ass. 4.1 4.5 Not ass. 0.3 -13.6
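This section does not state how the null allele at Omy77 was detected, but Table S1 shows the expected signature: observed heterozygosity at Omy77 is consistently below expectation, producing the largest positive FIS values. One simple screen, sketched here as an illustration and not necessarily the procedure used in this study, is Brookfield's (1996) estimator of null-allele frequency, r = (HE − HO)/(1 + HE), applied to the per-locus heterozygosities in Table S1. The function and dictionary names are hypothetical.

```python
from typing import Dict, Tuple


def brookfield_null_frequency(h_obs: float, h_exp: float) -> float:
    """Null-allele frequency from heterozygote deficit: r = (He - Ho) / (1 + He).

    Negative values (heterozygote excess) give no evidence for a null allele.
    """
    return (h_exp - h_obs) / (1.0 + h_exp)


# (HO, HE) for Omy77 in three samples, copied from Table S1.
omy77: Dict[str, Tuple[float, float]] = {
    "96ck": (0.739, 0.853),
    "97cr": (0.615, 0.822),
    "01ck": (0.634, 0.815),
}
for sample, (ho, he) in omy77.items():
    print(sample, f"{brookfield_null_frequency(ho, he):.3f}")
```

A heterozygote deficit can also arise from population admixture (a Wahlund effect), so a deficit confined to one locus, as here, points more strongly to a null allele than a deficit shared across all loci.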


Supplementary Material Table S1. Sample sizes per locus of the three groups (clipped released, cr; clipped killed, ck; unclipped, u) in each brood year (N). Allelic diversity adjusted to the minimum sample size of 7 (NA), observed (HO) and expected (HE) heterozygosity, and FIS values are given per locus, together with a multilocus measure of linkage disequilibrium (rD). FIS values significant at the 0.05 level are in bold; those significant after Bonferroni correction are in bold and underlined.

Sample  Stat  Ssa85      Omy1011UW  Omy1001UW  Omy77      Omy1191UW  One108     Oki3A      Omy1212UW  Average
96ck    N     45         44         45         46         45         45         44         35         43.6
        NA    6.4        8.2        8.0        6.5        10.0       7.9        7.2        10.1       8.0
        HO    0.822      0.864      0.800      0.739      0.911      0.867      0.909      0.886      0.850
        HE    0.804      0.896      0.9        0.853      0.945      0.901      0.859      0.947      0.888
        FIS   -0.023     0.036      0.111      0.134      0.036      0.038      -0.059     0.064      0.042
        rD    0.012
96cr    N     162        162        162        161        161        158        151        137        156.8
        NA    6.7        8.2        8.1        6.3        9.5        8.3        7.1        9.7        8.0
        HO    0.784      0.932      0.920      0.745      0.957      0.867      0.861      0.912      0.872
        HE    0.818      0.898      0.897      0.842      0.931      0.91       0.853      0.934      0.885
        FIS   0.041      -0.038     -0.025     0.114      -0.027     0.047      -0.009     0.023      0.016
        rD    0.009
96u     N     21         22         22         22         21         21         20         21         21.3
        NA    8.9        6.6        7.8        5.8        9.0        7.6        7.2        10.0       7.9
        HO    0.952      0.818      0.818      0.727      0.810      0.810      0.850      0.952      0.842
        HE    0.911      0.853      0.893      0.782      0.92       0.889      0.861      0.944      0.882
        FIS   -0.046     0.041      0.084      0.071      0.12       0.09       0.012      -0.009     0.045
        rD    0.004
97ck    N     121        120        121        120        123        121        122        122        121.3
        NA    6.9        8.6        7.5        5.8        9.2        7.9        7.7        9.3        7.9
        HO    0.851      0.925      0.860      0.692      0.943      0.884      0.910      0.943      0.876
        HE    0.834      0.914      0.874      0.82       0.926      0.893      0.878      0.924      0.883
        FIS   -0.02      -0.012     0.016      0.157      -0.018     0.009      -0.036     -0.02      0.010
        rD    0.032
97cr    N     184        184        184        179        184        183        182        182        182.8
        NA    6.9        8.7        7.1        6.0        9.5        8.3        7.3        9.7        7.9
        HO    0.859      0.897      0.815      0.615      0.924      0.918      0.830      0.929      0.848
        HE    0.831      0.918      0.851      0.822      0.933      0.904      0.86       0.934      0.882
        FIS   -0.033     0.023      0.042      0.253      0.009      -0.015     0.035      0.006      0.040
        rD    0.017
97u     N     23         23         23         23         23         23         23         23         23.0
        NA    7.7        8.1        7.7        6.3        9.6        8.1        8.5        10.4       8.3
        HO    0.913      0.826      0.870      0.739      0.957      0.870      0.913      0.957      0.880
        HE    0.881      0.908      0.889      0.824      0.936      0.903      0.904      0.953      0.900
        FIS   -0.036     0.09       0.022      0.103      -0.022     0.037      -0.01      -0.004     0.023
        rD    0.014
98ck    N     44         44         43         43         43         44         44         43         43.5
        NA    7.1        8.9        8.6        5.4        9.3        8.4        7.8        9.9        8.2
        HO    0.886      0.909      0.907      0.721      0.930      0.864      0.773      0.930      0.865
        HE    0.856      0.924      0.912      0.79       0.932      0.908      0.881      0.937      0.893
        FIS   -0.036     0.017      0.005      0.088      0.002      0.049      0.123      0.007      0.032
        rD    0.015
98u     N     65         66         65         64         65         65         63         62         64.4
        NA    8.1        8.2        8.1        6.3        9.5        8.6        7.6        10.1       8.3
        HO    0.923      0.939      0.938      0.766      0.892      0.815      0.841      0.903      0.877
        HE    0.894      0.898      0.901      0.808      0.934      0.913      0.876      0.939      0.895
        FIS   -0.033     -0.046     -0.041     0.053      0.045      0.107      0.04       0.038      0.020
        rD    0.013
99ck    N     125        123        127        121        125        126        122        125        124.3
        NA    6.7        7.7        8.1        6.7        9.6        8.2        7.1        9.6        8.0
        HO    0.832      0.886      0.929      0.793      0.944      0.905      0.828      0.928      0.881
        HE    0.836      0.873      0.9        0.853      0.936      0.903      0.854      0.932      0.886
        FIS   0.005      -0.015     -0.032     0.070      -0.008     -0.002     0.031      0.004      0.007
        rD    0.042
99u     N     15         15         15         15         15         15         7          15         14.0
        NA    8.6        6.0        8.5        3.9        8.5        8.7        6.0        9.9        7.5
        HO    0.867      0.667      0.800      0.867      1.000      0.867      0.857      0.933      0.857
        HE    0.912      0.819      0.919      0.679      0.912      0.917      0.786      0.945      0.861
        FIS   0.05       0.186      0.13       -0.277     -0.097     0.055      -0.091     0.013      -0.004
        rD    -0.069
00ck    N     129        128        128        126        128        127        115        125        125.8
        NA    7.0        8.3        7.0        5.9        9.4        8.1        7.7        9.2        7.8
        HO    0.822      0.922      0.875      0.643      0.938      0.898      0.878      0.952      0.866
        HE    0.832      0.907      0.854      0.823      0.931      0.894      0.882      0.913      0.880
        FIS   0.012      -0.017     -0.025     0.219      -0.007     -0.004     0.004      -0.043     0.017
        rD    0.044
00u     N     85         85         84         83         85         85         82         83         84.0
        NA    7.5        8.3        8.0        6.1        9.7        8.4        8.3        9.9        8.3
        HO    0.882      0.859      0.893      0.723      0.941      0.847      0.854      0.964      0.870
        HE    0.856      0.902      0.899      0.82       0.937      0.909      0.9        0.941      0.896
        FIS   -0.031     0.048      0.007      0.118      -0.005     0.068      0.052      -0.025     0.029
        rD    0.032
01ck    N     125        125        125        123        125        124        122        125        124.3
        NA    6.3        8.8        7.7        6.0        8.7        7.8        7.5        9.2        7.7
        HO    0.864      0.944      0.880      0.634      0.880      0.823      0.902      0.928      0.857
        HE    0.824      0.921      0.875      0.815      0.915      0.881      0.872      0.914      0.877
        FIS   -0.048     -0.025     -0.006     0.222      0.038      0.066      -0.034     -0.015     0.025
        rD    0.079
01u     N     85         83         83         84         83         84         74         81         82.1
        NA    8.0        7.6        7.7        6.4        9.9        8.2        7.9        10.0       8.2
        HO    0.941      0.819      0.867      0.774      0.964      0.869      0.932      0.951      0.890
        HE    0.886      0.865      0.884      0.833      0.944      0.902      0.889      0.939      0.893
        FIS   -0.062     0.039      0.005      0.057      -0.034     0.036      -0.049     -0.012     -0.003
        rD    0.018
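Many FIS entries in Table S1 can be approximately reproduced from the tabulated heterozygosities with the simple ratio FIS = 1 − HO/HE (for example, Omy77 in 96ck: 1 − 0.739/0.853 ≈ 0.134). Other entries differ in the third decimal (Ssa85 in 96ck gives about −0.022 against −0.023 in the table), which suggests that a different estimator was used; this section does not say which. The sketch below shows the ratio together with expected heterozygosity computed from allele frequencies, assuming the common small-sample correction 2n/(2n − 1) for n diploid individuals. Both functions are illustrative, not the authors' code.

```python
from typing import Optional, Sequence


def expected_heterozygosity(freqs: Sequence[float], n_individuals: Optional[int] = None) -> float:
    """Gene diversity 1 - sum(p_i^2), optionally scaled by 2n / (2n - 1) for sample size."""
    he = 1.0 - sum(p * p for p in freqs)
    if n_individuals is not None:
        if n_individuals < 1:
            raise ValueError("need at least one individual")
        he *= 2 * n_individuals / (2 * n_individuals - 1)
    return he


def fis_ratio(h_obs: float, h_exp: float) -> float:
    """Simple inbreeding coefficient 1 - Ho/He; positive values mean a heterozygote deficit."""
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


# Omy77 and Ssa85 in sample 96ck (HO, HE copied from Table S1).
print(f"{fis_ratio(0.739, 0.853):.3f}")  # Table S1 lists 0.134
print(f"{fis_ratio(0.822, 0.804):.3f}")  # Table S1 lists -0.023
# Hypothetical locus with four equally common alleles, scored in 46 fish.
print(expected_heterozygosity([0.25, 0.25, 0.25, 0.25], n_individuals=46))
```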

Figure 1 [plot: Dimension 1 (x-axis) vs Dimension 2 (y-axis); points labelled 96c, 96ck, 96u, 97c, 97ck, 97u, 98ck, 98u, 99ck, 99u, 00ck, 00u, 01ck, 01u]