Deciphering the Routes of Invasion of Drosophila suzukii

HAL Id: hal-01582586
https://hal.archives-ouvertes.fr/hal-01582586
Submitted on 3 Jul 2018

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Distributed under a Creative Commons Attribution 4.0 International License

Deciphering the Routes of invasion of Drosophila suzukii by Means of ABC Random Forest. Antoine Fraimout, Vincent Debat, Simon Fellous, Ruth A Hufbauer, Julien Foucaud, Pierre Pudlo, Jean-Michel Marin, Donald K Price, Julien Cattel, Xiao Chen, et al.

To cite this version: Antoine Fraimout, Vincent Debat, Simon Fellous, Ruth A Hufbauer, Julien Foucaud, et al. Deciphering the Routes of invasion of Drosophila suzukii by Means of ABC Random Forest. Molecular Biology and Evolution, Oxford University Press (OUP), 2017, 34 (4), pp. 980-996. 10.1093/molbev/msx050. hal-01582586


Deciphering the Routes of invasion of Drosophila suzukii by Means of ABC Random Forest

Antoine Fraimout (1), Vincent Debat (1), Simon Fellous (2), Ruth A. Hufbauer (2,3), Julien Foucaud (2), Pierre Pudlo (4), Jean-Michel Marin (5), Donald K. Price (6), Julien Cattel (7), Xiao Chen (8), Marindia Depra (9), Pierre Francois Duyck (10), Christelle Guedot (11), Marc Kenis (12), Masahito T. Kimura (13), Gregory Loeb (14), Anne Loiseau (2), Isabel Martinez-Sañudo (15), Marta Pascual (16), Maxi Polihronakis Richmond (17), Peter Shearer (18), Nadia Singh (19), Koichiro Tamura (20), Anne Xuereb (2), Jinping Zhang (21), and Arnaud Estoup (2)

(1) Institut de Systematique, Evolution, Biodiversite, ISYEB - UMR 7205 – CNRS, MNHN, UPMC, EPHE, Museum national d'Histoire naturelle, Sorbonne Universites, Paris, France
(2) INRA, Centre de Biologie et de Gestion des Populations (UMR INRA IRD Cirad Montpellier SupAgro), Montferrier-Sur-Lez, France
(3) Colorado State University, Fort Collins, CO
(4) Centre de Mathematiques et Informatique, Aix-Marseille Universite, Marseille, France
(5) Institut Montpellierain Alexander Grothendieck, Universite de Montpellier, Montpellier, France
(6) Tropical Conservation Biology & Environmental Science, University of Hawaii at Hilo, HI
(7) Laboratoire de Biometrie et Biologie Evolutive, UMR CNRS 5558, Universite Claude Bernard Lyon 1, Villeurbanne, France
(8) College of Plant Protection, Yunnan Agricultural University, Kunming, Yunnan Province, People's Republic of China
(9) Programa de Pos Graduação em Genetica e Biologia Molecular, Programa de Pos Graduação em Biologia Animal, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
(10) CIRAD, UMR Peuplement Vegetaux et Bioagresseurs en Milieu Tropical, Paris, France
(11) Department of Entomology, University of Wisconsin, Madison, WI
(12) CABI, Delemont, Switzerland
(13) Graduate School of Environmental Earth Science, Hokkaido Daigaku University, Sapporo, Hokkaido Prefecture, Japan
(14) Department of Entomology, Cornell University, Ithaca, NY
(15) Dipartimento di Agronomia Animali Alimenti Risorse Naturali e Ambiente, Universita degli Studi di Padova, Padova, Italy
(16) Departament de Genetica, Universitat de Barcelona, Barcelona, Spain
(17) Division of Biological Sciences, University of California, San Diego
(18) Mid-Columbia Agricultural Research and Extension Center, Oregon State University, Hood River, OR
(19) Department of Genetics, North Carolina State University, Raleigh, NC
(20) Department of Biological Sciences, Tokyo Metropolitan University, Tokyo, Japan
(21) MoA-CABI Joint Laboratory for Bio-safety, Chinese Academy of Agricultural Sciences, BeiXiaGuan, Haidian Qu, China

Corresponding author: E-mail: arnaud.estoup@inra.fr.

Associate editor: Rasmus Nielsen

Abstract

Deciphering invasion routes from molecular data is crucial to understanding biological invasions, including identifying bottlenecks in population size and admixture among distinct populations. Here, we unravel the invasion routes of the invasive pest Drosophila suzukii using a multi-locus microsatellite dataset (25 loci on 23 worldwide sampling locations). To do this, we use approximate Bayesian computation (ABC), which has improved the reconstruction of invasion routes but can be computationally expensive. We use our study to illustrate the use of a new, more efficient ABC method, ABC random forest (ABC-RF), and compare it to a standard ABC method (ABC-LDA). We find that Japan emerges as the most probable source of the earliest recorded invasion into Hawaii. Southeast China and Hawaii together are the most probable sources of populations in western North America, which then in turn served as sources for those in eastern North America. European populations are genetically more homogeneous than North American populations, and their most probable source is northeast China, with evidence of limited gene flow from the eastern US as well. All introduced populations passed through bottlenecks, and analyses reveal five distinct admixture events. These findings can inform hypotheses concerning how this species evolved between different and independent source and invasive populations. Methodological comparisons indicate that ABC-RF and ABC-LDA show concordant results if ABC-LDA is based on a large number of simulated datasets, but that ABC-RF out-performs ABC-LDA when using a comparable and more manageable number of simulated datasets, especially when analyzing complex introduction scenarios.

© The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Mol. Biol. Evol. 34(4):980–996 doi:10.1093/molbev/msx050 Advance Access publication February 25, 2017.

Key words: Drosophila suzukii, invasion routes, random forest, approximate Bayesian computation, population genetics.

Introduction

Biological invasions are a component of global change, and their impact on the communities and ecosystems they invade is substantial (Simberloff 2013). Invasive populations evolve rapidly via both neutral and selective evolutionary processes as they undergo dramatic range expansion and demographic reshuffling and experience new selection regimes (Keller and Taylor 2010; Dlugosch et al. 2015). While extensive literature has laid the foundations of an evolutionary framework of biological invasions (Sakai et al. 2001; Lee 2002; Facon et al. 2006; Dlugosch et al. 2015; Estoup et al. 2016), whether evolutionary processes are drivers of invasion success or outcomes of introduction processes remains unclear.

Two of the evolutionary processes hypothesized to play a role in invasions are demographic bottlenecks and genetic admixture (Ellstrand and Shierenbeck 2000; Dlugosch and Parker 2008; Rius and Darling 2014). Demographic bottlenecks, which correspond to transitory reductions in population size associated with invasions, can reduce genetic variation and thus may constrain invasion success (Edmonds et al. 2004; Dlugosch and Parker 2008; Peischl and Excoffier 2015). A diverse array of mechanisms allows bottlenecked populations to overcome the deleterious consequences of low genetic variation and adapt to their novel environments [reviewed in Estoup et al. (2016)]. Genetic admixture occurs when multiple introduction events derive from genetically differentiated native or invasive populations. Admixture can increase genetic diversity in introduced populations, potentially enhancing invasion success by increasing the genetic variation on which selection can act (Kolbe et al. 2004; Lavergne and Molofsky 2007) or by producing entirely novel genotypes that may facilitate colonization of novel habitats (Dlugosch and Parker 2008; Rius and Darling 2014). It remains unclear, however, how often genetic admixture occurs in invasions and whether it acts as a true driver of invasion success (Uller and Leimu 2011; Rius and Darling 2014).
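The opposite effects of bottlenecks and admixture on genetic variation can be illustrated with a minimal sketch. It relies on two textbook relations that are assumptions here and are not taken from this study: under pure drift, expected heterozygosity in a diploid population of effective size N decays by a factor (1 - 1/(2N)) per generation, and the allele frequency of an admixed population is the weighted mean of its source frequencies. The function names, the biallelic locus, and all numerical values are hypothetical.

```python
def heterozygosity_after_drift(h0, n_eff, generations):
    """Expected heterozygosity after drift in a diploid population of size n_eff."""
    return h0 * (1.0 - 1.0 / (2.0 * n_eff)) ** generations


def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus (a simplification)."""
    return 2.0 * p * (1.0 - p)


def admixed_frequency(p_source1, p_source2, rate):
    """Allele frequency after admixture; `rate` is the share coming from source 1."""
    return rate * p_source1 + (1.0 - rate) * p_source2


# Bottleneck: ten generations at a small versus a large effective size.
h0 = 0.70
h_bottleneck = heterozygosity_after_drift(h0, n_eff=20, generations=10)
h_large = heterozygosity_after_drift(h0, n_eff=2000, generations=10)

# Admixture: two differentiated sources contribute equally to a new population.
p1, p2 = 0.1, 0.9
p_mix = admixed_frequency(p1, p2, rate=0.5)
h_sources = (expected_heterozygosity(p1), expected_heterozygosity(p2))
h_mix = expected_heterozygosity(p_mix)

print(f"heterozygosity after bottleneck: {h_bottleneck:.3f} (large population: {h_large:.3f})")
print(f"heterozygosity in sources: {h_sources}, after admixture: {h_mix:.3f}")
```

In this toy setting the bottlenecked population loses heterozygosity faster than the large one, while the admixed population is more variable than either source.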

To evaluate hypotheses regarding bottlenecks and admixture, as well as adaptation or other processes that may occur during and after introductions, we must first identify the original native or invasive source(s) of the introductions. By comparing introduced to likely source populations, one can infer whether populations diverged during invasion and then start to distinguish among evolutionary processes that drive observed differences (Dlugosch and Parker 2008; Keller and Taylor 2010). Typically, accurate historical data on introduction pathways are limited (Estoup and Guillemaud 2010). Traditional population genetic approaches can describe the genetic structure and relationships among sampled native and invasive populations (Boissin et al. 2012), but determining origins of introduced species from population genetic data is challenging.

This challenge prompted the development of statistical methods such as approximate Bayesian computation (ABC; Beaumont et al. 2002). In ABC analysis, different introduction scenarios that describe possible introduction pathways are compared quantitatively, taking a model-based Bayesian approach. Datasets matching different scenarios are generated by simulation, and by comparing these simulated datasets to the observed dataset, approximate posterior probabilities of the scenarios are estimated (Bertorelle et al. 2010; Beaumont 2010). A crucial step in determining the most probable introduction pathway is to formally describe a finite set of possible introduction scenarios as models (Estoup and Guillemaud 2010). These introduction pathways should be based on temporal/historical information about invasions where feasible, and they can be realistically complex. For example, rather than a species moving directly from the native to invasive range, it might first pass through another region, experiencing bottlenecks and/or admixture along the way. ABC has been applied successfully to a wide range of case studies, from pest invasion (e.g., Harmonia axyridis; Lombaert et al. 2014) to anthropological research (e.g., Homo sapiens; Verdu et al. 2009), and is now widely used by the population genetics community. However, and despite the many advantages of ABC, it relies on massive simulations that can render it computationally costly. When multiple complex introduction scenarios are considered, the time and computational resources required to choose among them can become prohibitive (Lombaert et al. 2014). To overcome this constraint, a new algorithm called ABC random forest (ABC-RF) has been developed (Pudlo et al. 2016). Simulations show that ABC-RF discriminates among scenarios more efficiently than traditional ABC methods (Pudlo et al. 2016), and thus it appears particularly suitable for distinguishing among complex invasion pathways (see the New Approaches section below).
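The ABC logic described above (simulate datasets under each scenario, compare them to the observed dataset, and approximate scenario posterior probabilities) can be sketched with a basic rejection scheme. This is a toy illustration under stated assumptions and not the procedure used in this study: the two candidate sources, their allele frequencies, the founder-number prior, and every function name are hypothetical, and the summary statistics are simply sample allele frequencies at five biallelic loci.

```python
import random

# Hypothetical allele frequencies at five loci in two candidate source populations.
SOURCES = {"A": [0.1, 0.8, 0.3, 0.6, 0.2], "B": [0.7, 0.2, 0.9, 0.4, 0.5]}


def binomial(n, p, rng):
    """Number of successes in n Bernoulli(p) draws."""
    return sum(rng.random() < p for _ in range(n))


def simulate(scenario, rng, sample_size=50):
    """Simulate summary statistics for an introduction from one source."""
    n_founders = rng.randint(5, 50)  # prior on founder gene copies (the bottleneck)
    stats = []
    for p_source in SOURCES[scenario]:
        p_founders = binomial(n_founders, p_source, rng) / n_founders
        stats.append(binomial(sample_size, p_founders, rng) / sample_size)
    return stats


def distance(a, b):
    """Euclidean distance between two vectors of summary statistics."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def abc_model_choice(observed, n_sim=4000, keep=200, seed=1):
    """Rejection ABC: scenario frequencies among the simulations closest to the data."""
    rng = random.Random(seed)
    table = []
    for _ in range(n_sim):
        scenario = rng.choice(sorted(SOURCES))  # uniform prior on scenarios
        table.append((distance(simulate(scenario, rng), observed), scenario))
    accepted = sorted(table)[:keep]
    return {s: sum(1 for _, m in accepted if m == s) / keep for s in SOURCES}


# Pseudo-observed dataset generated under scenario "A".
observed = simulate("A", random.Random(42))
posterior = abc_model_choice(observed)
print(posterior)
```

The number of simulations needed for stable estimates grows quickly with the number and complexity of scenarios, which is the computational cost discussed in the text.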

Here, we evaluate the origins of invasive populations of spotted-wing Drosophila suzukii (Matsumura 1931), including where and whether they passed through bottlenecks in population size or experienced admixture between genetically distinct groups. We use this invasion to illustrate the use of ABC-RF and as a case study to compare it to a more standard ABC method. Drosophila suzukii is a representative of the melanogaster group and is historically distributed in Southeast Asia, covering a large portion of China, Japan, Thailand, and neighboring countries (Asplen et al. 2015). Unlike most drosophilid species, D. suzukii females display a large serrated ovipositor (Atallah et al. 2014), allowing them to lay eggs in ripening fruits, and they are thus a major threat to the agricultural production of stone fruits and berries (Lee et al. 2011). The presence of the species outside of its native range was first recorded in the Hawaiian archipelago in the early 1980s (Kaneshiro 1983), but no further spread was observed until the late 2000s, when D. suzukii was almost synchronously recorded in the southwest of the USA and southern Europe (Hauser 2011; Calabria et al. 2012). By 2015, the species had spread throughout most of the North American and European continents and reached the southern hemisphere in Brazil (Depra et al. 2014). Drosophila suzukii represents a particularly appealing biological model for gaining further insights into the evolutionary factors associated with invasion success. It is closely related to the model insect Drosophila melanogaster, and its short generation time and viability in laboratory conditions facilitate experimental approaches to study evolutionary events.

ABC-based analyses of DNA sequence data obtained for six X-linked gene fragments suggest that North American and European populations represent separate invasion events (Adrion et al. 2014). The sources of these invasions and potential admixture among different regions remain unclear. The wide native and introduced distribution of D. suzukii implies a potentially large number of possible sources and hence a large number of possible introduction scenarios. By first grouping sample sites into genetic clusters, the number of populations treated independently can be reduced somewhat (Lombaert et al. 2014), but not fully. Additionally, the temporal proximity of invasions in the US and Europe makes it difficult to a priori propose certain introduction scenarios over others. Together, these features of D. suzukii's invasion mean that a large number of scenarios will need to be compared, dramatically increasing computational requirements for traditional ABC methods and hence making ABC-RF methods particularly appealing.

New Approaches

The formal description of a finite set of possible introduction scenarios as models lays the foundation for ABC analyses and is outlined in Estoup and Guillemaud (2010). Choosing among the formulated models is the central statistical problem to be overcome when reconstructing routes of invasion from molecular data. Both theoretical arguments and simulation experiments indicate that approximate posterior probabilities estimated from ABC analyses for the modeled introduction scenarios can be inaccurate, even though the models being compared can still be ranked appropriately using numerical approximation (Robert et al. 2011). To overcome this problem, Pudlo et al. (2016) developed a novel approach based on a machine learning tool named "random forests" (RF; Breiman 2001), which selects among the complex introduction models covered by ABC algorithms. This approach enables efficient discrimination among models and estimation of the posterior probability of the best model while being computationally less intensive.

We invite readers to consult Pudlo et al. (2016) for detailed statistical descriptions and testing of the ABC random forest (ABC-RF) method. Briefly, random forest (RF) is an algorithm that learns from a database how to predict a variable, called the output, from a possibly large set of covariates. In our context, the database is the reference table, which includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics (i.e., the covariates). RF aggregates the predictions of a collection of classification or regression trees (depending on whether the output is categorical, here the scenario identity, or quantitative, here the posterior probability of the best scenario). Each tree is built using the information provided by a bootstrap sample of the database and manages to capture one part of the dependency between the output and the covariates. Based on these trees, which are individually poor predictors of the output, an ensemble learning technique such as RF aggregates their predictions to increase predictive performance to a high level of accuracy in favorable contexts (Breiman 2001). RF is currently considered one of the major state-of-the-art algorithms for classification or regression.
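The bootstrap-and-aggregate principle described above can be sketched in a few lines. This is a deliberately simplified stand-in and not the ABC-RF implementation of Pudlo et al. (2016): each "tree" is a depth-one decision stump on one randomly chosen covariate, the reference table is a toy one with two scenarios and two Gaussian summary statistics, and all names and values are hypothetical.

```python
import random
from collections import Counter


def make_reference_table(n_per_scenario, rng):
    """Toy reference table: rows of (summary statistics, scenario index)."""
    table = []
    for scenario, centre in enumerate([(0.3, 0.7), (0.6, 0.4)]):
        for _ in range(n_per_scenario):
            table.append(([rng.gauss(mu, 0.1) for mu in centre], scenario))
    return table


def majority(labels, default=0):
    """Most frequent label, or `default` if the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(rows, rng, n_thresholds=15):
    """Fit a depth-one tree on one randomly chosen covariate."""
    j = rng.randrange(len(rows[0][0]))
    best = None
    for stats, _ in rng.sample(rows, n_thresholds):  # candidate split points
        t = stats[j]
        left = [y for s, y in rows if s[j] <= t]
        right = [y for s, y in rows if s[j] > t]
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, j, t, l_lab, r_lab)
    return best[1:]


def predict_stump(stump, stats):
    j, t, l_lab, r_lab = stump
    return l_lab if stats[j] <= t else r_lab


def fit_forest(table, n_trees, rng):
    """Grow each tree on a bootstrap sample (drawn with replacement) of the table."""
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(table) for _ in table]
        forest.append(fit_stump(bootstrap, rng))
    return forest


def predict_forest(forest, stats):
    """Aggregate the trees by majority vote."""
    return Counter(predict_stump(st, stats) for st in forest).most_common(1)[0][0]


rng = random.Random(0)
reference_table = make_reference_table(200, rng)
forest = fit_forest(reference_table, n_trees=50, rng=rng)
test_set = make_reference_table(100, rng)
accuracy = sum(predict_forest(forest, s) == y for s, y in test_set) / len(test_set)
print(f"accuracy on fresh simulations: {accuracy:.2f}")
```

Real random forests grow deep trees and subsample covariates at every split; the stump version only keeps the two ingredients named in the text, bootstrap resampling and aggregation of weak predictors.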

In ABC random forest (ABC-RF), Pudlo et al. (2016) make two important advances regarding the use of summary statistics and the identification of the most probable model. For the summary statistics, given a pool of different metrics available, ABC-RF extracts the maximum of information from the entire set of proposed statistics. This avoids the arbitrary choice of a subset of statistics, which is often applied in ABC analyses, and also avoids what in the statistical sciences is called "the curse of dimensionality" [see Blum et al. (2013) for a comparative review of dimension reduction methods in ABC]. With respect to identifying the most probable model, ABC-RF uses a classification vote system rather than the posterior probabilities traditionally used in ABC analysis. The first outcome of an ABC-RF analysis applied to a given target dataset is hence a classification vote for each competing model, which represents the number of times a model is selected in a forest of n classification trees. The model with the highest number of classification votes corresponds to the model best suited to the target dataset among the set of competing models. As a byproduct, this step also provides a measure of the classification error called the prior error rate. The prior error rate is calculated as the probability of choosing a wrong model when drawing model index and parameter values from the priors. The second outcome of ABC-RF is an estimation of the posterior probability of the best model that has been selected, using a secondary random forest that regresses the model selection error of the first-step random forest over the available summary statistics.
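The two quantities produced by the first step, classification votes and the prior error rate, reduce to simple counts once the per-tree predictions are available. The sketch below assumes those predictions are given; the vote counts, scenario labels, and function names are hypothetical, and the prior error rate is estimated here as the misclassification frequency over datasets simulated from the priors, which is one plausible reading of the definition above and not necessarily how the original software computes it. The second step (the regression forest for the posterior probability) is not sketched.

```python
from collections import Counter


def classification_votes(tree_predictions):
    """Count how many trees select each model and return the most voted model."""
    votes = Counter(tree_predictions)
    best_model, _ = votes.most_common(1)[0]
    return votes, best_model


def prior_error_rate(true_models, predicted_models):
    """Fraction of prior-simulated datasets assigned to the wrong model."""
    wrong = sum(t != p for t, p in zip(true_models, predicted_models))
    return wrong / len(true_models)


# Hypothetical predictions of a 1,000-tree forest for one target dataset.
tree_predictions = ["scenario_2"] * 610 + ["scenario_1"] * 270 + ["scenario_3"] * 120
votes, best = classification_votes(tree_predictions)

# Hypothetical forest predictions for ten datasets simulated from the priors.
true_models = ["s1", "s1", "s2", "s2", "s3", "s3", "s1", "s2", "s3", "s1"]
predicted = ["s1", "s2", "s2", "s2", "s3", "s1", "s1", "s2", "s3", "s1"]
error = prior_error_rate(true_models, predicted)

print(dict(votes), best, error)
```

Note that the vote share of the best model (610 of 1,000 here) is not its posterior probability, which is why the method adds the second, regression-based step.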

Pudlo et al. (2016) show that, as compared to previous ABC methods, ABC-RF offers at least four advantages: (i) it significantly reduces the model classification error as measured by the prior error rate; (ii) it is robust to the number and choice of summary statistics, as RF can handle many superfluous and/or strongly correlated statistics with no impact on the performance of the method [see Blum et al. (2013) for alternative methods of dimension reduction]; (iii) the computing effort is considerably reduced, as RF requires a much smaller reference table compared with alternative methods (i.e., a few thousand simulated datasets versus hundreds of thousands to millions of simulations per compared model); and (iv) it provides a more reliable estimation of the posterior probability of the selected model (i.e., the model that best fits the observed dataset).

To the best of our knowledge, the present study is the first to use the ABC-RF method for an ensemble of model choice analyses with different levels of complexity (i.e., various numbers of compared models, including various and sometimes large numbers of parameters, sampled populations, and hence summary statistics) on real multi-locus microsatellite datasets. Specifically, we analyze molecular data at 25 microsatellite loci from 23 D. suzukii sample sites located across most of its native and introduced range. We conducted our population genetics study in five steps. (1) We defined focal genetic groups and used this information to formalize 11 sets of competing introduction scenarios. We then chose among them in a sequential ABC-RF analysis. (2) We compared the performance of ABC-RF to a more standard ABC method for a subset of analyses [ABC-LDA; Estoup et al. (2012); see "Materials and Methods" section]. We chose ABC-LDA for our comparative study because it is considered to be one of the most efficient standard ABC methods to discriminate among models (Pudlo et al. 2016), and it is widely used in population genetics studies (Lombaert et al. 2014 and references therein). (3) We refined the inferred worldwide invasion scenario by assessing the most likely origins of the primary introductions from the native area. (4) We estimated the posterior distributions, under the final invasion scenario, of demographic parameters associated with bottleneck and genetic admixture events. (5) Finally, we performed model-posterior checking analyses to ensure that the final worldwide invasion scenario displayed a reasonable match to the observed dataset.

Results

Origins of Invasive Genetic Groups

Prior to conducting any type of ABC analysis, focal genetic groups must be defined by characterizing genetic diversity and structure within and between all genotyped sample sites using traditional statistics and clustering methods. This is described in detail in supplementary appendix S1, Supplementary Material online. We defined seven main genetic groups from our initial set of 23 sample sites (supplementary table S1, Supplementary Material online): Asia (sample sites CN-Lan, CN-Lia, CN-Nin, CN-Shi, JP-Tok, and JP-Sap), Hawaii (sample site US-Haw), western US (sample sites US-Wat, US-Sok, and US-SD), eastern US (sample sites US-Col, US-NC, US-Wis, and US-Gen), Europe (sample sites GE-Dos, FR-Par, FR-Bor, FR-Mon, SW-Del, SP-Bar, and IT-Tre), Brazil (BR-PA), and La Reunion (FR-Reu).

This genetic grouping, along with historical and geographical information (supplementary table S1, Supplementary Material online), allowed us to formalize 11 nested sets of competing invasion scenarios that we analyzed sequentially using ABC model choice methodologies. The date that D. suzukii was first recorded in a location was used to help formulate the scenarios. For example, D. suzukii was first recorded in the continental US from California (US-Wat) in 2008 and was not recorded in the central state of Colorado (US-Col) until 2012. US-Wat could therefore serve as a source for US-Col, but not vice versa. We present the main analyses in table 1. A more detailed description of each competing scenario considered for each analysis is provided in supplementary table S2, Supplementary Material online. The 11 sequential ABC analyses permit step-by-step reconstruction of the introduction history of D. suzukii worldwide. Results for each analysis based on prior set 1 (which used bounded uniform distributions of parameters; see "Materials and Methods" section) and a single set of sample sites representative of the genetic groups defined above are given in table 2. For all 11 analyses, similar ABC-RF results were obtained using prior set 2 (which used more peaked distributions; supplementary table S3, Supplementary Material online) and considering various sets of representative sample sites (supplementary table S4, Supplementary Material online).
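The way first-record dates constrain the set of admissible sources can be sketched as a simple filter. The three dates below are those quoted in the text (Hawaii 1980, US-Wat 2008, US-Col 2012); the function name, the data layout, and the rule that a source must have been recorded strictly earlier than the target are assumptions made for illustration, and the actual scenario sets are those of table 1 and supplementary table S2.

```python
# Native range: always admissible as a source (assumption of this sketch).
NATIVE = {"Asia"}

# Year of first record for invaded locations, as quoted in the text.
FIRST_RECORD = {"Hawaii": 1980, "US-Wat": 2008, "US-Col": 2012}


def admissible_sources(target):
    """Native range plus every invaded location recorded before the target."""
    earlier = {site for site, year in FIRST_RECORD.items()
               if year < FIRST_RECORD[target]}
    return sorted(NATIVE | earlier)


for site in FIRST_RECORD:
    print(site, "<-", admissible_sources(site))
```

With these dates, US-Wat is an admissible source for US-Col but US-Col is not an admissible source for US-Wat, matching the example given above.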

By choosing among different competing scenarios, each analysis identifies the most probable source(s) for a given introduced population. The results come in the form of differing levels of support for the competing introduction models. It is thus worth stressing here that the identification of the most probable source population X (from among a finite set of possible source populations) does not necessarily mean that the introduced population Y originated from exactly location X, but rather that the most probable origin of the founders of the invasive population sampled at site Y is a population genetically similar to the source population sampled at site X. However, for the sake of concision, we will hereafter use the simplified terminology that the most probable origin of Y is X.

After the early invasion of Hawaii (1980), the first recorded introduction was to the genetically heterogeneous western US, and thus that is where our main analyses start (analyses 1a–1d; table 2). We considered the three western US sample sites in separate analyses, and we found in each case that the best scenario included genetic admixture between Asia and Hawaii as sources (table 2, analyses 1a–1c; mean posterior probability of the best scenario, P = 0.999). We subsequently tested if this common admixture pattern resulted from a single or multiple introduction events from Asia and/or Hawaii, including the possibility of local colonization events among western US sites (analysis 1d). The best invasion scenario for the western US included an initial admixture event with Asia and Hawaii as probable sources in northern California (site US-Wat; admixture event A1 in fig. 1), followed by (i) a local colonization southward into San Diego (US-SD) and northward into Oregon state (US-Sok) and (ii) a secondary introduction event from Hawaii to Oregon (US-Sok) (A2 event in fig. 1; P = 0.690). This result is supported by various genetic clustering results (fig. 1; supplementary figs. S1 and S2, Supplementary Material online) as well as raw F-statistics values, which indicate a high level of differentiation between US-Sok and other continental US sample sites and a substantially lower FST between US-Sok and Hawaii (supplementary table S5, Supplementary Material online). Once the invasion history was resolved for the western US group, we inferred the most probable source(s) of the invasive eastern US and European genetic groups using two independent analyses (analyses 2a and 2b; table 1 and supplementary table S2, Supplementary Material online). We found evidence for a single origin for the eastern US group, corresponding to an intra-continental spread from the San Diego area (P = 0.779; table 2).

The most probable origin for Europe was an independent introduction from the Asian native range (P = 0.716; table 2). The European and North American invasions occurred nearly simultaneously, and thus we also explored gene flow between the two regions. Europe was never selected as a source for the eastern US (analysis 3a; P = 0.744), no matter which sets of sample sites were considered. In contrast, we found some evidence of asymmetrical gene flow from the eastern US into at least some European locations. When considering the northern European sample site from Germany (GE-Dos) as a target, we found that the best scenario included an admixed origin with Asia and the eastern US (supplementary table S4, Supplementary Material online; analysis 3b; P = 0.488 and P = 0.501 for prior sets 1 and 2, respectively). In contrast, the best scenario did not include such an admixture event when considering the southern European sample site from Italy (IT-Tre) as a target (table 2, analysis 3b; P = 0.510). We therefore replicated analysis 3b on all possible combinations of European and eastern US sample sites to assess the robustness of this result and evaluate the possibility of a geographic pattern for admixture events. For each of the European sample sites, figure 2 summarizes which replicate analyses selected a scenario including an admixture between eastern US and Asia or a scenario of a single introduction from Asia (and see supplementary fig. S3, Supplementary Material online, for results using an alternative representative sample site for the native range). Results tended to indicate an uneven spatial distribution of the presence of genes of eastern US origin in Europe, following roughly a north-south gradient. More specifically, eastern US alleles were present in northern Europe and absent, or at least not detected by our model choice method, in southern Europe. We then further tested whether the observed admixture in the north corresponded to the mixing of eastern US genes with southern European genes originating from the initial introduction event from Asia, versus the mixing of eastern US genes with Asian genes introduced through a separate secondary introduction event from Asia in northern Europe (analysis 4). The best scenario was clearly that of an admixture between eastern US genes and southern European genes, which themselves had a probable origin as part of the initial introduction event from Asia (P = 0.998; A3 event in fig. 1).

Finally, we found that the sampled population from Brazil originated from North America, with genetic admixture between D. suzukii individuals from the southwest and eastern regions of the US (analysis 5a; P = 0.631; A4 event in fig. 1). Regarding the most recent invasive population from La Reunion, we found that this island population originated from Europe, with admixture between individuals from northern and southern Europe (analysis 5b; P = 0.500; A5 event in fig. 1).

Comparison of Model Choice Analyses Using ABC-RF versus ABC-LDA

For all 11 analyses, ABC-LDA performed well when using large reference tables (i.e., 500,000 simulations per scenario; table 2 and supplementary table S3, Supplementary Material online). Indeed, the prior error rates for ABC-LDA with large reference tables were smaller than the prior error rates for ABC-RF with

Table 1. Formulation of the Model Choice Analyses that were Carried Out Successively to Reconstruct Invasion Routes of D. suzukii Using ABC.

| Model choice analysis | Number of compared scenarios | Tackled question | Potential source genetic groups | Focal populations |
|---|---|---|---|---|
| 1a | 3 | What are the origins of western US populations? | Asia, Hawaii | US-Wat |
| 1b | 3 | (same as 1a) | (same as 1a) | US-Sok |
| 1c | 3 | (same as 1a) | (same as 1a) | US-SD |
| 1d | 7 | What are the relations among western US populations and their extra-continental sources? | Asia, Hawaii, US-Wat, US-Sok, US-SD | US-Wat + US-Sok + US-SD |
| 2a | 6 | What are the origins of eastern US populations? | Asia, Hawaii, western US | eastern US |
| 2b | 6 | What are the origins of European populations? | Asia, Hawaii, western US | Europe |
| 3a | 10 | Is there asymmetrical gene flow from Europe to eastern US? | Asia, Hawaii, western US, Europe | eastern US |
| 3b | 10 | Is there asymmetrical gene flow from eastern US to Europe? | Asia, Hawaii, western US, eastern US | Europe |
| 4 | 2 | Does the admixture with eastern US genes in northern Europe result from a secondary Asian introduction? | Asia, Hawaii, western US, eastern US, southern Europe | northern Europe |
| 5a | 21 | What are the origins of the Brazilian population? | Asia, Hawaii, western US, eastern US, southern Europe, northern Europe | Brazil |
| 5b | 21 | What are the origins of the La Reunion population? | Asia, Hawaii, western US, eastern US, southern Europe, northern Europe | La Reunion |

NOTE: Each numbered analysis is a comparison of a certain number of scenarios by ABC model choice. We summarize each analysis by stating the question that it addressed. A detailed verbal description of each compared scenario is given in supplementary table S2, Supplementary Material online. The 11 analyses are nested in the sense that each subsequent analysis uses the result obtained from the previous one. For example, analysis 2a, "What are the origins of eastern US populations?", capitalizes on the history inferred for western US populations from analysis 1d. "Potential source genetic group" indicates all the potential source populations considered in the analyses for which one wants to identify the origin of the focal population (i.e., the target).

Fraimout et al. doi:10.1093/molbev/msx050 MBE

984Downloaded from httpsacademicoupcommbearticle-abstract3449802953204by Universite De La Reunion useron 03 July 2018

Table 2. Results of Model Choice Analyses Using ABC-RF and ABC-LDA.

Analysis | Total Number of Scenarios | Number of SumStats | Prior error rate, ABC-RF (sd) | Prior error rate, ABC-LDA (large reference table) | Prior error rate, ABC-LDA (small reference table) | Posterior probability of the best model, ABC-RF (sd) | Posterior probability of the best model, ABC-LDA (large reference table) [CI] | Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model)
1a | 3 | 39 | 0.100 (±0.002) | 0.075 | 0.085 | 1.000 (±0.001) | 1.000 [1.000, 1.000] | Asia + Hawaii
1b | 3 | | 0.094 (±0.001) | 0.070 | 0.091 | 0.989 (±0.005) | 0.999 [0.999, 0.999] |
1c | 3 | | 0.096 (±0.001) | 0.079 | 0.087 | 0.998 (±0.002) | 0.999 [0.999, 0.999] |
1d | 7 | 130 | 0.327 (±0.001) | 0.274 | 0.368 | 0.690 (±0.018) | 0.741 [0.714, 0.767] | Asia + Hawaii
2a | 6 | 130 | 0.232 (±0.001) | 0.203 | 0.241 | 0.779 (±0.019) | 0.788 [0.764, 0.813] | Western US
2b | 6 | 130 | 0.232 (±0.001) | 0.179 | 0.393 | 0.716 (±0.013) | 0.604 [0.592, 0.616] | Asia
3a | 10 | 204 | 0.328 (±0.001) | 0.265 | 0.364 | 0.744 (±0.020) | 0.844 [0.820, 0.868] | Western US
3b | 10 | 204 | 0.408 (±0.001) | 0.358 | 0.423 | 0.510 (±0.014) | 0.409 [0.391, 0.426] | Asia
4 | 2 | 301 | 0.121 (±0.001) | 0.111 | 0.212 | 1.000 (±0.009) | 0.999 [0.999, 0.999] | Southern Europe + eastern US
5a | 21 | 424 | 0.294 (±0.001) | 0.230 | 0.423 | 0.631 (±0.034) | 0.812 [0.788, 0.837] | Western US + eastern US
5b | 21 | 424 | 0.298 (±0.001) | 0.242 | 0.428 | 0.500 (±0.027) | 0.290 [0.256, 0.323] | Northern Europe + southern Europe

NOTE: All model choice analyses were carried out using the prior set 1 (supplementary table S8, Supplementary Material online) and a single set of sample sites representative of the pre-defined genetic groups. Sample site JP-Tok for the native Asian group, US-Haw for the Hawaiian group, US-Wat, US-Sok, and US-SD for the western US group, US-NC for the eastern US group, IT-Tre for the southern Europe group, GE-Dos for the northern Europe group, BR-PA for the Brazil group, and FR-Reu for the La Reunion group (fig. 1). Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). The total number of summary statistics (Number of SumStats) as well as the total number of compared scenarios (Total number of scenarios) are indicated for each analysis. Prior error rates and posterior probabilities of the best model chosen using ABC-RF were averaged over 10 replicate analyses. ABC-RF and ABC-LDA treatments yielded the same best model choice for all analyses and is denoted in the column "Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model)". Admixture between two source populations are represented by a "+" sign. ABC-LDA posterior probabilities of the best models were estimated using "large reference tables" with 500,000 simulated datasets per scenario. ABC-LDA prior error rates were computed using reference tables of two different sizes: "large reference table" (i.e., 500,000 simulated datasets per scenario) and "small reference table" (i.e., 10,000 simulated datasets per scenario, as for ABC-RF analyses). SD stands for standard deviation over 10 replicate analyses and CI for 95% confidence interval computed following Cornuet et al. (2008).

Deciphering Invasion Routes Using ABC Random Forest. doi:10.1093/molbev/msx050 MBE

small reference tables (paired t-test: t10 = 2.7, P = 0.02). However, ABC-RF performed better than ABC-LDA when using small reference tables (only 10,000 simulations per scenario), having lower prior error rates (paired t-test: t10 = 11.6, P < 0.0001; table 2 and supplementary table S3, Supplementary Material online). The difference in prior error rate was particularly marked for the complex analyses, i.e., those in which many introduction scenarios were compared and many summary statistics were computed. For instance, for the analyses 5a and 5b (21 compared scenarios and 424 summary statistics computed), prior error rates were 0.217 and 0.221 for ABC-RF versus 0.417 and 0.425 for ABC-LDA.
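The link between analysis complexity and the gap in prior error rate can be retraced from table 2. The Python sketch below is illustrative only: the numbers are the ABC-RF and small-table ABC-LDA prior error rates as transcribed from table 2 (prior set 1), while the names `ANALYSES` and `mean_gap` and the cut-off of 10 scenarios for calling an analysis complex are assumptions made for this example. It does not reproduce the paired t-tests reported in the text.

```python
from statistics import mean

# (analysis, number of scenarios, ABC-RF prior error, ABC-LDA small-table prior error)
# Values transcribed from table 2 (prior set 1).
ANALYSES = [
    ("1a", 3, 0.100, 0.085),
    ("1b", 3, 0.094, 0.091),
    ("1c", 3, 0.096, 0.087),
    ("1d", 7, 0.327, 0.368),
    ("2a", 6, 0.232, 0.241),
    ("2b", 6, 0.232, 0.393),
    ("3a", 10, 0.328, 0.364),
    ("3b", 10, 0.408, 0.423),
    ("4", 2, 0.121, 0.212),
    ("5a", 21, 0.294, 0.423),
    ("5b", 21, 0.298, 0.428),
]


def mean_gap(rows):
    """Mean of (ABC-LDA error - ABC-RF error); positive values favour ABC-RF."""
    return mean(lda - rf for _, _, rf, lda in rows)


# Hypothetical cut-off: call an analysis "complex" when it compares >= 10 scenarios.
complex_rows = [row for row in ANALYSES if row[1] >= 10]
simple_rows = [row for row in ANALYSES if row[1] < 10]

print(f"all analyses:     {mean_gap(ANALYSES):+.3f}")
print(f"complex analyses: {mean_gap(complex_rows):+.3f}")
print(f"simple analyses:  {mean_gap(simple_rows):+.3f}")
```

Under these assumptions the mean gap is larger for the analyses with 10 or more scenarios than for the others, in line with the trend described above; a different cut-off would shift the two group means.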

Regarding computational effort, we found that, for a given observed dataset, an ABC-RF treatment to select the best scenario and compute its posterior probability required ca. 50 times less computation time than an ABC-LDA treatment with a traditional reference table of large size (i.e., 500,000 simulated datasets per scenario). Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time for 10,000 pseudo-observed datasets, whereas with ABC-LDA it lasted several hours to several days (depending on the analysis processed) for 500 pseudo-observed datasets.

The same invasion scenario had the highest probability for all 11 analyses carried out using ABC-RF and ABC-LDA with a large number of simulated datasets (table 2 and supplementary table S3, Supplementary Material online). Posterior probabilities of the best scenario estimated using ABC-RF were not systematically higher (or lower) than those using ABC-LDA with a large number of simulated datasets. In particular, there was no clear trend of overestimation of posterior probability when using ABC-LDA with a large number of simulated datasets versus ABC-RF, a potential bias suggested by Pudlo et al. (2016). On the other hand, we found evidence of instability in the estimation of posterior probabilities, and hence in model selection, when using ABC-LDA with a small number of simulated datasets in the reference table (i.e., 10,000 simulations per scenario, as for ABC-RF). In this case, the scenario with the highest probability was different from that selected using either ABC-LDA with the large reference table or ABC-RF in four and two of the 11 analyses when using the prior sets 1 and 2, respectively (results not shown).

FIG. 1. Worldwide invasion scenario of D. suzukii inferred from microsatellite data and date of first observation. Map and schematic showing sample sites and the invasion routes taken by D. suzukii, as reconstructed by ABC-RF (Pudlo et al. 2016) on a total of 685 individuals from 23 geographic locations genotyped at 25 microsatellite loci (see results and methods for details). The native range is in dark gray and the invasive range is in light gray (cf. delimitation from fig. 1 in Asplen et al. 2015). The year in which D. suzukii was first observed at each sample site is indicated in italics. The 23 geographical locations that were sampled are represented by circles (native range) and squares, diamonds, and triangles (introduced range). Squares indicate populations that experienced weak bottlenecks (i.e., median value of bottleneck severity < 0.12; see main text and table 3), diamonds indicate moderate bottlenecks (0.12 < bottleneck severity < 0.22), and triangles indicate strong bottlenecks (i.e., bottleneck severity > 0.3). The colors of the symbols for the sample sites and of the arrows between them correspond to the different genetic groups obtained using the clustering method BAPS (supplementary appendix S1, Supplementary Material online). The arrows indicate the most probable invasion pathways. A1–A5 indicate five separate admixture events between different sources. O1–O3 indicate the most probable sources within the native range for the primary introduction events. A1 = Hawaii + southeast China; A2 = Watsonville (western US) + Hawaii; A3 = southern Europe + eastern US; A4 = western US + eastern US; A5 = southern Europe + northern Europe; O1 = Japan; O2 = southeast China; O3 = northeast China.

Refining the Origins of the Primary Introduction Events from the Native Area

Additional ABC-RF analyses were run to determine which geographical area in the native range was the most likely origin of each of the three primary introductions (i.e., in Hawaii, western US, and Europe). The most probable origin of the Hawaiian invasive population was Japan, with a mean posterior probability of 0.959 (origin O1 in fig. 1). For Watsonville (western US, sample site US-Wat), the most probable source of the primary introduction from Asia was southeast China (origin O2 in fig. 1; P = 0.860). Finally, for Europe, the most probable source of the primary introduction from Asia was northeast China (origin O3 in fig. 1; P = 0.855). Similar posterior probabilities were obtained using the prior set 1 (values presented above) and the prior set 2 (results not shown). For inferences regarding Europe, similar posterior probabilities were also obtained using various sets of sample sites from Europe and the eastern US (results not shown).

Estimation of Parameter Distributions for Admixture Rate and Bottleneck Severity

We used the most probable worldwide invasion scenario (fig. 1) to estimate the posterior distributions of parameters related to the bottleneck severity associated with the foundation of new populations and to genetic admixture rates between differentiated sources (i.e., A1–A5 events in fig. 1).

FIG. 2. Admixed or non-admixed origin of D. suzukii in Europe inferred for each sample site. Potential source populations include (among others) the Asian native range (represented by the Japanese sample site JP-Tok) and the invasive genetic group from eastern US, represented by one of the four invasive sample sites collected in this area (fig. 1). Four replicate independent ABC-RF treatments corresponding to the analysis 3b (table 2) were hence carried out for each targeted European sample site, each using one of the four eastern US sample sites. The treatments labeled 1, 2, 3, and 4 in the pies of the figure were carried out with the sample sites US-NC, US-Wis, US-Gen, and US-Col, respectively. A pie quarter in blue indicates that the best scenario corresponds to a single introduction event from Asia. A pie quarter in red indicates that the best scenario corresponds to an admixture event between Asia and eastern US. Dates in italics correspond to the dates of first record of the European sample sites.

The posterior distributions of parameters relating to bottlenecks and admixture differed substantially from the priors, indicating that the genetic data were informative for such parameters (tables 3 and 4; see supplementary tables S6 and S7, Supplementary Material online, for results using an alternative set of representative sample sites). Regarding bottleneck severity, we found that bottlenecks tended to be more severe for populations founded by individuals originating from a source located on a different continent (extra-continental origin) than for populations founded by individuals originating from a source located on the same continent (intra-continental origin; table 3 and fig. 1). The bottleneck severity for populations founded by individuals corresponding to a combination of the two types of sources (extra- and intra-continental origins) was either weak or moderate. The strongest bottleneck severity was found for the first invasive population, in Hawaii, which showed the lowest level of genetic variation among all invasive populations. Bottleneck severity values were globally slightly stronger in European than in US populations, suggesting a smaller number of founding individuals and/or a slower demographic recovery in European populations.

Regarding admixture (table 4), we found that (i) for the first recorded invasive population in western US (i.e., Watsonville, sample site US-Wat), the genetic contribution from China was substantially larger than that from Hawaii (median value of 0.759); (ii) for the western US population US-Sok, the secondary genetic contribution from Hawaii was small compared to that from Watsonville (median value of 0.237); and (iii) the sample sites from northern Europe (e.g., GE-Dos in Germany) contained a rather low proportion of genes originating from the eastern US (median value of

Table 3. Bottleneck Severity in Invasive Populations of D. suzukii.

Introduction Type | Sample Site | Prior 1 Mean | Prior 1 Median | Prior 1 Mode | Prior 1 q5 | Prior 1 q95 | Prior 2 Mean | Prior 2 Median | Prior 2 Mode | Prior 2 q5 | Prior 2 q95
Extra-continental | US-Haw | 0.517 | 0.500 | 0.495 | 0.326 | 0.755 | 0.490 | 0.484 | 0.477 | 0.382 | 0.615
Extra-continental | US-Wat | 0.181 | 0.138 | 0.114 | 0.041 | 0.448 | 0.143 | 0.127 | 0.118 | 0.059 | 0.275
Extra-continental | IT-Tre | 0.268 | 0.179 | 0.171 | 0.059 | 0.837 | 0.236 | 0.189 | 0.172 | 0.101 | 0.553
Extra-continental | BR-PA | 0.158 | 0.126 | 0.104 | 0.020 | 0.407 | 0.208 | 0.193 | 0.172 | 0.067 | 0.399
Extra-continental | FR-Reu | 0.259 | 0.215 | 0.186 | 0.051 | 0.626 | 0.245 | 0.214 | 0.190 | 0.069 | 0.534
Intra-continental | US-SD | 0.135 | 0.117 | 0.106 | 0.039 | 0.283 | 0.117 | 0.104 | 0.091 | 0.053 | 0.224
Intra-continental | US-NC | 0.089 | 0.067 | 0.061 | 0.015 | 0.230 | 0.111 | 0.098 | 0.090 | 0.046 | 0.218
Extra + intra-continental | US-Sok | 0.116 | 0.099 | 0.088 | 0.027 | 0.250 | 0.119 | 0.106 | 0.099 | 0.052 | 0.229
Extra + intra-continental | GE-Dos | 0.189 | 0.177 | 0.155 | 0.061 | 0.356 | 0.205 | 0.189 | 0.171 | 0.088 | 0.372
Prior values | | 0.628 | 0.199 | NA | 0.021 | 1.904 | 0.517 | 0.500 | NA | 0.326 | 0.754

NOTE: Extra-continental introductions correspond to a long-distance introduction from a source located apart from the continent of the focal population; intra-continental introduction corresponds to an introduction event from a source located on the same continent as the focal population; and Extra + Intra continental introduction corresponds to a combination of the two types of sources. Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5 and q95), are indicated for each bottleneck severity parameter. We roughly classified the estimated bottleneck severity values into three classes (represented here by the three shades of gray): weak (i.e., median value of bottleneck severity < 0.12, in light gray), moderate (0.12 < bottleneck severity < 0.22, in gray), and strong (i.e., bottleneck severity > 0.3, in dark gray). The set of sample sites used for the ABC estimations presented here includes (i) for the native area: Japan (JP-Tok + JP-Sap), South-East China (CN-Nin), and North-East China (CN-Lan + CN-Lia), and (ii) for the invaded range: US-Wat, US-Sok, and US-SD for western US; US-NC for eastern US; IT-Tre for southern Europe; GE-Dos for northern Europe; BR-PA for South America (Brazil); and FR-Reu for La Reunion island. Code names of the sample sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online, in which bottleneck severity classes are also given for each sample site. See supplementary table S6, Supplementary Material online, for results on a different set of representative sample sites.
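The three severity classes of the note can be written as a small lookup. The Python sketch below is a reading aid rather than part of the original analysis: the function name `bottleneck_class` and the "unclassified" label are assumptions, the thresholds are those stated in the note, and the medians are the prior set 1 values transcribed from table 3.

```python
def bottleneck_class(median_severity):
    """Map a median bottleneck severity onto the classes defined in the table 3 note."""
    if median_severity < 0.12:
        return "weak"
    if 0.12 < median_severity < 0.22:
        return "moderate"
    if median_severity > 0.3:
        return "strong"
    # The note leaves boundary values and the 0.22-0.3 range undefined.
    return "unclassified"


# Posterior medians under prior set 1, transcribed from table 3.
MEDIANS_PRIOR1 = {
    "US-Haw": 0.500, "US-Wat": 0.138, "IT-Tre": 0.179, "BR-PA": 0.126,
    "FR-Reu": 0.215, "US-SD": 0.117, "US-NC": 0.067, "US-Sok": 0.099,
    "GE-Dos": 0.177,
}

classes = {site: bottleneck_class(m) for site, m in MEDIANS_PRIOR1.items()}
for site, label in classes.items():
    print(f"{site}: {label}")
```

Returning a separate label for values between 0.22 and 0.3 is a deliberate choice here, since the note does not say how such values would be classed.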

Table 4. Posterior Distributions of Admixture Rates for the Five Admixture Events Inferred for the Final Worldwide Invasion Scenario Described in figure 1.

Prior set | Admixture Event | Admixture Rate (gene fraction from pop x) | Mean | Median | Mode | q5 | q95
Prior 1 | A1 (US-Wat = US-Haw + CN-Nin) | r_US-Wat (China – CN-Nin) | 0.759 | 0.761 | 0.774 | 0.659 | 0.854
Prior 1 | A2 (US-Sok = US-Haw + US-Wat) | r_US-Sok (USA – US-Wat) | 0.237 | 0.230 | 0.227 | 0.092 | 0.400
Prior 1 | A3 (DE-Dos = IT-Tre + US-NC) | r_DE-Dos (USA – US-NC) | 0.286 | 0.278 | 0.266 | 0.112 | 0.481
Prior 1 | A4 (BR-PA = US-NC + US-SD) | r_BR-PA (USA – US-SD) | 0.467 | 0.461 | 0.442 | 0.187 | 0.754
Prior 1 | A5 (FR-Reu = IT-Tre + GE-Dos) | r_FR-Reu (Europe – GE-Dos) | 0.388 | 0.380 | 0.426 | 0.132 | 0.670
Prior 2 | A1 (US-Wat = US-Haw + CN-Nin) | r_US-Wat (China – CN-Nin) | 0.761 | 0.762 | 0.766 | 0.684 | 0.833
Prior 2 | A2 (US-Sok = US-Haw + US-Wat) | r_US-Sok (USA – US-Wat) | 0.224 | 0.219 | 0.199 | 0.114 | 0.350
Prior 2 | A3 (DE-Dos = IT-Tre + US-NC) | r_DE-Dos (USA – US-NC) | 0.312 | 0.306 | 0.284 | 0.152 | 0.487
Prior 2 | A4 (BR-PA = US-NC + US-SD) | r_BR-PA (USA – US-SD) | 0.422 | 0.418 | 0.429 | 0.247 | 0.613
Prior 2 | A5 (FR-Reu = IT-Tre + GE-Dos) | r_FR-Reu (Europe – GE-Dos) | 0.370 | 0.368 | 0.356 | 0.178 | 0.569

NOTE: Admixture events are denoted as in figure 1. Each admixture rate parameter r points to the name of the admixed site and corresponds to the fraction of genes originating from the source site in parentheses (1 − r genes originate from the other source site). Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5 and q95), are indicated for each admixture parameter. Estimations assuming the prior set 1 and the prior set 2 (supplementary table S8, Supplementary Material online) are provided. The set of representative sample sites used for the ABC estimations presented here is the same as in table 3. Code names of the population sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online. See supplementary table S7, Supplementary Material online, for results on a different set of representative sample sites.
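As a reading aid for the r notation, the Python sketch below computes the complementary contribution 1 − r and the expected allele frequency in an admixed population as the r-weighted average of the two source frequencies, which is the usual expectation immediately after a single admixture event when drift is ignored. The function names and the allele frequencies are hypothetical; only the median r for event A1 under prior set 1 (0.761) is taken from table 4.

```python
def source_contributions(r):
    """Return (fraction from the named source, fraction from the other source)."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("admixture rate must lie in [0, 1]")
    return r, 1.0 - r


def admixed_frequency(r, freq_named_source, freq_other_source):
    """Expected allele frequency right after admixture, ignoring drift."""
    named, other = source_contributions(r)
    return named * freq_named_source + other * freq_other_source


# Median r for A1 under prior set 1 (table 4): fraction of US-Wat genes from CN-Nin.
r_a1 = 0.761
from_china, from_hawaii = source_contributions(r_a1)

# Hypothetical allele frequencies in the two sources, for illustration only.
expected = admixed_frequency(r_a1, freq_named_source=0.40, freq_other_source=0.10)
print(f"China: {from_china:.3f}, Hawaii: {from_hawaii:.3f}, expected freq: {expected:.3f}")
```

The same two functions apply to the other four events by substituting the relevant r and source labels.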

0.286). We found more balanced admixture rates for the populations from Brazil (BR-PA) and La Reunion (FR-Reu), but information on this parameter was less accurate in these cases, as indicated by large 90% credibility intervals.

Model-Posterior Checking

Like any model-based method, ABC inference does not reveal the "true" evolutionary history; it allows choosing the best among a necessarily limited set of scenarios that have been compared, and estimating posterior distributions of parameters under this scenario. How well the inferred scenario-posterior combination matches the observed dataset remains to be evaluated using an ABC model-posterior checking analysis. When applying such an analysis to the final invasion scenario detailed in figure 1, we found that, when considering the prior set 1, only 54 of the 1,141 summary statistics used as test quantities had low posterior predictive P-values (i.e., 0.002 < ppp-values < 5%). Moreover, none of those ppp-values remained significant when correcting for multiple comparisons (Benjamini and Hochberg 1995). Similar results were obtained when assuming the prior set 2 and when considering different sets of representative sample sites in our analyses (results not shown). These findings show that the final worldwide invasion scenario (fig. 1, with associated parameter posterior distributions) matches the observed dataset well. In agreement with this, the projections of the simulated datasets on the principal component axes from the final model-posterior combination were well grouped and centered on the target point corresponding to the observed dataset (supplementary fig. S4, Supplementary Material online). Due to the modest size of the dataset (i.e., 25 microsatellite loci), however, only situations of major inadequacy of the model-posterior combination to the observed dataset are likely to be identified.
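The multiple-comparison step can be illustrated with a minimal Python sketch of the Benjamini and Hochberg (1995) step-up procedure. The function name, the 5% false discovery rate, and the example p-values are assumptions chosen for illustration; they are not the ppp-values of this study, and the sketch does not show how those ppp-values were computed.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the test is rejected at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * fdr.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values: three fall below 0.05 when taken one at a time.
example = [0.030, 0.041, 0.049, 0.20, 0.35, 0.50, 0.62, 0.74, 0.88, 0.95]
nominal = [p < 0.05 for p in example]
corrected = benjamini_hochberg(example, fdr=0.05)
print(sum(nominal), "nominally low;", sum(corrected), "kept after correction")
```

With these made-up values, three p-values are below 0.05 individually but none survives the correction, which is the kind of outcome described above for the 54 low ppp-values.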

Discussion

In the present paper, we decipher the routes taken by D. suzukii in its worldwide invasion, evaluating evidence of bottlenecks and genetic admixture associated with different introductions, and we quantify the efficiency of ABC-RF relative to the more standard ABC-LDA method for choosing among introduction scenarios.

A Complex Worldwide Invasion History

Prior to this study and that of Adrion et al. (2014), our understanding of the worldwide introduction pathways of D. suzukii was based on historical and observational data, which were incomplete and potentially misleading. Our results indicate three distinct introductions from the native range (to Hawaii, the western side of North America, and western Europe), which accords well with the findings of Adrion et al. (2014) from a different set of markers (X-linked sequence data). Both the western North American introductions and at least some European populations show signs of admixture. Hawaii and China both appear to have contributed to the western North American introductions, which were in turn the most probable source for the introduction into eastern North America. Similarly, China and, to a lesser extent, eastern North America both appear to have contributed to the European introduction, although the eastern North American contribution concerns mostly northern European areas. The almost simultaneous invasion of North America and Europe from Asia could be due to increased trade between these areas, facilitating transport of this species between regions. Alternatively, or in addition, adaptation to human-altered habitats (in this case, changes in agricultural practices) within the native range of the species could have promoted invasion of similarly altered habitats worldwide (i.e., anthropogenically induced adaptation to invade; Hufbauer et al. 2012). However, the two invaded continents were most likely colonized by flies from distinct Chinese geographic areas (fig. 1), requiring that adaptation to agriculture occurred concomitantly at several locations within the native range. This is possible but not evolutionarily parsimonious, and further data would be required to test this explanation. Subsequent introductions from the three primary founder locations to other locations result in a complex history of invasion. Both primary and secondary introductions show signs of demographic bottlenecks, and several invasive populations have multiple sources and thus experienced genetic admixture between differentiated native or invasive populations.

The genetic relationships between worldwide D. suzukii populations define the genetic variation they harbor and may continue to shape further spread of alleles among populations. Gene flow among populations, through continuous dispersal or more punctual admixture events, permits the dissemination of allelic variants and new mutations, with dramatic consequences for evolutionary trajectories (Lenormand 2002). We found evidence for asymmetric genetic admixture from North American towards (northern) European populations. If this signal of admixture reflects recurrent and ongoing dispersal events rather than a single past secondary introduction event, then variants that arise in North America would be more likely to spread to Europe than the reverse. Discriminating between ongoing and past dispersal events is tricky, however, especially for recent invasions (Benazzo et al. 2015). If the occurrence of recurrent gene flow is confirmed, then D. suzukii populations from Europe may not lag behind American ones in evolutionary terms. From an applied perspective, if resistance to control practices emerges in North America, special efforts to stop the importation of D. suzukii to Europe would help prevent the evolution of resistance in Europe and thus reduce damage to crops (Caprio and Tabashnik 1992).

Advances in Methods for Discriminating among Complex Models

To the best of our knowledge, the present study is the first to use the recently developed ABC-RF method (Pudlo et al. 2016) to test competing models characterized by different levels of complexity. To compare ABC-RF to a more standard ABC method, we carried out a subset of model choice analyses using both ABC-RF and ABC-LDA (Estoup et al. 2012). We ran ABC-LDA analyses in two ways: with standard-sized large reference tables (500,000 simulated datasets per scenario) and with small reference tables (10,000 simulated datasets per scenario, the same size as used for ABC-RF). The performance of ABC-LDA (in terms of prior error rate) in choosing among introduction scenarios was slightly better than that of ABC-RF when using large reference tables. Furthermore, for ABC-LDA with large reference tables, confidence in model choice (in terms of posterior probability) was relatively comparable to that of ABC-RF with small reference tables. Thus, when analyses are relatively simple or computational resources are not limiting, ABC-LDA can provide inferences of invasion scenarios as robust as those of ABC-RF.

In contrast, we found that ABC-RF outperformed ABC-LDA when using similarly small reference tables for both types of analyses (10,000 simulated datasets per scenario). ABC-LDA was also unstable, choosing different scenarios in different replicate analyses when using small reference tables. These empirical findings correspond well with those of Pudlo et al. (2016), who used simple models for which the true posterior probabilities could be calculated. Pudlo et al. (2016) demonstrated that ABC-RF provided a more reliable posterior probability of the best (true) scenario than standard ABC methods when using the same (small) number of simulated datasets in the reference table (see supplementary fig. S3 in Pudlo et al. 2016). In our work, the difference in performance between the two methods was most pronounced for complex analyses (e.g., when comparing more than 10 scenarios using more than 100 summary statistics). Thus, when analyses are not simple or computational resources are finite, ABC-RF provides more robust inferences of invasion scenarios than ABC-LDA.
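For readers unfamiliar with the vocabulary used in this comparison (reference table, posterior probability of a scenario, prior error rate, pseudo-observed datasets), the toy Python sketch below may help. It is neither ABC-RF nor ABC-LDA: it uses a plain k-nearest-neighbour ABC model choice on a single summary statistic, and the simulator, the scenario centres, the sample sizes, and all function names are hypothetical choices made only for illustration.

```python
import random


def simulate(scenario, rng):
    """Toy simulator (hypothetical): one summary statistic per simulated dataset."""
    centre = (0.2, 0.4)[scenario]
    return rng.gauss(centre, 0.08)


def build_reference_table(n_per_scenario, rng):
    """Simulate n datasets under each scenario; rows are (scenario, statistic)."""
    return [(sc, simulate(sc, rng)) for sc in (0, 1) for _ in range(n_per_scenario)]


def posterior_probs(table, s_obs, k):
    """Share of each scenario among the k simulations closest to s_obs."""
    nearest = sorted(table, key=lambda row: abs(row[1] - s_obs))[:k]
    counts = [0, 0]
    for sc, _ in nearest:
        counts[sc] += 1
    return [c / k for c in counts]


def prior_error_rate(table, n_pods, k, rng):
    """Fraction of pseudo-observed datasets assigned to the wrong scenario."""
    errors = 0
    for _ in range(n_pods):
        true_sc = rng.randrange(2)  # scenario drawn from a uniform prior
        probs = posterior_probs(table, simulate(true_sc, rng), k)
        chosen = 0 if probs[0] >= probs[1] else 1
        errors += int(chosen != true_sc)
    return errors / n_pods


rng = random.Random(1)
table = build_reference_table(1000, rng)
probs = posterior_probs(table, s_obs=0.45, k=100)
error = prior_error_rate(table, n_pods=200, k=100, rng=rng)
print(f"posterior probabilities: {probs}, prior error rate: {error:.3f}")
```

In this toy setting, shrinking the reference table or moving the two scenario centres closer together raises the prior error rate, which gives an intuition for why the size of the reference table and the similarity of competing scenarios matter in the comparisons above.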

The lower computation cost of ABC-RF was also evident. For a given observed dataset, selecting the best-supported scenario and computing its posterior probability was ca. 50 times faster than when processing an ABC-LDA treatment with a traditional reference table of large size. Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time, whereas it lasted several hours to several days with ABC-LDA. The consequence of low computational costs is not simply a saving in computer time. Rather, the efficiency of ABC-RF analyses allowed us to run replicate analyses on various sample sets, even for the most complex D. suzukii invasion scenarios. This replication is critical in evaluating the robustness of our statistical inferences.

We found that, at least for some ABC-RF analyses, the posterior probability of the best scenario and the global statistical power to choose among alternative invasion scenarios (as measured by the prior error rate) were relatively low. For instance, in three of the 11 analyses (analyses 3b, 5a, and 5b), the best scenarios had relatively low probabilities (i.e., ranging from 0.500 to 0.630) and relatively high prior error rates (i.e., ranging from 0.300 to 0.400). Although replicate analyses carried out on different sample sets and prior sets pointed to the same best scenario for these three analyses, such low probability values and high prior error rates suggest a moderate level of confidence in the choice of model in these cases. Similarly, we found that the scenario choice leading to the conclusion of the presence versus absence of admixed genes from the eastern US in various European sample sites relied on probability values that were sometimes as low as 0.500. Moreover, in a minority of cases, the best scenario was different depending on the sample site considered as representative of the eastern US or the native area. The observed pattern of an uneven spatial distribution of the presence of genes of eastern US origin in Europe, following roughly a north-to-south gradient, should hence be interpreted cautiously. The analysis of additional European sample sites is needed to clarify this issue. The above inferential uncertainties also most likely find their source in the relatively small number of markers genotyped relative to the number, complexity, and similarity of some of the competing models. Such results indicate that there is room for improving the robustness of our inferences, at least for a subset of our analyses.

Bottlenecks and Genetic Admixture

We found strong evidence that bottlenecks and admixture both occurred during the course of the invasion of D. suzukii. A first sign of bottlenecks is a reduction in neutral genetic diversity. We found that all invasive D. suzukii populations were characterized by significantly lower genetic variation than native ones, with a loss of 6.4% (US-Col) to 23.3% (US-Haw) of heterozygosity and of 27.3% (US-Wat) to 54.2% (US-Haw) of allelic diversity (supplementary fig. S5, Supplementary Material online). Adrion et al. (2014) found a similar significant loss of diversity in their dataset focused on gene sequences. Dlugosch and Parker (2008) and Uller and Leimu (2011) reviewed studies of neutral genetic diversity in a large number of species of animals, plants, and fungi, and compared nuclear molecular diversity within introduced and source populations. Overall, they found that a loss of variation was the most frequent feature in invasive populations. However, reductions in genetic variation were on average modest (e.g., average loss of 18.7% and 15.5% of heterozygosity and allelic diversity, respectively; Dlugosch and Parker 2008). The reductions observed in our study in invasive D. suzukii populations were thus in the same range for heterozygosity but higher for allelic diversity (cf. average loss of 11.1% and 34.5% of heterozygosity and allelic diversity, respectively). A larger loss of allelic diversity than of heterozygosity after a bottleneck is nevertheless expected from theory (Nei et al. 1975).
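The contrast between the two diversity measures can be made concrete with a small Python sketch. It is a toy illustration under stated assumptions: the allele counts are hypothetical, the function names are invented for this example, and expected heterozygosity is computed as 1 minus the sum of squared allele frequencies, without any sample-size correction.

```python
def expected_heterozygosity(allele_counts):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    total = sum(allele_counts)
    return 1.0 - sum((c / total) ** 2 for c in allele_counts)


def relative_loss(native_value, invasive_value):
    """Proportion of the native value lost in the invasive population."""
    return 1.0 - invasive_value / native_value


# Hypothetical allele counts at one microsatellite locus.
native = [50, 30, 10, 5, 3, 2]  # six alleles, several of them rare
invasive = [60, 35, 5]          # rare alleles lost after the bottleneck

het_loss = relative_loss(expected_heterozygosity(native), expected_heterozygosity(invasive))
allele_loss = relative_loss(len(native), len(invasive))
print(f"heterozygosity loss: {het_loss:.1%}, allelic diversity loss: {allele_loss:.1%}")
```

Because rare alleles contribute little to heterozygosity, losing them halves the allele count in this example while heterozygosity drops by a much smaller proportion, which is the qualitative pattern expected after a bottleneck.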

The type of introduction pathway explains, at least partly, the severity of the reduction in genetic diversity. Bottlenecks tended to be less severe for populations founded by individuals from the same continent than for populations founded by individuals originating from a different continent. A simple explanation for such a pattern is that, due to greater proximity, populations originating from the same continent are likely to be initially founded by a larger number of individuals and also to experience recurrent gene flow. Populations founded by individuals from both the same and different continents experienced weak or moderate reductions in diversity, suggesting that multiple introduction pathways restore diversity.

In agreement with previous studies on invasions on similarly large geographical scales (e.g., Lombaert et al. 2014 and references therein), we found that admixture events were frequent in the worldwide invasion history of D. suzukii. We identified at least five admixture events for which sources were native and/or invasive populations (i.e., A1–A5 events in fig. 1). The genetic contributions of the sources were unbalanced, except for the populations from Brazil and La Reunion, but genetic information regarding admixture was considerably less accurate in the latter cases, which involved weakly differentiated source populations.

We have provided here the information necessary to evaluate whether bottlenecks and admixture have influenced outcomes in this invasion. Quantitative genetics studies in the lab focusing on the key source, bottlenecked, and admixed populations could now examine the fitness and adaptive potential of those groups (Lavergne and Molofsky 2007; Facon et al. 2008; Turgeon et al. 2011).

Conclusions and Perspectives

Understanding invasion pathways provides key information for understanding the role of evolutionary processes in biological invasions. For example, our research reveals sources of invasive populations for further relevant comparisons using quantitative genetics studies in the lab. Additionally, we found that the invasion of Europe by D. suzukii was distinct from that of North America, with limited and asymmetrical gene flow between these two main invaded areas. This situation provides the opportunity to evaluate evolutionary trajectories in replicate in two separate temperate climate regions, similarly to research by Gilchrist et al. (2001, 2004) on the pace of clinal evolution in the invasive fruit fly Drosophila subobscura. There are also distinct invasion pathways for areas with warmer climates (i.e., Hawaii, La Reunion, and Brazil) that will make interesting comparisons.

Retracing invasion routes and making inferences about demographic processes is only possible if there is adequate polymorphism within populations and significant genetic differentiation among them. As is evident here, microsatellite data can provide enough variation to reveal important pathways. We concur with Adrion et al. (2014), however, that genome-wide data (i.e., next generation sequencing approaches) will be tremendously powerful in further discriminating among complex invasion scenarios (see also Cristescu 2015; Estoup et al. 2016). Because of the reduced computational resources demanded by ABC-RF, this method will be particularly useful for analysis of massive single nucleotide polymorphism datasets (Pudlo et al. 2016). In addition to the selectively neutral demographic inference leading to the reconstruction of routes of invasion, population (and quantitative) next generation sequencing approaches are quite promising for studying the evolution of phenotypic traits in natural populations (Wray 2013; Gautier 2015). Specifically, they can be used to better understand the genetic architecture of traits underlying invasion success (Bock et al. 2015). Drosophila suzukii is a good species for such research given (i) its short generation time and viability in laboratory conditions, which facilitate experimental approaches to study quantitative traits of interest (Asplen et al. 2015), and (ii) the availability of annotated genome assemblies for this species (Chiu et al. 2013; Ometto et al. 2013), along with the huge amount of genomic resources available in its close relative D. melanogaster (Groen and Whiteman 2016).

Materials and Methods

Sampling and Genotyping

Adult D. suzukii were sampled from the field at a total of 23 localities (hereafter termed sample sites) distributed throughout most of the native and invasive range of the species (supplementary table S1, Supplementary Material online, and fig. 1). Samples were collected between 2013 and 2015 using baited traps and sweep nets, and stored in ethanol. Native Asian samples consisted of a total of six sample sites, including four Chinese and two Japanese localities. Samples from the invasive range were collected in Hawaii (1 sample site), continental US (7 sites), Europe (7 sites), Brazil (1 site), and the French island of La Reunion (1 site). For each sample site, 15–44 adult flies were genotyped at 25 of the 28 microsatellite loci described in Fraimout et al. (2015), resulting in a total of 685 genotyped individuals. Three microsatellite loci from Fraimout et al. (2015) (i.e., loci DS31, DS42, and DS45) were not included due to the presence of null alleles at frequencies > 10% in some sample sites (results not shown). DNA extraction and PCR amplification, as well as allele scoring, were performed following the methodology detailed in Fraimout et al. (2015).

General Framework for Defining Focal Population Groups and Sets of Scenarios to Be Compared Using ABC

One of the many challenges of ABC analyses is to optimally define sets of sample sites to be processed chronologically as potential sources and targets in a way that does not hamper the computational effort. As stated in Estoup and Guillemaud (2010) and Lombaert et al. (2014), the number and complexity of competing scenarios are proportional to the number of genetic groups to be accounted for in ABC analyses. Following Lombaert et al. (2014), we used the results obtained from three genetic clustering methods, along with historical and geographical information, to define genetic groups that could include several sample sites (supplementary appendix S1, Supplementary Material online). We defined seven main genetic groups from our set of 23 sample sites: Asia, Hawaii, western US, eastern US, Europe, Brazil, and La Reunion. To make the computations less intensive with respect to computer resources, ABC treatments were carried out using one representative sample site for each genetic group per replicate analysis. In other words, when a genetic group included more than two sample sites, we considered the two most differentiated sample sites (i.e., with the highest FST values in supplementary table S3, Supplementary Material online) as representative of the group: CN-Shi and JP-Tok for the group Asia, all three sample sites for the genetically heterogeneous western US group, US-NC and US-Wis for the group eastern US, and GE-Dos and IT-Tre for the group Europe. Each of the sample sites retained (i.e., the representative sample sites) was included alternatively in the sample set considered for ABC treatments. This makes it possible to replicate analyses of scenario choice on different sample sets representing most of the genetic variation within and between genetic groups. All ABC analyses described in table 1 were thus replicated with all possible combinations of representative sample sites available for each genetic group. This allowed us to test the robustness of scenario choices with respect to the sample sets considered as representative of genetic groups.
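The replication scheme described above can be read as a Cartesian product over the representative sample sites of each genetic group. The following Python sketch is only an illustration under that assumption: the site labels are those given in the text for Asia, eastern US, and Europe, whereas the dictionary layout and the function name `replicate_sample_sets` are hypothetical and not taken from the authors' scripts. Groups with a single sample site, and the western US group (for which all three sites were used), are left out for brevity.

```python
from itertools import product

# Representative sample sites named in the text for three genetic groups.
# The data structure itself is a hypothetical illustration.
representatives = {
    "Asia": ["CN-Shi", "JP-Tok"],
    "eastern US": ["US-NC", "US-Wis"],
    "Europe": ["GE-Dos", "IT-Tre"],
}


def replicate_sample_sets(reps):
    """Return every sample set containing exactly one site per genetic group."""
    groups = list(reps)
    return [
        dict(zip(groups, combo))
        for combo in product(*(reps[g] for g in groups))
    ]


sample_sets = replicate_sample_sets(representatives)
for sample_set in sample_sets:
    print(sample_set)
```

With two representative sites in each of three groups, the sketch yields 2 x 2 x 2 = 8 sample sets, each of which would correspond to one replicate of a scenario-choice analysis.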

Nested ABC Analyses Processed Sequentially

Following Lombaert et al. (2014), we used historical information (i.e., dates of first observation of each invasive population; supplementary table S1, Supplementary Material online) to define 11 sets of competing introduction scenarios that were analyzed sequentially. The first set of compared scenarios considers the second oldest invasive population as target (the oldest one necessarily originating from the native range) and determines its introduction history. Step by step, subsequent analyses use the results obtained from the previous analyses until the most recent invasive populations are considered. Each of the 11 analyses included a set of compared introduction scenarios where the target invasive population (i.e., the focal population) can either derive directly by a single split (i.e., introduction event) from one of the historically compatible source populations or be the result of an admixture event between any pair of possible sources. For instance, in a model with one target and two possible sources, three scenarios could be formalized, with the target population being derived from a single source population, from the other possible source, or from an admixture between the two sources (see supplementary fig. S6 for an illustration and supplementary table S2, Supplementary Material online, for details).
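The way a set of competing scenarios grows with the number of candidate sources can be sketched in a few lines of Python. This is a minimal illustration of the counting rule stated above (one scenario per single source plus one per pair of sources), not the authors' implementation; the function name `enumerate_scenarios` and the tuple representation are hypothetical.

```python
from itertools import combinations


def enumerate_scenarios(target, sources):
    """List single-source and pairwise-admixture scenarios for one target.

    Each scenario is a (target, sources) tuple; a one-element source tuple
    is a single split, a two-element tuple is an admixture event.
    """
    single = [(target, (s,)) for s in sources]
    admixed = [(target, pair) for pair in combinations(sources, 2)]
    return single + admixed


# Three candidate sources give 3 single-source + 3 admixture scenarios.
scenarios = enumerate_scenarios("eastern US", ["Asia", "Hawaii", "western US"])
for scenario in scenarios:
    print(scenario)
```

For k candidate sources the rule gives k + k(k - 1)/2 scenarios, so two sources give three scenarios and three sources give six.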

The first set of scenario choice analyses (1a–1d) aimed at making inferences about the introduction pathways for the three sample sites in western US (table 1). We first evaluated the Asian, Hawaiian, or admixed Asian + Hawaiian origin of each of the three western US sample sites. We then refined our modeling by formalizing seven competing scenarios describing the possible relationships among the three western US sample sites and their extra-continental sources (i.e., Asia and Hawaii; analysis 1d). Once the invasion history was resolved for the western US group, we sought to identify the population source(s) of the eastern US and European genetic groups independently. Dates of first records in these two groups widely overlap, and the possibility that the western US was a common source for both areas could not be ruled out. We therefore compared in analysis 2a a set of six scenarios to assess whether the eastern US group directly derived from Asia, Hawaii, or western US, or if it resulted from an admixture event between one pair of such sources. A similar set of six scenarios was used to assess the most likely origin of European populations independently from the eastern US group (analysis 2b).

Due to the results of analyses 2a and 2b, and due to the overlap in first record dates for eastern US and Europe, we then performed two analyses to investigate the possibility of gene flow (i.e., admixture) between the eastern US and European groups, considering eastern US as target population with Europe as potential source and vice versa (analyses 3a and 3b, respectively). Analysis 4 aimed at clarifying the admixture in northern Europe sample sites identified in analysis 3b. We tested whether this admixture occurred between eastern US and a secondary introduction from Asia or between individuals from eastern US and southern Europe. Finally, we investigated the origins of the most recent invasive populations in Brazil and La Reunion separately, with all older groups and pairwise admixtures as potential sources (analyses 5a and 5b). Note that, in agreement with the genetic clustering results (fig. 1; supplementary appendix S1 and figs. S1 and S2, Supplementary Material online), we did not test for the possibility of Brazil being the source of La Reunion or vice versa.

Historical, Demographic, and Mutational Parameters

For all scenarios formalized for ABC analyses, we modeled an introduction event as a divergence without subsequent gene flow from the source population (or two source populations in case of admixture) at a time corresponding to the date of first record in the invaded area, translated into a number of generations (assuming 12 generations per year; Lin et al. 2014). The divergence event was immediately followed by a bottleneck period characterized by a lower effective population size, and then by a direct return to a stable (large) effective population size. We modeled our uncertainty regarding the identity of the source populations in the slightly structured native Asian area by including native "ghost populations" (i.e., unsampled populations) in our scenarios. Such unsampled native populations were modeled as a single split from an ancestral Asian population (without any change in effective population size) at a time defined by a loose flat prior that includes zero at the lower bound (supplementary table S8, Supplementary Material online). See Lombaert et al. (2014) and references therein for justifications of using unsampled populations when modeling invasion scenarios. A detailed illustration of the above modeling design is provided in supplementary fig. S6, Supplementary Material online.
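The two modeling conventions described above (dates converted to generations, and a bottleneck followed by an immediate return to a large stable size) can be written down as a short Python sketch. Only the value of 12 generations per year comes from the text; the function names, the example years, and the example population sizes are hypothetical and chosen purely for illustration.

```python
GENERATIONS_PER_YEAR = 12  # value assumed in the text (Lin et al. 2014)


def years_to_generations(first_record_year, sampling_year,
                         gens_per_year=GENERATIONS_PER_YEAR):
    """Convert the time since first record into a number of generations."""
    if sampling_year < first_record_year:
        raise ValueError("sampling cannot precede the first record")
    return (sampling_year - first_record_year) * gens_per_year


def effective_size(gens_since_introduction, bottleneck_duration,
                   bottleneck_size, stable_size):
    """Piecewise-constant effective size after an introduction event.

    The population stays at `bottleneck_size` for `bottleneck_duration`
    generations, then switches directly to `stable_size`.
    """
    if gens_since_introduction < 0:
        raise ValueError("time since introduction must be non-negative")
    if gens_since_introduction < bottleneck_duration:
        return bottleneck_size
    return stable_size


# Hypothetical example: first record in 2008, sampling in 2014.
print(years_to_generations(2008, 2014))
# Hypothetical bottleneck of 10 generations at Ne = 50, then Ne = 10000.
print(effective_size(5, 10, 50, 10000), effective_size(10, 10, 50, 10000))
```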

Prior distributions for historical, demographic, and mutational parameters were defined taking into account the historical and demographic parameter values available from empirical studies on D. suzukii (Hauser 2011; Lin et al. 2014; Asplen et al. 2015) and microsatellite mutational data on D. melanogaster (Schug et al. 1998). We considered a first set of prior distributions (hereafter termed prior set 1), which was composed of rectangular (i.e., bounded uniform) distributions (see supplementary table S8, Supplementary Material online). To evaluate the robustness of our ABC inferences to prior choice, we also considered a second set of more peaked prior distributions (hereafter termed prior set 2; supplementary table S8, Supplementary Material online).

Model Choice Using ABC-RF and ABC-LDA

For all ABC-RF and ABC-LDA analyses, we used the software DIYABC v2.1.0 (Cornuet et al. 2014) to simulate datasets constituting the reference tables. A reference table includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics.

Following Pudlo et al. (2016), ABC-RF treatments were processed on reference tables including 10,000 simulated datasets per scenario. Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014) for microsatellite markers, describing genetic variation per population (e.g., number of alleles), per pair (e.g., genetic distance), or per triplet (e.g., admixture rate) of populations, averaged over the 25 loci (see supplementary table S9, Supplementary Material online, for details about such statistics), plus the linear discriminant analysis (LDA) axes as additional summary statistics. The total number of summary statistics ranged from 39 to 424 depending on the analysis (table 2). When the number of scenarios in an analysis exceeded ten, we processed ABC-RF treatments on reference tables including 100,000 simulated datasets in total, to avoid computer memory issues associated with the sub-bootstrapping procedure processed for reference tables with more than 100,000 simulated datasets (see the section Practical recommendations regarding the implementation of the algorithms in Pudlo et al. 2016). We checked that this number was sufficient by evaluating the stability of prior error rates (i.e., the probability of choosing a wrong model when drawing model index and parameter values from the priors) and posterior probability estimates on 80,000, 90,000, and 100,000 simulated datasets. The number of trees in the constructed random forests was fixed to n = 500; see Breiman (2001) and Pudlo et al. (2016) for justifications of considering a forest of 500 trees, and supplementary fig. S7, Supplementary Material online, for an illustration. For each ABC-RF analysis, we predicted the best scenario and estimated its posterior probability and prior error rate over 10 replicate runs on the same reference table. We used the abcrf R package (v1.1.0; Pudlo et al. 2016) to perform all ABC-RF analyses. In supplementary appendix S2, Supplementary Material online, we provide several scripts written in the R programming language (R Development Core Team 2008) for computing random forest analyses in R with the abcrf package when starting from simulated datasets generated with DIYABC v2.1.0.
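The notion of prior error rate used above (the probability of choosing a wrong model when the model index and parameter values are drawn from the priors) can be illustrated with a deliberately simplified Python sketch. Everything in it is an assumption made for illustration: the two-model toy simulator, the single summary statistic, the uniform prior on a nuisance parameter, and above all the threshold classifier, which is a stand-in and not a random forest. The abcrf package estimates this quantity differently in practice; the sketch only shows what is being estimated.

```python
import random


def simulate_summary(model, rng):
    """Toy simulator (hypothetical): one summary statistic per dataset.

    A nuisance parameter (the spread) is drawn from a uniform prior, and
    the mean of the statistic depends on the model index.
    """
    spread = rng.uniform(0.5, 1.5)          # parameter drawn from its prior
    mean = 0.0 if model == 0 else 3.0
    return rng.gauss(mean, spread)


def classify(stat):
    """Stand-in classifier (NOT a random forest): midpoint threshold."""
    return 0 if stat < 1.5 else 1


def prior_error_rate(n_test, seed=1):
    """Fraction of prior-simulated datasets assigned to the wrong model."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_test):
        true_model = rng.randrange(2)       # model index from a uniform prior
        if classify(simulate_summary(true_model, rng)) != true_model:
            errors += 1
    return errors / n_test


print(prior_error_rate(5000))
```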

For the sake of methodological comparison, we also duplicated a subset of analyses using a more standard ABC method (Beaumont et al. 2002; Cornuet et al. 2008) improved using the regression algorithms from Estoup et al. (2012), based on linear discriminant analysis (LDA), to avoid the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step. This method is called ABC-LDA. Following previous analyses of this type (Lombaert et al. 2014), ABC-LDA treatments were processed on reference tables including 500,000 simulated datasets per scenario. As for ABC-RF, datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). Posterior probabilities associated with each scenario were estimated by polychotomous logistic regression (Cornuet et al. 2008), modified following Estoup et al. (2012), on the 1% of simulated datasets closest to the observed dataset. We also estimated for each analysis a prior error rate over 500 simulated datasets drawn from the priors, using reference tables including 500,000 simulated datasets per scenario. To provide a fair comparison of classification error with respect to the computational effort, prior error rates were also estimated using reference tables including the same number of simulated datasets per scenario as for ABC-RF treatments (10,000 per scenario).

Refining the Origins of the Primary Introduction Events from the Native Area

We refined our worldwide invasion scenario by assessing the most probable origin of the primary introductions from the Asian native range. To this aim, we carried out additional ABC-RF treatments to choose which geographical area in the native range was the most likely origin of each of the three primary introductions from Asia identified in the final invasion scenario, i.e., in Hawaii, western US (sample site US-Wat), and Europe (fig. 1). Capitalizing on the observed pattern of genetic differentiation among all Asian sample sites, we considered four possible geographical origins: O1 = Japan (represented by the sample sites JP-Sap + JP-Tok, which were pooled as they did not show any significant differentiation), O2 = South-East China (represented by the sample site CN-Nin), O3 = North-West China (represented by the sample sites CN-Lan + CN-Lia, which were pooled as they did not show any significant differentiation), and O4 = South-West China (represented by the sample site CN-Shi). In each of these three model choice analyses (i.e., one analysis per primary introduction), we considered the previously inferred introduction histories and compared four scenarios corresponding to variations of those histories that differed only by the geographical origin of the primary Asian introduction (models O1, O2, O3, and O4). We used the ABC-RF method with 10,000 datasets per model simulated using prior set 1 or 2, and with three replications of the RF process. For inferences regarding Europe, we carried out ABC-RF treatments on various sets of sample sites from Europe and eastern US to further evaluate the robustness of our results.

Parameter Estimation

For ABC parameter estimation, we considered a set of 12 key sample sites representative of the inferred final invasion history (fig. 1). This set of sample sites included the three native origins of primary introductions from the native range (O1, O2, and O3), the Hawaiian sample (US-Haw), the three western US samples (US-Wat, US-Sok, and US-SD), the eastern US sample US-NC, two European samples (IT-Tre and GE-Dos), the Brazilian sample (BR-PA), and the sample from La Reunion (FR-Reu). To evaluate the robustness of our parameter estimations with respect to sample sites, we carried out a second analysis replacing the sample sites US-NC, IT-Tre, and GE-Dos by US-Wis, SP-Bar, and FR-Par, respectively. We also replicated parameter estimations using prior set 1 and prior set 2.

ABC-RF was developed to tackle the inferential issue of model choice. So far, no RF-based solution exists to estimate parameter distributions under a given model (Pudlo et al. 2016). Standard ABC algorithms for parameter estimation may suffer from the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step, and hence yield poor results when the number of statistics is too large (reviewed in Blum et al. 2013). To avoid such potential problems, as well as for the sake of simplicity and computational efficiency (statistical techniques for choosing summary statistics have not been implemented in DIYABC; Blum et al. 2013), we used a standard ABC methodological framework (Beaumont et al. 2002; Cornuet et al. 2008) applied to a subset of "expert-chosen" statistics proposed in DIYABC (Brouat et al. 2014; Lombaert et al. 2014) to estimate posterior parameter distributions under the final invasion scenario. We hence considered 95 summary statistics in total: the mean number of alleles and the mean genetic diversity (per locus and population sample), all pairwise FST values, and some crude estimates of admixture rates based on the five population triplets for which admixture events have been inferred (i.e., A1–A5 events described in fig. 1, corresponding to five AML statistics; Choisy et al. 2004; see the DIYABC 2.1.0 user manual, p. 16, for details about such statistics). Using parameter values drawn from prior set 1 or 2, we produced a reference table containing 10 million simulated datasets. Following Beaumont et al. (2002), we then used a local linear regression to estimate the parameter posterior distributions. We took the 10,000 simulated datasets closest to our observed dataset (i.e., 0.1% of the reference table) for the regression, after applying a logit transformation to parameter values. We found that considering instead the 1% closest simulated datasets and/or a larger set of summary statistics had only weak effects on the estimated parameter distributions, at least for the subset of parameters we were interested in (i.e., admixture rates and bottleneck severity; results not shown).
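The selection step that precedes the regression, together with the logit transformation of a bounded parameter, can be sketched as follows. This is a minimal illustration and not the DIYABC implementation: it assumes a Euclidean distance on summary statistics scaled by their standard deviation across the reference table, and a logit taken relative to hypothetical prior bounds. The local linear regression itself is omitted, and the toy reference table (one admixture-rate-like parameter, one statistic) is invented for the example.

```python
import math
import random


def closest_fraction(reference, observed, fraction):
    """Keep the rows whose summary statistics are closest to `observed`.

    reference: list of (parameter, stats) pairs, stats being a list.
    Each statistic is scaled by its standard deviation in the table.
    """
    sds = []
    for j in range(len(observed)):
        col = [stats[j] for _, stats in reference]
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        sds.append(sd if sd > 0 else 1.0)

    def distance(stats):
        return math.sqrt(sum(((s - o) / sd) ** 2
                             for s, o, sd in zip(stats, observed, sds)))

    ranked = sorted(reference, key=lambda row: distance(row[1]))
    return ranked[:max(1, round(fraction * len(reference)))]


def logit(value, lower, upper):
    """Map a parameter lying strictly inside (lower, upper) to the real line."""
    p = (value - lower) / (upper - lower)
    return math.log(p / (1.0 - p))


# Toy reference table: a rate drawn from U(0, 1) and one noisy statistic.
rng = random.Random(0)
reference = []
for _ in range(10000):
    rate = rng.random()
    reference.append((rate, [rate + rng.gauss(0.0, 0.05)]))

kept = closest_fraction(reference, observed=[0.3], fraction=0.001)
print(len(kept), [round(rate, 3) for rate, _ in kept])
```

Keeping 0.1% of a 10,000-row toy table retains 10 rows; the same fraction applied to 10 million simulated datasets retains the 10,000 closest ones.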

We focused our investigations on the parameters related to two types of demographic events that are considered to be potentially important drivers of invasion success: the severity of the bottleneck associated with the foundation of new populations, and genetic admixture (Estoup et al. 2016). Bottleneck severity was defined as the ratio between the duration of the bottleneck and the number of effective individuals during this period (Pascual et al. 2007). Following the final invasion scenario detailed in figure 1, we defined three categories of introduction situations: bottleneck severity for populations founded only by individuals originating from a different continent (e.g., sample sites US-Haw, US-Wat, IT-Tre, BR-PA, and FR-Reu), bottleneck severity for populations founded only by individuals originating from the same continent (e.g., sample sites US-SD and US-NC), and bottleneck severity for populations founded by a combination of the two types of sources (e.g., sample sites US-Sok and GE-Dos). Genetic admixture is evident when a population is inferred to have more than one distinct source (A1–A5 events in fig. 1).
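The definition of bottleneck severity given above translates directly into a one-line computation. The Python sketch below is only a restatement of that ratio; the function name and the numerical values in the example are hypothetical.

```python
def bottleneck_severity(duration_generations, effective_size):
    """Ratio of bottleneck duration (generations) to effective size."""
    if duration_generations < 0 or effective_size <= 0:
        raise ValueError("duration must be >= 0 and effective size > 0")
    return duration_generations / effective_size


# Hypothetical values: a 10-generation bottleneck at Ne = 50 is ten times
# more severe than the same duration at Ne = 500.
severe = bottleneck_severity(10, 50)
mild = bottleneck_severity(10, 500)
print(severe, mild)
```

Under this definition, a longer bottleneck or a smaller number of effective individuals both increase severity.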

Model-Posterior Checking

We used the ABC model-posterior checking method implemented in DIYABC (Cornuet et al. 2010) to evaluate how well the final worldwide invasion scenario, with associated parameter posterior distributions, matches the observed dataset. This method was largely inspired by statistical frameworks detailed in Gelman et al. (2003) and Cook et al. (2006). The principle is as follows: if a scenario-posterior combination fits the observed data correctly, then data simulated under this scenario with parameters drawn from associated posterior distributions should be close to the observed data. The lack of fit of the model to the data can be measured by determining the frequency at which test quantities measured on the observed dataset are extreme with respect to the distributions of the same test quantities computed from the simulated datasets (i.e., the posterior predictive distributions). In practice, the test quantities are chosen among the large set of ABC summary statistics proposed in the program DIYABC. For each test quantity $q$, a lack of fit of the observed data with respect to the posterior predictive distribution can be measured by the cumulative distribution function value, defined as $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$. The tail-area probability can be easily computed for each test quantity as $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$ and $1.0 - \mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$ for $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}}) \leq 0.5$ and $> 0.5$, respectively. Such tail-area probabilities, also named posterior predictive P-values (i.e., ppp-values; Meng 1994), represent the probabilities that the replicated data (simulated ABC summary statistics) could be more extreme than the observed data (observed ABC summary statistics). In practice, ppp-values can be interpreted as a guideline for tracking model-posterior misfit: a few ppp-values around 0.05 are not necessarily a problem, whereas the presence of ppp-values around 0.001 is rather suspect. Hence, too many observed summary statistics falling in the tails of distributions, especially if some of those ppp-values are small (<0.001), cast serious doubts on the adequacy of the model-posterior combination to the observed dataset. Finally, because ppp-values are computed for a number of (often non-independent) test statistics, a method such as that of Benjamini and Hochberg (1995) can be used to control the false discovery rate (Cornuet et al. 2010).
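The tail-area probability defined above, and a false discovery rate correction of the Benjamini and Hochberg (1995) type, can be sketched in Python as follows. This is a minimal illustration rather than the DIYABC code: the function names are hypothetical, the empirical cumulative distribution is estimated by simple counting, and the step-up procedure is written in its textbook form.

```python
def ppp_value(simulated, observed):
    """Tail-area probability of one observed test quantity.

    Returns Prob(q_sim < q_obs) when that value is <= 0.5,
    and 1 - Prob(q_sim < q_obs) otherwise.
    """
    cdf = sum(1 for q in simulated if q < observed) / len(simulated)
    return cdf if cdf <= 0.5 else 1.0 - cdf


def benjamini_hochberg(p_values, fdr=0.05):
    """Flag the p-values rejected by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * fdr / m:
            max_rank = rank              # largest rank meeting the criterion
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[i] = True
    return rejected


# Toy posterior predictive distribution: the values 0, 1, ..., 99.
simulated = [float(i) for i in range(100)]
print(ppp_value(simulated, 10.0), ppp_value(simulated, 95.0))
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

In the last call only the first ppp-value is flagged: after sorting, 0.001 is below its threshold of 0.05/4, whereas 0.03, 0.04, and 0.5 all exceed theirs.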

We carried out the ABC model-posterior checking analysis on our observed microsatellite dataset as follows. From $10^7$ datasets simulated under the final invasion scenario detailed in figure 1, and using the same subset of 95 statistics previously retained for parameter estimation, we obtained a posterior sample of $10^4$ values from the posterior distributions of parameters, as described in the previous section (Parameter Estimation). We then simulated $10^4$ datasets with parameter values drawn with replacement from this posterior sample. As underlined in many statistics textbooks (e.g., Gelman et al. 2003), model checking should not be performed using information that has already been used for training (i.e., model fitting; see also Cornuet et al. 2010 for illustrations on simulated datasets). Optimally, model-posterior checking should be based on test quantities that do not correspond to the summary statistics that have been used for obtaining the parameter posterior distributions. This is naturally possible with DIYABC, as the package proposes a large choice of summary statistics. Our set of test statistics therefore included the 1,141 remaining single-population, pairwise, and three-population summary statistics available in DIYABC that were not used for previous ABC parameter estimation.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

AF was supported by state funding by the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program "Investissements d'avenir" (ANR-11-IDEX-0004-02). AE and RAH were supported by the Languedoc Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE 2016), and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP "International Mobility" (CfP 2015-02). GL acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds project 2012-13-195. XC, MK, IM, and JZ were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). MD acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). MP was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.

References

Adrion JR, Kousathanas A, Pascual M, Burrack HJ, Haddad NM, Bergland AO, Machado H, Sackton TB, Schlenke TA, Watada M, et al. 2014. Drosophila suzukii: the genetic footprint of a recent, worldwide invasion. Mol Biol Evol 31:3148–3163.

Asplen MK, Anfora G, Biondi A, Choi DS, Chu D, Daane KM, Gibert P, Gutierrez AP, Hoelmer KA, Hutchison WD, et al. 2015. Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci 88:469–494.

Atallah J, Teixeira L, Salazar R, Zaragoza G, Kopp A. 2014. The making of a pest: the evolution of a fruit-penetrating ovipositor in Drosophila suzukii and related species. Proc R Soc B 281:20132840.

Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025–2035.

Beaumont MA. 2010. Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst 41:379–406.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B 57:289–300.

Bertorelle G, Benazzo A, Mona S. 2010. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol Ecol 19:2609–2625.

Blum M, Nunes M, Prangle D, Sisson S. 2013. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci 28:189–208.

Bock DG, Caseys C, Cousens RD, Hahn MA, Heredia SM, Hübner S, Turner KG, Whitney KD, Rieseberg LH. 2015. What we still don't know about invasion genetics. Mol Ecol 24:2277–2297.

Boissin E, Hurley B, Wingfield MJ, Vasaitis R, Stenlid J, Davis C, de Groot P, Ahumada R, Carnegie A, Goldarazena A, et al. 2012. Retracing the routes of introduction of invasive species: the case of the Sirex noctilio woodwasp. Mol Ecol 21:5728–5744.

Breiman L. 2001. Random forests. Mach Learn 45:5–32.

Benazzo A, Ghirotto S, Torres Vilaca ST, Hoban S. 2015. Using ABC and microsatellite data to detect multiple introductions of invasive species from a single source. Heredity 115:262–272.

Brouat C, Tollenaere C, Estoup A, Loiseau A, Sommer S, Soanandrasana R, Rahalison L, Rajerison M, Piry S, Goodman SM, Duplantier JM. 2014. Invasion genetics of a human commensal rodent: the black rat Rattus rattus in Madagascar. Mol Ecol 23(16):4153–4167.

Caprio M, Tabashnik BE. 1992. Gene flow accelerates local adaptation among finite populations: simulating the evolution of insecticide resistance. J Econ Entomol 85:611–620.

Calabria G, Maca J, Bächli G, Serra L, Pascual M. 2012. First records of the potential pest species Drosophila suzukii (Diptera: Drosophilidae) in Europe. J Appl Entomol 136:139–147.

Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang G, Zalom FG. 2013. Genome of Drosophila suzukii, the spotted wing drosophila. G3 3:2257–2271.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Mol Ecol 13:955–968.

Cook S, Gelman A, Rubin DB. 2006. Validation of software for Bayesian models using posterior quantiles. J Comput Graph Stat 15:675–692.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24:2713–2719.

Cornuet JM, Ravigne V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11:401.

Cornuet JM, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin JM, Estoup A. 2014. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics 30:1187–1189.

Cristescu ME. 2015. Genetic reconstructions of invasion history. Mol Ecol 24:2212–2225.

Depra M, Poppe JL, Schmitz HJ, De Toni DC, Valente VLS. 2014. The first records of the invasive pest Drosophila suzukii in the South American continent. J Pest Sci 87:379–383.

Dlugosch KM, Parker IM. 2008. Founding events in species invasions: genetic variation, adaptive evolution, and the role of multiple introductions. Mol Ecol 17:431–449.

Dlugosch KM, Anderson SR, Braasch J, Alice Cang F, Gillette H. 2015. The devil is in the details: genetic variation in introduced populations and its contributions to invasion. Mol Ecol 24:2095–2111.

Edmonds CA, Lillie AS, Cavalli-Sforza LL. 2004. Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA 101:975–979.

Ellstrand NC, Schierenbeck KA. 2000. Hybridization as a stimulus for the evolution of invasiveness in plants? Proc Natl Acad Sci USA 97:7043–7050.

Estoup A, Guillemaud T. 2010. Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol 19:4113–4130.

Estoup A, Lombaert E, Marin JM, Guillemaud T, Pudlo P, Robert CP, Cornuet JM. 2012. Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics. Mol Ecol Res 12(5):846–855.

Estoup A, Ravigne V, Hufbauer RA, Vitalis R, Gautier M, Facon B. 2016. Is there a genetic paradox of biological invasion? Annu Rev Ecol Evol Syst 47:51–72.

Facon B, Genton B, Shykoff J, Jarne P, Estoup A, David P. 2006. A general eco-evolutionary framework for understanding bioinvasions. Trends Ecol Evol. 21:130–135.

Facon B, Pointier JP, Jarne P, Sarda V, David P. 2008. High genetic variance in life-history strategies within invasive populations by way of multiple introductions. Curr Biol. 18:363–367.

Fraimout A, Loiseau A, Price DK, Xuereb A, Martin JF, Vitalis R, Fellous S, Debat V, Estoup A. 2015. New set of microsatellite markers for the spotted-wing Drosophila suzukii (Diptera: Drosophilidae): a promising molecular tool for inferring the invasion history of this major insect pest. Eur J Entomol. 112(4):855.

Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201:1555–1579.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis. London: Chapman & Hall/CRC.

Gilchrist GW, Huey RB, Serra L. 2001. Rapid evolution of wing size clines in Drosophila subobscura. Genetica 112:273–286.

Gilchrist GW, Huey RB, Balanya J, Pascual M, Serra L, Noor M. 2004. A time series of evolution in action: a latitudinal cline in wing size in South American Drosophila subobscura. Evolution 58:768–780.

Groen SC, Whiteman NK. 2016. Using Drosophila to study the evolution of herbivory and diet specialization. Curr Opin Insect Sci. 14:66–72.

Hauser M. 2011. A historic account of the invasion of Drosophila suzukii (Matsumura) (Diptera: Drosophilidae) in the continental United States, with remarks on their identification. Pest Manag Sci. 67:1352–1357.

Hufbauer RA, Facon B, Ravigne V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A. 2012. Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl. 5:89–101.

Kaneshiro KY. 1983. Drosophila (Sophophora) suzukii (Matsumura). Proc Hawaiian Entomol Soc. 24:179.

Keller SR, Taylor DR. 2010. Genomic admixture increases fitness during a biological invasion: admixture increases fitness during invasion. J Evol Biol. 23:1720–1731.

Kolbe JJ, Glor RE, Schettino LR, Lara AC, Larson A, Losos JB. 2004. Genetic variation increases during biological invasion by a Cuban lizard. Nature 431:177–181.

Lavergne S, Molofsky J. 2007. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci USA. 104:3883–3888.

Lee C. 2002. Evolutionary genetics of invasive species. Trends Ecol Evol. 17:386–391.

Lee JC, Bruck DJ, Curry H, Edwards D, Haviland DR, Van Steenwyk RA, Yorgey BM. 2011. The susceptibility of small fruits and cherries to the spotted-wing drosophila, Drosophila suzukii. Pest Manag Sci. 67:1358–1367.

Lenormand T. 2002. Gene flow and the limits of natural selection. Trends Ecol Evol. 17:183–189.

Lin QC, Zhai YF, Zhang AS, Men XY, Zhang XY, Zalom FG, Zhou CG, Yu Y. 2014. Comparative Developmental Times and Laboratory Life tables for Drosophlia suzukii and Drosophila melanogaster (Diptera: Drosophilidae). Fla Entomol. 97:1434–1442.

Lombaert E, Guillemaud T, Lundgren J, Koch R, Facon B, Grez A, Loomans A, Malausa T, Nedved O, Rhule E, et al. 2014. Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol. 23:5979–5997.

Matsumura S. 1931. 6000 illustrated insects of Japan-empire (in Japanese). Toko shoin, Tokyo. 1497 pp.

Meng XL. 1994. Posterior predictive p-values. Ann Stat. 22:1142–1160.

Nei M, Maruyama T, Chakraborty R. 1975. The bottleneck effect and genetic variability in populations. Evolution 29:1–10.

Ometto L, Cestaro A, Ramasamy S, Grassi A, Revadi S, Siozios S, Moretto M, Fontana P, Varotto C, Pisani D, et al. 2013. Linking genomics and ecology to investigate the complex evolution of an invasive drosophila pest. Genome Biol Evol. 5:745–757.

Pascual M, Chapuis MP, Mestres F, Balanya J, Huey RB, Gilchrist GW, Serra L, Estoup A. 2007. Introduction history of Drosophila subobscura in the New World: a microsatellite based survey using ABC methods. Mol Ecol. 16:3069–3083.

Peischl S, Excoffier L. 2015. Expansion load: recessive mutations and the role of standing genetic variation. Mol Ecol. 24:2084–2094.

Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. 2016. Reliable ABC model choice via random forests. Bioinformatics 32:859–866.

R Development Core Team. 2008. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org

Robert CP, Cornuet JM, Marin JM, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA. 108:15112–15117.

Rius M, Darling JA. 2014. How important is intraspecific genetic admixture to the success of colonising populations? Trends Ecol Evol. 29:233–242.

Sakai AK, Allendorf FW, Holt JS, Lodge DM, Molofsky J, With KA, Baughman S, Cabin RJ, Cohen JE, Ellstrand NC, McCauley DE. 2001. The population biology of invasive species. Annu Rev Ecol Syst. 305–332.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. In: Mutation and Evolution. Springer. pp. 359–367.

Simberloff D. 2013. Biological invasions: much progress plus several controversies. Contrib Sci. 9:7–16.

Turgeon J, Tayeh A, Facon B, Lombaert E, De Clercq, Berkvens N, Lungren JG, Estoup A. 2011. Experimental evidence for the phenotypic impact of admixture between wild and biocontrol Asian ladybird (Harmonia axyridis) involved in the European invasion. J Evol Biol. 24:1044–1052.

Uller T, Leimu R. 2011. Founder events predict changes in genetic diversity during human-mediated range expansions. Glob Change Biol. 17:3478–3485.

Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Thery S, Froment A, Le Bomin S, Gessain A, Hombert JM, et al. 2009. Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol. 19:312–318.

Wray G. 2013. Genomics and the evolution of phenotypic traits. Annu Rev Ecol Evol Syst. 44:51–72.


Deciphering the Routes of Invasion of Drosophila suzukii by Means of ABC Random Forest

Antoine Fraimout,1 Vincent Debat,1 Simon Fellous,2 Ruth A. Hufbauer,2,3 Julien Foucaud,2 Pierre Pudlo,4 Jean-Michel Marin,5 Donald K. Price,6 Julien Cattel,7 Xiao Chen,8 Marindia Depra,9 Pierre Francois Duyck,10 Christelle Guedot,11 Marc Kenis,12 Masahito T. Kimura,13 Gregory Loeb,14 Anne Loiseau,2 Isabel Martinez-Sañudo,15 Marta Pascual,16 Maxi Polihronakis Richmond,17 Peter Shearer,18 Nadia Singh,19 Koichiro Tamura,20 Anne Xuereb,2 Jinping Zhang,21 and Arnaud Estoup2

1Institut de Systematique, Evolution, Biodiversite, ISYEB - UMR 7205 – CNRS, MNHN, UPMC, EPHE, Museum national d'Histoire naturelle, Sorbonne Universites, Paris, France
2INRA, Centre de Biologie et de Gestion des Populations (UMR INRA, IRD, Cirad, Montpellier SupAgro), Montferrier-Sur-Lez, France
3Colorado State University, Fort Collins, CO
4Centre de Mathematiques et Informatique, Aix-Marseille Universite, Marseille, France
5Institut Montpellierain Alexander Grothendieck, Universite de Montpellier, Montpellier, France
6Tropical Conservation Biology & Environmental Science, University of Hawaii at Hilo, HI
7Laboratoire de Biometrie et Biologie Evolutive, UMR CNRS 5558, Universite Claude Bernard Lyon 1, Villeurbanne, France
8College of Plant Protection, Yunnan Agricultural University, Kunming, Yunnan Province, People's Republic of China
9Programa de Pos Graduação em Genetica e Biologia Molecular, Programa de Pos Graduação em Biologia Animal, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
10CIRAD, UMR Peuplement Vegetaux et Bioagresseurs en Milieu Tropical, Paris, France
11Department of Entomology, University of Wisconsin, Madison, WI
12CABI, Delemont, Switzerland
13Graduate School of Environmental Earth Science, Hokkaido Daigaku University, Sapporo, Hokkaido Prefecture, Japan
14Department of Entomology, Cornell University, Ithaca, NY
15Dipartimento di Agronomia, Animali, Alimenti, Risorse Naturali e Ambiente, Universita degli Studi di Padova, Padova, Italy
16Departament de Genetica, Universitat de Barcelona, Barcelona, Spain
17Division of Biological Sciences, University of California, San Diego
18Mid-Columbia Agricultural Research and Extension Center, Oregon State University, Hood River, OR
19Department of Genetics, North Carolina State University, Raleigh, NC
20Department of Biological Sciences, Tokyo Metropolitan University, Tokyo, Japan
21MoA-CABI Joint Laboratory for Bio-safety, Chinese Academy of Agricultural Sciences, BeiXiaGuan, Haidian Qu, China

Corresponding author: E-mail: arnaud.estoup@inra.fr

Associate editor: Rasmus Nielsen

Abstract

Deciphering invasion routes from molecular data is crucial to understanding biological invasions, including identifying bottlenecks in population size and admixture among distinct populations. Here, we unravel the invasion routes of the invasive pest Drosophila suzukii using a multi-locus microsatellite dataset (25 loci on 23 worldwide sampling locations). To do this, we use approximate Bayesian computation (ABC), which has improved the reconstruction of invasion routes but can be computationally expensive. We use our study to illustrate the use of a new, more efficient ABC method, ABC random forest (ABC-RF), and compare it to a standard ABC method (ABC-LDA). We find that Japan emerges as the most probable source of the earliest recorded invasion into Hawaii. Southeast China and Hawaii together are the most probable sources of populations in western North America, which then in turn served as sources for those in eastern North America. European populations are genetically more homogeneous than North American populations, and their most probable source is northeast China, with evidence of limited gene flow from the eastern US as well. All introduced populations passed through bottlenecks, and analyses reveal five distinct admixture events. These findings can inform hypotheses concerning how this species evolved between different and independent source and invasive populations. Methodological comparisons indicate that ABC-RF and ABC-LDA show concordant results if ABC-LDA is based on a large number of simulated datasets, but that ABC-RF out-performs ABC-LDA when using a comparable and more manageable number of simulated datasets, especially when analyzing complex introduction scenarios.

Key words: Drosophila suzukii, invasion routes, random forest, approximate Bayesian computation, population genetics.

© The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Mol. Biol. Evol. 34(4):980–996. doi:10.1093/molbev/msx050. Advance Access publication February 25, 2017.

Introduction

Biological invasions are a component of global change, and their impact on the communities and ecosystems they invade is substantial (Simberloff 2013). Invasive populations evolve rapidly via both neutral and selective evolutionary processes as they undergo dramatic range expansion and demographic reshuffling, and experience new selection regimes (Keller and Taylor 2010; Dlugosch et al. 2015). While extensive literature has laid the foundations of an evolutionary framework of biological invasions (Sakai et al. 2001; Lee 2002; Facon et al. 2006; Dlugosch et al. 2015; Estoup et al. 2016), whether evolutionary processes are drivers of invasion success or outcomes of introduction processes remains unclear.

Two of the evolutionary processes hypothesized to play a role in invasions are demographic bottlenecks and genetic admixture (Ellstrand and Schierenbeck 2000; Dlugosch and Parker 2008; Rius and Darling 2014). Demographic bottlenecks, which correspond to transitory reductions in population size associated with invasions, can reduce genetic variation and thus may constrain invasion success (Edmonds et al. 2004; Dlugosch and Parker 2008; Peischl and Excoffier 2015). A diverse array of mechanisms allows bottlenecked populations to overcome the deleterious consequences of low genetic variation and adapt to their novel environments [reviewed in Estoup et al. (2016)]. Genetic admixture occurs when multiple introduction events derive from genetically differentiated native or invasive populations. Admixture can increase genetic diversity in introduced populations, potentially enhancing invasion success by increasing the genetic variation on which selection can act (Kolbe et al. 2004; Lavergne and Molofsky 2007) or by producing entirely novel genotypes that may facilitate colonization of novel habitats (Dlugosch and Parker 2008; Rius and Darling 2014). It remains unclear, however, how often genetic admixture occurs in invasions and whether it acts as a true driver of invasion success (Uller and Leimu 2011; Rius and Darling 2014).

To evaluate hypotheses regarding bottlenecks and admixture, as well as adaptation or other processes that may occur during and after introductions, we must first identify the original native or invasive source(s) of the introductions. By comparing introduced populations to likely source populations, one can infer whether populations diverged during invasion and then start to distinguish among the evolutionary processes that drive observed differences (Dlugosch and Parker 2008; Keller and Taylor 2010). Typically, accurate historical data on introduction pathways are limited (Estoup and Guillemaud 2010). Traditional population genetic approaches can describe the genetic structure and relationships among sampled native and invasive populations (Boissin et al. 2012), but determining the origins of introduced species from population genetic data is challenging.

This challenge prompted the development of statistical methods such as approximate Bayesian computation (ABC; Beaumont et al. 2002). In ABC analysis, different introduction scenarios that describe possible introduction pathways are compared quantitatively, taking a model-based Bayesian approach. Datasets matching different scenarios are generated by simulation, and by comparing these simulated datasets to the observed dataset, approximate posterior probabilities of the scenarios are estimated (Bertorelle et al. 2010; Beaumont 2010). A crucial step in determining the most probable introduction pathway is to formally describe a finite set of possible introduction scenarios as models (Estoup and Guillemaud 2010). These introduction pathways should be based on temporal/historical information about invasions where feasible, and they can be realistically complex. For example, rather than a species moving directly from the native to the invasive range, it might first pass through another region, experiencing bottlenecks and/or admixture along the way. ABC has been applied successfully to a wide range of case studies, from pest invasion (e.g., Harmonia axyridis; Lombaert et al. 2014) to anthropological research (e.g., Homo sapiens; Verdu et al. 2009), and is now widely used by the population genetics community. However, despite its many advantages, ABC relies on massive simulations that can render it computationally costly. When multiple complex introduction scenarios are considered, the time and computational resources required to choose among them can become prohibitive (Lombaert et al. 2014). To overcome this constraint, a new algorithm called ABC random forest (ABC-RF) has been developed (Pudlo et al. 2016). Simulations show that ABC-RF discriminates among scenarios more efficiently than traditional ABC methods (Pudlo et al. 2016), and thus it appears particularly suitable for distinguishing among complex invasion pathways (see the New Approaches section below).
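The simulate-and-compare logic of standard ABC model choice can be illustrated with a minimal Python sketch. Everything in it is a hypothetical toy rather than the pipeline used in this study: the two scenario names, their uniform priors, the single summary statistic, and the noise level are assumptions chosen only to make the example runnable. It uses plain rejection (keeping the simulations closest to the observed statistic), which is the simplest ABC variant.

```python
import random

# Hypothetical priors on one parameter for two toy scenarios (assumed values).
PRIORS = {"bottleneck": (0.1, 0.4), "no_bottleneck": (0.6, 0.9)}


def simulate(scenario, rng):
    """Draw a parameter from the scenario's prior, then simulate one summary statistic."""
    low, high = PRIORS[scenario]
    theta = rng.uniform(low, high)
    return theta + rng.gauss(0.0, 0.05)  # toy "simulator": parameter plus noise


def abc_model_choice(observed, n_per_scenario=1000, n_accept=100, seed=1):
    """Approximate scenario posterior probabilities by rejection sampling."""
    rng = random.Random(seed)
    # Reference table: (scenario label, simulated summary statistic).
    table = [(s, simulate(s, rng)) for s in PRIORS for _ in range(n_per_scenario)]
    # Keep the simulations whose statistic is closest to the observed one.
    table.sort(key=lambda row: abs(row[1] - observed))
    accepted = table[:n_accept]
    # Posterior probability of a scenario ~ its share among accepted simulations.
    return {s: sum(1 for label, _ in accepted if label == s) / n_accept for s in PRIORS}


posterior = abc_model_choice(observed=0.25)
print(posterior)
```

With real microsatellite data the simulator would be a coalescent-based program and the distance would be computed over many summary statistics, which is where the computational cost discussed above arises.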

Here, we evaluate the origins of invasive populations of the spotted-wing Drosophila suzukii (Matsumura 1931), including where and whether they passed through bottlenecks in population size or experienced admixture between genetically distinct groups. We use this invasion to illustrate the use of ABC-RF and as a case study to compare it to a more standard ABC method. Drosophila suzukii is a representative of the melanogaster group and is historically distributed in Southeast Asia, covering a large portion of China, Japan, Thailand, and neighboring countries (Asplen et al. 2015). Unlike most drosophilid species, D. suzukii females display a large serrated ovipositor (Atallah et al. 2014), allowing them to lay eggs in ripening fruits, and they are thus a major threat to the agricultural production of stone fruits and berries (Lee et al. 2011). The presence of the species outside of its native range was first recorded in the Hawaiian archipelago in the early 1980s (Kaneshiro 1983), but no further spread was observed until the late 2000s, when D. suzukii was almost synchronously recorded in the southwest of the USA and southern Europe (Hauser 2011; Calabria et al. 2012). By 2015, the species had spread throughout most of the North American and European continents and reached the southern hemisphere in Brazil (Depra et al. 2014). Drosophila suzukii represents a particularly appealing biological model for gaining further insights into the evolutionary factors associated with invasion success. It is closely related to the model insect Drosophila melanogaster, and its short generation time and viability in laboratory conditions facilitate experimental approaches to study evolutionary events.

ABC-based analyses of DNA sequence data obtained for six X-linked gene fragments suggest that North American and European populations represent separate invasion events (Adrion et al. 2014). The sources of these invasions and potential admixture among different regions remain unclear. The wide native and introduced distributions of D. suzukii imply a potentially large number of possible sources and hence a large number of possible introduction scenarios. By first grouping sample sites into genetic clusters, the number of populations treated independently can be reduced somewhat (Lombaert et al. 2014), but not fully. Additionally, the temporal proximity of the invasions in the US and Europe makes it difficult to favor certain introduction scenarios over others a priori. Together, these features of the D. suzukii invasion mean that a large number of scenarios need to be compared, dramatically increasing computational requirements for traditional ABC methods and hence making ABC-RF methods particularly appealing.

New Approaches

The formal description of a finite set of possible introduction scenarios as models lays the foundation for ABC analyses and is outlined in Estoup and Guillemaud (2010). Choosing among the formulated models is the central statistical problem to be overcome when reconstructing routes of invasion from molecular data. Both theoretical arguments and simulation experiments indicate that approximate posterior probabilities estimated from ABC analyses for the modeled introduction scenarios can be inaccurate, even though the models being compared can still be ranked appropriately using numerical approximation (Robert et al. 2011). To overcome this problem, Pudlo et al. (2016) developed a novel approach based on a machine learning tool named "random forests" (RF; Breiman 2001), which selects among the complex introduction models covered by ABC algorithms. This approach enables efficient discrimination among models and estimation of the posterior probability of the best model while being computationally less intensive.

We invite readers to consult Pudlo et al. (2016) for detailed statistical descriptions and testing of the ABC random forest (ABC-RF) method. Briefly, random forest (RF) is an algorithm that learns from a database how to predict a variable, called the output, from a possibly large set of covariates. In our context, the database is the reference table, which includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics (i.e., the covariates). RF aggregates the predictions of a collection of classification or regression trees (depending on whether the output is categorical, here the scenario identity, or quantitative, here the posterior probability of the best scenario). Each tree is built using the information provided by a bootstrap sample of the database and manages to capture one part of the dependency between the output and the covariates. Each tree on its own predicts the output poorly, but an ensemble learning technique such as RF aggregates the predictions of all trees to raise predictive performance to a high level of accuracy in favorable contexts (Breiman 2001). RF is currently considered one of the major state-of-the-art algorithms for classification or regression.
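The bootstrap-and-aggregate principle described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the implementation of Pudlo et al. (2016): each "tree" is a depth-one stump rather than a fully grown classification tree, and the toy reference table (two scenarios, three summary statistics of which only the first is informative) is an assumption made for the example.

```python
import random
from collections import Counter


def make_reference_table(n_per_scenario, rng):
    """Toy reference table: rows of (scenario label, [3 summary statistics])."""
    table = []
    for label, shift in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_scenario):
            # Only the first statistic depends on the scenario; the others are noise.
            stats = [rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
            table.append((label, stats))
    return table


def majority(labels):
    """Most frequent label, defaulting to 0 for an empty node."""
    return Counter(labels).most_common(1)[0][0] if labels else 0


def fit_stump(sample, features, rng, n_thresholds=10):
    """Best single split among a few random thresholds on the candidate features."""
    best = None
    for f in features:
        for _ in range(n_thresholds):
            t = rng.choice(sample)[1][f]
            left = [lab for lab, x in sample if x[f] <= t]
            right = [lab for lab, x in sample if x[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]


def fit_forest(table, n_trees, rng, n_try=2):
    """Each stump sees a bootstrap sample and a random subset of the statistics."""
    n_features = len(table[0][1])
    forest = []
    for _ in range(n_trees):
        boot = [rng.choice(table) for _ in table]
        features = rng.sample(range(n_features), n_try)
        forest.append(fit_stump(boot, features, rng))
    return forest


def predict(forest, stats):
    """Aggregate the stumps by majority vote; also return the vote counts."""
    votes = Counter(l if stats[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0], votes


rng = random.Random(42)
table = make_reference_table(100, rng)
forest = fit_forest(table, n_trees=100, rng=rng)
best, votes = predict(forest, [3.0, 0.0, 0.0])
print(best, dict(votes))
```

Stumps that happen to draw only the two noise statistics vote almost at random, yet the aggregated vote is still dominated by the stumps that use the informative statistic, which is the sense in which weak individual predictors combine into an accurate ensemble.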

In ABC random forest (ABC-RF), Pudlo et al. (2016) make two important advances regarding the use of summary statistics and the identification of the most probable model. For the summary statistics, given a pool of different metrics available, ABC-RF extracts the maximum of information from the entire set of proposed statistics. This avoids the arbitrary choice of a subset of statistics, which is often applied in ABC analyses, and also avoids what in the statistical sciences is called "the curse of dimensionality" [see Blum et al. (2013) for a comparative review of dimension reduction methods in ABC]. With respect to identifying the most probable model, ABC-RF uses a classification vote system rather than the posterior probabilities traditionally used in ABC analysis. The first outcome of an ABC-RF analysis applied to a given target dataset is hence a classification vote for each competing model, which represents the number of times a model is selected in a forest of n classification trees. The model with the highest number of classification votes corresponds to the model best suited to the target dataset among the set of competing models. As a byproduct, this step also provides a measure of the classification error called the prior error rate. The prior error rate is calculated as the probability of choosing a wrong model when drawing the model index and parameter values from the priors. The second outcome of ABC-RF is an estimate of the posterior probability of the best model that has been selected, obtained using a secondary random forest that regresses the model selection error of the first-step random forest over the available summary statistics.
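The definition of the prior error rate given above translates directly into a Monte Carlo estimate: draw a model index and parameter values from the priors, simulate a dataset, classify it, and count how often the classifier is wrong. The Python sketch below illustrates only that definition. The two scenarios "A" and "B", their overlapping priors, and the fixed threshold rule standing in for the classifier are all hypothetical; ABC-RF would use the trained random forest in place of `classify`, and its software may estimate the error by other means.

```python
import random

# Hypothetical, partly overlapping priors for two toy scenarios (assumed values).
PRIORS = {"A": (0.1, 0.4), "B": (0.3, 0.6)}


def simulate(scenario, rng):
    """Draw a parameter from the prior and simulate one noisy summary statistic."""
    low, high = PRIORS[scenario]
    return rng.uniform(low, high) + rng.gauss(0.0, 0.05)


def classify(stat):
    """Stand-in classifier (NOT a random forest): a fixed threshold rule."""
    return "A" if stat <= 0.35 else "B"


def prior_error_rate(n_draws=5000, seed=7):
    """Fraction of prior-simulated datasets assigned to the wrong scenario."""
    rng = random.Random(seed)
    scenarios = sorted(PRIORS)
    errors = 0
    for _ in range(n_draws):
        truth = rng.choice(scenarios)  # model index drawn from a uniform prior
        if classify(simulate(truth, rng)) != truth:
            errors += 1
    return errors / n_draws


error = prior_error_rate()
print(round(error, 3))
```

Because the toy priors overlap, some datasets are inherently ambiguous and the error rate stays well above zero however good the classifier is, which is why the prior error rate is read as a property of the whole model choice problem and not of one observed dataset.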

Pudlo et al. (2016) show that, compared to previous ABC methods, ABC-RF offers at least four advantages: (i) it significantly reduces the model classification error as measured by the prior error rate; (ii) it is robust to the number and choice of summary statistics, as RF can handle many superfluous and/or strongly correlated statistics with no impact on the performance of the method [see Blum et al. (2013) for alternative methods of dimension reduction]; (iii) the computing effort is considerably reduced, as RF requires a much smaller reference table than alternative methods (i.e., a few thousand simulated datasets versus hundreds of thousands to millions of simulations per compared model); and (iv) it provides a more reliable estimate of the posterior probability of the selected model (i.e., the model that best fits the observed dataset).

To the best of our knowledge, the present study is the first to use the ABC-RF method for an ensemble of model choice analyses with different levels of complexity (i.e., various numbers of compared models, including various and sometimes large numbers of parameters and sampled populations, and hence of summary statistics) on real multi-locus microsatellite datasets. Specifically, we analyze molecular data at 25 microsatellite loci from 23 D. suzukii sample sites located across most of its native and introduced range. We conducted our population genetics study in five steps. (1) We defined focal genetic groups and used this information to formalize 11 sets of competing introduction scenarios. We then chose among them in a sequential ABC-RF analysis. (2) We compared the performance of ABC-RF to a more standard ABC method for a subset of analyses [ABC-LDA; Estoup et al. (2012); see "Materials and Methods" section]. We chose ABC-LDA for our comparative study because it is considered to be one of the most efficient standard ABC methods to discriminate among models (Pudlo et al. 2016) and it is widely used in population genetics studies (Lombaert et al. 2014 and references therein). (3) We refined the inferred worldwide invasion scenario by assessing the most likely origins of the primary introductions from the native area. (4) We estimated the posterior distributions, under the final invasion scenario, of demographic parameters associated with bottleneck and genetic admixture events. (5) Finally, we performed model-posterior checking analyses to ensure that the final worldwide invasion scenario displayed a reasonable match to the observed dataset.

Results

Origins of Invasive Genetic Groups

Prior to conducting any type of ABC analysis, focal genetic groups must be defined by characterizing genetic diversity and structure within and between all genotyped sample sites using traditional statistics and clustering methods. This is described in detail in supplementary appendix S1, Supplementary Material online. We defined seven main genetic groups from our initial set of 23 sample sites (supplementary table S1, Supplementary Material online): Asia (sample sites CN-Lan, CN-Lia, CN-Nin, CN-Shi, JP-Tok, and JP-Sap), Hawaii (sample site US-Haw), western US (sample sites US-Wat, US-Sok, and US-SD), eastern US (sample sites US-Col, US-NC, US-Wis, and US-Gen), Europe (sample sites GE-Dos, FR-Par, FR-Bor, FR-Mon, SW-Del, SP-Bar, and IT-Tre), Brazil (BR-PA), and La Reunion (FR-Reu).

This genetic grouping, along with historical and geographical information (supplementary table S1, Supplementary Material online), allowed us to formalize 11 nested sets of competing invasion scenarios that we analyzed sequentially using ABC model choice methodologies. The date at which D. suzukii was first recorded in a location was used to help formulate the scenarios. For example, D. suzukii was first recorded in the continental US in California (US-Wat) in 2008 and was not recorded in the central state of Colorado (US-Col) until 2012. US-Wat could therefore serve as a source for US-Col, but not vice versa. We present the main analyses in table 1. A more detailed description of each competing scenario considered for each analysis is provided in supplementary table S2, Supplementary Material online. The 11 sequential ABC analyses permit step-by-step reconstruction of the introduction history of D. suzukii worldwide. Results for each analysis, based on prior set 1 (which used bounded uniform distributions of parameters; see "Materials and Methods" section) and a single set of sample sites representative of the genetic groups defined above, are given in table 2. For all 11 analyses, similar ABC-RF results were obtained using prior set 2 (which used more peaked distributions; supplementary table S3, Supplementary Material online) and considering various sets of representative sample sites (supplementary table S4, Supplementary Material online).

By choosing among different competing scenarios, each analysis identifies the most probable source(s) for a given introduced population. The results come in the form of differing levels of support for the competing introduction models. It is thus worth stressing here that the identification of the most probable source population X (from among a finite set of possible source populations) does not necessarily mean that the introduced population Y originated from exactly location X, but rather that the most probable origin of the founders of the invasive population sampled at site Y is a population genetically similar to the source population sampled at site X. However, for the sake of concision, we will hereafter use the simplified terminology that the most probable origin of Y is X.

After the early invasion of Hawaii (1980), the first recorded introduction was to the genetically heterogeneous western US, and thus that is where our main analyses start (analyses 1a–1d, table 2). We considered the three western US sample sites in separate analyses, and we found in each case that the best scenario included genetic admixture between Asia and Hawaii as sources (table 2, analyses 1a–1c; mean posterior probability of the best scenario, P = 0.999). We subsequently tested whether this common admixture pattern resulted from a single or multiple introduction events from Asia and/or Hawaii, including the possibility of local colonization events among western US sites (analysis 1d). The best invasion scenario for the western US included an initial admixture event with Asia and Hawaii as probable sources in northern California (site US-Wat; admixture event A1 in fig. 1), followed by (i) a local colonization southward into San Diego (US-SD) and northward into Oregon state (US-Sok) and (ii) a secondary introduction event from Hawaii to Oregon (US-Sok) (A2 event in fig. 1; P = 0.690). This result is supported by various genetic clustering results (fig. 1; supplementary figs. S1 and S2, Supplementary Material online) as well as raw F-statistics values, which indicate a high level of differentiation between US-Sok and other continental US sample sites and a substantially lower FST between US-Sok and Hawaii (supplementary table S5, Supplementary Material online). Once the invasion history was resolved for the western US group, we inferred the most probable source(s) of the invasive eastern US and European genetic groups using two independent analyses (analyses 2a and 2b, table 1 and supplementary table S2, Supplementary Material online). We found evidence for a single origin for the eastern US group, corresponding to an intra-continental spread from the San Diego area (P = 0.779, table 2).

The most probable origin for Europe was an independent introduction from the Asian native range (P = 0.716, table 2). The European and North American invasions occurred nearly simultaneously, and thus we also explored gene flow between the two regions. Europe was never selected as a source for the eastern US (analysis 3a, P = 0.744), no matter which sets of sample sites were considered. In contrast, we found some evidence of asymmetrical gene flow from the eastern US into at least some European locations. When considering the northern European sample site from Germany (GE-Dos) as a target, we found that the best scenario included an admixed origin with Asia and the eastern US (supplementary table S4, Supplementary Material online, analysis 3b; P = 0.488 and P = 0.501 for prior sets 1 and 2, respectively). In contrast, the best scenario did not include such an admixture event when considering the southern European sample site from Italy (IT-Tre) as a target (table 2, analysis 3b, P = 0.510). We therefore replicated analysis 3b on all possible combinations of European and eastern US sample sites to assess the robustness of this result and to evaluate the possibility of a geographic pattern for admixture events. For each of the European sample sites, figure 2 summarizes which replicate analyses selected a scenario including an admixture between eastern US and Asia or a scenario of a single introduction from Asia (see supplementary fig. S3, Supplementary Material online, for results using an alternative representative sample site for the native range). Results tended to indicate an uneven spatial distribution of the presence of genes of eastern US origin in Europe, following roughly a north-south gradient. More specifically, eastern US alleles were present in northern Europe and absent, or at least not detected by our model choice method, in southern Europe. We then further tested whether the observed admixture in the north corresponded to the mixing of eastern US genes with southern European genes originating from the initial introduction event from Asia, versus the mixing of eastern US genes with Asian genes introduced through a separate secondary introduction event from Asia in northern Europe (analysis 4). The best scenario was clearly that of an admixture between eastern US genes and southern European genes, which themselves had a probable origin as part of the initial introduction event from Asia (P = 0.998; A3 event in fig. 1).

Finally, we found that the sampled population from Brazil originated from North America, with genetic admixture between D. suzukii individuals from the southwest and eastern regions of the US (analysis 5a, P = 0.631; A4 event in fig. 1). Regarding the most recent invasive population, from La Reunion, we found that this island population originated from Europe, with admixture between individuals from northern and southern Europe (analysis 5b, P = 0.500; A5 event in fig. 1).

Comparison of Model Choice Analyses Using ABC-RF versus ABC-LDA

For all 11 analyses, ABC-LDA performed well when using large reference tables (i.e., 500,000 simulations per scenario; table 2 and supplementary table S3, Supplementary Material online). Indeed, the prior error rates for ABC-LDA with large reference tables were smaller than the prior error rates for ABC-RF with

Table 1. Formulation of the Model Choice Analyses that were Carried Out Successively to Reconstruct Invasion Routes of D. suzukii Using ABC.

| Model choice analysis | Number of compared scenarios | Tackled question | Potential source genetic group | Focal populations |
|---|---|---|---|---|
| 1a | 3 | What are the origins of western US populations? | Asia, Hawaii | US-Wat |
| 1b | 3 | What are the origins of western US populations? | Asia, Hawaii | US-Sok |
| 1c | 3 | What are the origins of western US populations? | Asia, Hawaii | US-SD |
| 1d | 7 | What are the relations among western US populations and their extra-continental sources? | Asia, Hawaii, US-Wat, US-Sok, US-SD | US-Wat + US-Sok + US-SD |
| 2a | 6 | What are the origins of eastern US populations? | Asia, Hawaii, western US | eastern US |
| 2b | 6 | What are the origins of European populations? | Asia, Hawaii, western US | Europe |
| 3a | 10 | Is there asymmetrical gene flow from Europe to eastern US? | Asia, Hawaii, western US, Europe | eastern US |
| 3b | 10 | Is there asymmetrical gene flow from eastern US to Europe? | Asia, Hawaii, western US, eastern US | Europe |
| 4 | 2 | Does the admixture with eastern US genes in northern Europe result from a secondary Asian introduction? | Asia, Hawaii, western US, eastern US, southern Europe | northern Europe |
| 5a | 21 | What are the origins of the Brazilian population? | Asia, Hawaii, western US, eastern US, southern Europe, northern Europe | Brazil |
| 5b | 21 | What are the origins of La Reunion population? | Asia, Hawaii, western US, eastern US, southern Europe, northern Europe | La Reunion |

NOTE: Each numbered analysis is a comparison of a certain number of scenarios by ABC model choice. We summarize each analysis by stating the question that it addressed. A detailed verbal description of each compared scenario is given in supplementary table S2, Supplementary Material online. The 11 analyses are nested in the sense that each subsequent analysis uses the result obtained from the previous one. For example, analysis 2a, “What are the origins of eastern US populations?”, capitalizes on the history inferred for western US populations from analysis 1d. “Potential source genetic group” indicates all the potential source populations considered in the analyses for which one wants to identify the origin of the focal population (i.e., the target).

Fraimout et al. doi:10.1093/molbev/msx050 MBE


Table 2. Results of Model Choice Analyses Using ABC-RF and ABC-LDA.

| Analysis | Total number of scenarios | Number of SumStats | Prior error rate: ABC-RF (SD) | Prior error rate: ABC-LDA (large reference table) | Prior error rate: ABC-LDA (small reference table) | Posterior probability of the best model: ABC-RF (SD) | Posterior probability of the best model: ABC-LDA (large reference table) [CI] | Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model) |
|---|---|---|---|---|---|---|---|---|
| 1a | 3 | 39 | 0.100 (±0.002) | 0.075 | 0.085 | 1.000 (±0.001) | 1.000 [1.000, 1.000] | Asia + Hawaii |
| 1b | 3 | | 0.094 (±0.001) | 0.070 | 0.091 | 0.989 (±0.005) | 0.999 [0.999, 0.999] | |
| 1c | 3 | | 0.096 (±0.001) | 0.079 | 0.087 | 0.998 (±0.002) | 0.999 [0.999, 0.999] | |
| 1d | 7 | 130 | 0.327 (±0.001) | 0.274 | 0.368 | 0.690 (±0.018) | 0.741 [0.714, 0.767] | Asia + Hawaii |
| 2a | 6 | 130 | 0.232 (±0.001) | 0.203 | 0.241 | 0.779 (±0.019) | 0.788 [0.764, 0.813] | Western US |
| 2b | 6 | 130 | 0.232 (±0.001) | 0.179 | 0.393 | 0.716 (±0.013) | 0.604 [0.592, 0.616] | Asia |
| 3a | 10 | 204 | 0.328 (±0.001) | 0.265 | 0.364 | 0.744 (±0.020) | 0.844 [0.820, 0.868] | Western US |
| 3b | 10 | 204 | 0.408 (±0.001) | 0.358 | 0.423 | 0.510 (±0.014) | 0.409 [0.391, 0.426] | Asia |
| 4 | 2 | 301 | 0.121 (±0.001) | 0.111 | 0.212 | 1.000 (±0.009) | 0.999 [0.999, 0.999] | Southern Europe + eastern US |
| 5a | 21 | 424 | 0.294 (±0.001) | 0.230 | 0.423 | 0.631 (±0.034) | 0.812 [0.788, 0.837] | Western US + eastern US |
| 5b | 21 | 424 | 0.298 (±0.001) | 0.242 | 0.428 | 0.500 (±0.027) | 0.290 [0.256, 0.323] | Northern Europe + southern Europe |

NOTE: All model choice analyses were carried out using the prior set 1 (supplementary table S8, Supplementary Material online) and a single set of sample sites representative of the pre-defined genetic groups: sample site JP-Tok for the native Asian group; US-Haw for the Hawaiian group; US-Wat, US-Sok, and US-SD for the western US group; US-NC for the eastern US group; IT-Tre for the southern Europe group; GE-Dos for the northern Europe group; BR-PA for the Brazil group; and FR-Reu for the La Reunion group (fig. 1). Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). The total number of summary statistics (Number of SumStats) as well as the total number of compared scenarios (Total number of scenarios) are indicated for each analysis. Prior error rates and posterior probabilities of the best model chosen using ABC-RF were averaged over 10 replicate analyses. ABC-RF and ABC-LDA treatments yielded the same best model choice for all analyses, which is denoted in the column “Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model)”. Admixture between two source populations is represented by a “+” sign. ABC-LDA posterior probabilities of the best models were estimated using “large reference tables” with 500,000 simulated datasets per scenario. ABC-LDA prior error rates were computed using reference tables of two different sizes: “large reference table” (i.e., 500,000 simulated datasets per scenario) and “small reference table” (i.e., 10,000 simulated datasets per scenario, as for ABC-RF analyses). SD stands for standard deviation over 10 replicate analyses and CI for 95% confidence interval, computed following Cornuet et al. (2008).


small reference tables (paired t-test: t10 = 2.7, P = 0.02). However, ABC-RF performed better than ABC-LDA when using small reference tables (only 10,000 simulations per scenario), having lower prior error rates (paired t-test: t10 = 11.6, P < 0.0001; table 2 and supplementary table S3, Supplementary Material online). The difference in prior error rate was particularly marked for the complex analyses, i.e., those in which many introduction scenarios were compared and many summary statistics were computed. For instance, for analyses 5a and 5b (21 compared scenarios and 424 summary statistics computed), prior error rates were 0.217 and 0.221 for ABC-RF versus 0.417 and 0.425 for ABC-LDA.

Regarding computational effort, we found that, for a given observed dataset, an ABC-RF treatment to select the best scenario and compute its posterior probability required ca. 50 times less computation time than an ABC-LDA treatment with a traditional large reference table (i.e., 500,000 simulated datasets per scenario). Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time for 10,000 pseudo-observed datasets, whereas with ABC-LDA it lasted several hours to several days (depending on the analysis processed) for 500 pseudo-observed datasets.
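The prior error rate used throughout this comparison is the proportion of pseudo-observed datasets, simulated from the prior, that a model choice procedure assigns to the wrong scenario. The Python sketch below illustrates that definition on a toy problem. It is not the ABC-RF or ABC-LDA implementation used in this study: the two scenarios, the Gaussian simulator `simulate_summary`, and the k-nearest-neighbour classifier are assumptions introduced only for illustration.

```python
import random

def simulate_summary(scenario, rng):
    """Toy simulator (hypothetical): one summary statistic per dataset."""
    # Stand-in for a coalescent simulator; the scenario only shifts the mean.
    mean = 0.0 if scenario == 0 else 1.5
    return rng.gauss(mean, 1.0)

def build_reference_table(n_per_scenario, rng):
    """Simulate n datasets per scenario; each row is (scenario, statistic)."""
    return [(s, simulate_summary(s, rng))
            for s in (0, 1) for _ in range(n_per_scenario)]

def classify(stat, table, k=25):
    """Pick the scenario most represented among the k nearest simulations."""
    nearest = sorted(table, key=lambda row: abs(row[1] - stat))[:k]
    votes_for_1 = sum(s for s, _ in nearest)
    return 1 if votes_for_1 * 2 > k else 0

def prior_error_rate(table, n_pods, rng):
    """Fraction of pseudo-observed datasets assigned to the wrong scenario."""
    errors = 0
    for _ in range(n_pods):
        true_scenario = rng.choice((0, 1))  # uniform prior over scenarios
        pod = simulate_summary(true_scenario, rng)
        if classify(pod, table) != true_scenario:
            errors += 1
    return errors / n_pods

rng = random.Random(1)
table = build_reference_table(500, rng)
rate = prior_error_rate(table, 200, rng)
print(f"estimated prior error rate: {rate:.3f}")
```

Because the two toy scenarios overlap, the estimated rate stays well above zero however large the reference table is; in a real analysis, the simulator and the classifier would both be replaced.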

The same invasion scenario had the highest probability for all 11 analyses carried out using ABC-RF and ABC-LDA with a large number of simulated datasets (table 2 and supplementary table S3, Supplementary Material online). Posterior probabilities of the best scenario estimated using ABC-RF were not systematically higher (or lower) than those using ABC-LDA with a large number of simulated datasets. In particular, there

FIG. 1. Worldwide invasion scenario of D. suzukii inferred from microsatellite data and date of first observation. Map and schematic showing sample sites and the invasion routes taken by D. suzukii, as reconstructed by ABC-RF (Pudlo et al. 2016) on a total of 685 individuals from 23 geographic locations genotyped at 25 microsatellite loci (see Results and Methods for details). The native range is in dark gray and the invasive range is in light gray (cf. delimitation from fig. 1 in Asplen et al. 2015). The year in which D. suzukii was first observed at each sample site is indicated in italics. The 23 geographical locations that were sampled are represented by circles (native range) and by squares, diamonds, and triangles (introduced range). Squares indicate populations that experienced weak bottlenecks (i.e., median value of bottleneck severity < 0.12; see main text and table 3), diamonds indicate moderate bottlenecks (0.12 < bottleneck severity < 0.22), and triangles indicate strong bottlenecks (i.e., bottleneck severity > 0.3). The colors of the symbols for the sample sites and of the arrows between them correspond to the different genetic groups obtained using the clustering method BAPS (supplementary appendix S1, Supplementary Material online). The arrows indicate the most probable invasion pathways. A1–A5 indicate five separate admixture events between different sources. O1–O3 indicate the most probable sources within the native range for the primary introduction events. A1 = Hawaii + southeast China; A2 = Watsonville (western US) + Hawaii; A3 = southern Europe + eastern US; A4 = western US + eastern US; A5 = southern Europe + northern Europe; O1 = Japan; O2 = southeast China; O3 = northeast China.


was no clear trend of overestimation of posterior probability when using ABC-LDA with a large number of simulated datasets versus ABC-RF, a potential bias suggested by Pudlo et al. (2016). On the other hand, we found evidence of instability in the estimation of posterior probabilities, and hence in model selection, when using ABC-LDA with a small number of simulated datasets in the reference table (i.e., 10,000 simulations per scenario, as for ABC-RF). In this case, the scenario with the highest probability was different from that selected using either ABC-LDA with the large reference table or ABC-RF in four and two of the 11 analyses when using prior sets 1 and 2, respectively (results not shown).

Refining the Origins of the Primary Introduction Events from the Native Area

Additional ABC-RF analyses were run to determine which geographical area in the native range was the most likely origin of each of the three primary introductions (i.e., in Hawaii, western US, and Europe). The most probable origin of the Hawaiian invasive population was Japan, with a mean posterior probability of 0.959 (origin O1 in fig. 1). For Watsonville (western US, sample site US-Wat), the most probable source of the primary introduction from Asia was southeast China (origin O2 in fig. 1; P = 0.860). Finally, for Europe, the most probable source of the primary introduction from Asia was northeast China (origin O3 in fig. 1; P = 0.855). Similar posterior probabilities were obtained using the prior set 1 (values presented above) and the prior set 2 (results not shown). For inferences regarding Europe, similar posterior probabilities were also obtained using various sets of sample sites from Europe and the eastern US (results not shown).

Estimation of Parameter Distributions for Admixture Rate and Bottleneck Severity

We used the most probable worldwide invasion scenario (fig. 1) to estimate the posterior distributions of parameters related to the bottleneck severity associated with the founding of new populations and to genetic admixture rates between differentiated sources (i.e., A1–A5 events in fig. 1).

FIG. 2. Admixed or non-admixed origin of D. suzukii in Europe inferred for each sample site. Potential source populations include (among others) the Asian native range (represented by the Japanese sample site JP-Tok) and the invasive genetic group from eastern US, represented by one of the four invasive sample sites collected in this area (fig. 1). Four replicate independent ABC-RF treatments corresponding to analysis 3b (table 2) were hence carried out for each targeted European sample site, each using one of the four eastern US sample sites. The treatments labeled 1, 2, 3, and 4 in the pies of the figure were carried out with the sample sites US-NC, US-Wis, US-Gen, and US-Col, respectively. A pie quarter in blue indicates that the best scenario corresponds to a single introduction event from Asia. A pie quarter in red indicates that the best scenario corresponds to an admixture event between Asia and eastern US. Dates in italics correspond to the dates of first record of the European sample sites.


The posterior distributions of parameters relating to bottlenecks and admixture differed substantially from the priors, indicating that the genetic data were informative for such parameters (tables 3 and 4; see supplementary tables S6 and S7, Supplementary Material online, for results using an alternative set of representative sample sites). Regarding bottleneck severity, we found that bottlenecks tended to be more severe for populations founded by individuals originating from a source located on a different continent (extra-continental origin) than for populations founded by individuals originating from a source located on the same continent (intra-continental origin; table 3 and fig. 1). The bottleneck severity for populations founded by individuals corresponding to a combination of the two types of sources (extra- and intra-continental origins) was either weak or moderate. The strongest bottleneck severity was found for the first invasive population in Hawaii, which showed the lowest level of genetic variation among all invasive populations. Bottleneck severity values were overall slightly stronger in European than in US populations, suggesting a smaller number of founding individuals and/or a slower demographic recovery in European populations.

Regarding admixture (table 4), we found that (i) for the first recorded invasive population in western US (i.e., Watsonville, sample site US-Wat), the genetic contribution from China was substantially larger than that from Hawaii (median value of 0.759); (ii) for the western US population US-Sok, the secondary genetic contribution from Hawaii was small compared to that from Watsonville (median value of 0.237); and (iii) the sample sites from northern Europe (e.g., GE-Dos in Germany) contained a rather low proportion of genes originating from the eastern US (median value of

Table 3. Bottleneck Severity in Invasive Populations of D. suzukii.

| Introduction type | Sample site | Prior 1: Mean | Median | Mode | q5 | q95 | Prior 2: Mean | Median | Mode | q5 | q95 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Extra-continental | US-Haw | 0.517 | 0.500 | 0.495 | 0.326 | 0.755 | 0.490 | 0.484 | 0.477 | 0.382 | 0.615 |
| Extra-continental | US-Wat | 0.181 | 0.138 | 0.114 | 0.041 | 0.448 | 0.143 | 0.127 | 0.118 | 0.059 | 0.275 |
| Extra-continental | IT-Tre | 0.268 | 0.179 | 0.171 | 0.059 | 0.837 | 0.236 | 0.189 | 0.172 | 0.101 | 0.553 |
| Extra-continental | BR-PA | 0.158 | 0.126 | 0.104 | 0.020 | 0.407 | 0.208 | 0.193 | 0.172 | 0.067 | 0.399 |
| Extra-continental | FR-Reu | 0.259 | 0.215 | 0.186 | 0.051 | 0.626 | 0.245 | 0.214 | 0.190 | 0.069 | 0.534 |
| Intra-continental | US-SD | 0.135 | 0.117 | 0.106 | 0.039 | 0.283 | 0.117 | 0.104 | 0.091 | 0.053 | 0.224 |
| Intra-continental | US-NC | 0.089 | 0.067 | 0.061 | 0.015 | 0.230 | 0.111 | 0.098 | 0.090 | 0.046 | 0.218 |
| Extra + intra continental | US-Sok | 0.116 | 0.099 | 0.088 | 0.027 | 0.250 | 0.119 | 0.106 | 0.099 | 0.052 | 0.229 |
| Extra + intra continental | GE-Dos | 0.189 | 0.177 | 0.155 | 0.061 | 0.356 | 0.205 | 0.189 | 0.171 | 0.088 | 0.372 |
| Prior values | | 0.628 | 0.199 | NA | 0.021 | 1.904 | 0.517 | 0.500 | NA | 0.326 | 0.754 |

NOTE: Extra-continental introduction corresponds to a long-distance introduction from a source located on a different continent from the focal population; intra-continental introduction corresponds to an introduction event from a source located on the same continent as the focal population; and Extra + intra continental introduction corresponds to a combination of the two types of sources. Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5 and q95), are indicated for each bottleneck severity parameter. We roughly classified the estimated bottleneck severity values into three classes (represented in the original table by three shades of gray): weak (i.e., median value of bottleneck severity < 0.12, in light gray), moderate (0.12 < bottleneck severity < 0.22, in gray), and strong (i.e., bottleneck severity > 0.3, in dark gray). The set of sample sites used for the ABC estimations presented here includes (i) for the native area, Japan (JP-Tok + JP-Sap), South-East China (CN-Nin), and North-East China (CN-Lan + CN-Lia), and (ii) for the invaded range, US-Wat, US-Sok, and US-SD for western US, US-NC for eastern US, IT-Tre for southern Europe, GE-Dos for northern Europe, BR-PA for South America (Brazil), and FR-Reu for La Reunion island. Code names of the sample sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online, in which bottleneck severity classes are also given for each sample site. See supplementary table S6, Supplementary Material online, for results on a different set of representative sample sites.
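The three severity classes defined in the note to table 3 can be applied mechanically to the posterior medians. The sketch below does so for the prior set 1 medians copied from table 3. The function name `bottleneck_class` and the label "unclassified" are assumptions of this sketch: the stated thresholds leave the interval between 0.22 and 0.3 (and the exact boundary values) unassigned, and the sketch reports such values rather than guessing a class for them.

```python
def bottleneck_class(median_severity):
    """Map a median bottleneck severity to the classes stated for table 3."""
    if median_severity < 0.12:
        return "weak"
    if 0.12 < median_severity < 0.22:
        return "moderate"
    if median_severity > 0.3:
        return "strong"
    # Boundary values and the 0.22-0.3 gap are not covered by the thresholds.
    return "unclassified"

# Posterior medians under prior set 1, copied from table 3.
medians_prior1 = {
    "US-Haw": 0.500, "US-Wat": 0.138, "IT-Tre": 0.179, "BR-PA": 0.126,
    "FR-Reu": 0.215, "US-SD": 0.117, "US-NC": 0.067, "US-Sok": 0.099,
    "GE-Dos": 0.177,
}

classes = {site: bottleneck_class(m) for site, m in medians_prior1.items()}
for site, label in classes.items():
    print(f"{site}: {label}")
```

With these medians, only US-Haw falls in the strong class, and the two sites with combined extra- and intra-continental sources (US-Sok and GE-Dos) fall in the weak and moderate classes, respectively.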

Table 4. Posterior Distributions of Admixture Rates for the Five Admixture Events Inferred for the Final Worldwide Invasion Scenario Described in figure 1.

| Prior set | Admixture event | Admixture rate (gene fraction from pop. x) | Mean | Median | Mode | q5 | q95 |
|---|---|---|---|---|---|---|---|
| Prior 1 | A1 (US-Wat = US-Haw + CN-Nin) | rUS-Wat (China – CN-Nin) | 0.759 | 0.761 | 0.774 | 0.659 | 0.854 |
| Prior 1 | A2 (US-Sok = US-Haw + US-Wat) | rUS-Sok (USA – US-Wat) | 0.237 | 0.230 | 0.227 | 0.092 | 0.400 |
| Prior 1 | A3 (DE-Dos = IT-Tre + US-NC) | rDE-Dos (USA – US-NC) | 0.286 | 0.278 | 0.266 | 0.112 | 0.481 |
| Prior 1 | A4 (BR-PA = US-NC + US-SD) | rBR-PA (USA – US-SD) | 0.467 | 0.461 | 0.442 | 0.187 | 0.754 |
| Prior 1 | A5 (FR-Reu = IT-Tre + GE-Dos) | rFR-Reu (Europe – GE-Dos) | 0.388 | 0.380 | 0.426 | 0.132 | 0.670 |
| Prior 2 | A1 (US-Wat = US-Haw + CN-Nin) | rUS-Wat (China – CN-Nin) | 0.761 | 0.762 | 0.766 | 0.684 | 0.833 |
| Prior 2 | A2 (US-Sok = US-Haw + US-Wat) | rUS-Sok (USA – US-Wat) | 0.224 | 0.219 | 0.199 | 0.114 | 0.350 |
| Prior 2 | A3 (DE-Dos = IT-Tre + US-NC) | rDE-Dos (USA – US-NC) | 0.312 | 0.306 | 0.284 | 0.152 | 0.487 |
| Prior 2 | A4 (BR-PA = US-NC + US-SD) | rBR-PA (USA – US-SD) | 0.422 | 0.418 | 0.429 | 0.247 | 0.613 |
| Prior 2 | A5 (FR-Reu = IT-Tre + GE-Dos) | rFR-Reu (Europe – GE-Dos) | 0.370 | 0.368 | 0.356 | 0.178 | 0.569 |

NOTE: Admixture events are denoted as in figure 1. Each admixture rate parameter r points to the name of the admixed site and corresponds to the fraction of genes originating from the source site in parentheses (1 − r genes originate from the other source site). Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5 and q95), are indicated for each admixture parameter. Estimations assuming the prior set 1 and the prior set 2 (supplementary table S8, Supplementary Material online) are provided. The set of representative sample sites used for the ABC estimations presented here is the same as in table 3. Code names of the population sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online. See supplementary table S7, Supplementary Material online, for results on a different set of representative sample sites.
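As the note to table 4 states, an admixture rate r is the fraction of genes drawn from one source, with 1 − r drawn from the other. A minimal sketch of what this implies for allele frequencies is given below: immediately after admixture, and ignoring subsequent drift and mutation, the expected frequency of an allele in the admixed population is the r-weighted average of its frequencies in the two sources. The function name and the two source frequencies are hypothetical values chosen for illustration; only r is taken from table 4 (posterior median for event A3 under prior set 1).

```python
def admixed_frequency(r, p_source, p_other):
    """Expected allele frequency right after admixture (no drift, no mutation).

    r is the gene fraction coming from the source with frequency p_source;
    the remaining 1 - r comes from the source with frequency p_other.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("admixture rate must lie in [0, 1]")
    return r * p_source + (1.0 - r) * p_other

# Posterior median of the A3 admixture rate under prior set 1 (table 4).
r_a3 = 0.278

# Hypothetical allele frequencies in the two sources, for illustration only.
p_us_nc, p_it_tre = 0.60, 0.10

p_admixed = admixed_frequency(r_a3, p_us_nc, p_it_tre)
print(round(p_admixed, 3))
```

With a rate this low, the admixed frequency stays much closer to that of the majority source, which is why a minor contribution is harder to detect when the two sources are weakly differentiated.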


0.286). We found more balanced admixture rates for the populations from Brazil (BR-PA) and La Reunion (FR-Reu), but information on this parameter was less accurate in these cases, as indicated by large 90% credibility intervals.

Model-Posterior Checking

Like any model-based method, ABC inference does not reveal the “true” evolutionary history; it allows choosing the best among a necessarily limited set of compared scenarios and estimating posterior distributions of parameters under this scenario. How well the inferred scenario-posterior combination matches the observed dataset remains to be evaluated using an ABC model-posterior checking analysis. When applying such an analysis to the final invasion scenario detailed in figure 1, we found that, when considering the prior set 1, only 54 of the 1,141 summary statistics used as test quantities had low posterior predictive P-values (i.e., 0.002 < ppp-values < 5%). Moreover, none of those ppp-values remained significant when correcting for multiple comparisons (Benjamini and Hochberg 1995). Similar results were obtained when assuming the prior set 2 and when considering different sets of representative sample sites in our analyses (results not shown). These findings show that the final worldwide invasion scenario (fig. 1, with associated parameter posterior distributions) matches the observed dataset well. In agreement with this, the projections of the simulated datasets on the principal component axes from the final model-posterior combination were well grouped and centered on the target point corresponding to the observed dataset (supplementary fig. S4, Supplementary Material online). Due to the modest size of the dataset (i.e., 25 microsatellite loci), however, only situations of major inadequacy of the model-posterior combination to the observed dataset are likely to be identified.
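The multiple-comparison correction cited above (Benjamini and Hochberg 1995) can be sketched as follows. The procedure sorts the m P-values, finds the largest rank k such that the k-th smallest P-value is at most k × alpha / m, and rejects the k tests with the smallest P-values. The function name and both sets of P-values below are hypothetical; they are not the ppp-values obtained in this study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per test: True if rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k with p_(k) <= k * alpha / m.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            max_k = rank
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected

# Hypothetical P-values, for illustration only.
toy = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(toy)
print(flags)

# Hypothetical configuration with many tests: a single P-value of 0.002
# among 1,141 tests, all others being large.
many = [0.002] + [0.5] * 1140
print(any(benjamini_hochberg(many)))
```

The second case shows why a P-value that looks small in isolation may not survive the correction when more than a thousand test quantities are examined: with m = 1,141, the threshold for the smallest P-value is 0.05 / 1,141, i.e., about 0.00004.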

Discussion

In the present paper, we decipher the routes taken by D. suzukii in its invasion worldwide, evaluating evidence of bottlenecks and genetic admixture associated with different introductions, and we quantify the efficiency of ABC-RF relative to the more standard ABC-LDA method for choosing among introduction scenarios.

A Complex Worldwide Invasion History

Prior to this study and that of Adrion et al. (2014), our understanding of the worldwide introduction pathways of D. suzukii was based on historical and observational data, which were incomplete and potentially misleading. Our results indicate three distinct introductions from the native range (to Hawaii, the western side of North America, and western Europe), which accords well with the findings of Adrion et al. (2014) from a different set of markers (X-linked sequence data). Both the western North American introductions and at least some European populations show signs of admixture. Hawaii and China both appear to have contributed to the western North American introductions, which were in turn the most probable source for the introduction into eastern North America. Similarly, China and, to a lesser extent, eastern North America both appear to have contributed to the European introduction, with the eastern North American contribution found mostly in northern European areas. The almost simultaneous invasion of North America and Europe from Asia could be due to increased trade between these areas, facilitating transport of this species between regions. Alternatively, or in addition, adaptation to human-altered habitats (in this case, changes in agricultural practices) within the native range of the species could have promoted invasion of similarly altered habitats worldwide (i.e., anthropogenically induced adaptation to invade; Hufbauer et al. 2012). However, the two invaded continents were most likely colonized by flies from distinct Chinese geographic areas (fig. 1), requiring that adaptation to agriculture occurred concomitantly at several locations within the native range. This is possible but not evolutionarily parsimonious, and further data would be required to test this explanation. Subsequent introductions from the three primary founder locations to other locations result in a complex history of invasion. Both primary and secondary introductions show signs of demographic bottlenecks, and several invasive populations have multiple sources and thus experienced genetic admixture between differentiated native or invasive populations.

The genetic relationships between worldwide D. suzukii populations define the genetic variation they harbor and may continue to shape further spread of alleles among populations. Gene flow among populations, through continuous dispersal or more episodic admixture events, permits the dissemination of allelic variants and new mutations, with dramatic consequences for evolutionary trajectories (Lenormand 2002). We found evidence for asymmetric genetic admixture from North American towards (northern) European populations. If this signal of admixture reflects recurrent and ongoing dispersal events rather than a single past secondary introduction event, then variants that arise in North America would be more likely to spread to Europe than the reverse. Discriminating between ongoing and past dispersal events is tricky, however, especially for recent invasions (Benazzo et al. 2015). If the occurrence of recurrent gene flow is confirmed, then D. suzukii populations from Europe may not lag behind American ones in evolutionary terms. From an applied perspective, if resistance to control practices emerges in North America, special efforts to stop the importation of D. suzukii to Europe would help prevent the evolution of resistance in Europe and thus reduce damage to crops (Caprio and Tabashnik 1992).

Advances in Methods for Discriminating among Complex Models

To the best of our knowledge, the present study is the first to use the recently developed ABC-RF method (Pudlo et al. 2016) to test competing models characterized by different levels of complexity. To compare ABC-RF to a more standard ABC method, we carried out a subset of model choice analyses using both ABC-RF and ABC-LDA (Estoup et al. 2012). We ran ABC-LDA analyses in two ways: with standard-sized, large reference tables (500,000 simulated datasets per


scenario) and with small reference tables (10,000 simulated datasets per scenario, the same size as used for ABC-RF). The performance of ABC-LDA (in terms of prior error rate) in choosing among introduction scenarios was slightly better than that of ABC-RF when using large reference tables. Furthermore, for ABC-LDA with large reference tables, confidence in model choice (in terms of posterior probability) was relatively comparable to that of ABC-RF with small reference tables. Thus, when analyses are relatively simple or computational resources are not limiting, ABC-LDA can provide inferences of invasion scenarios as robust as those of ABC-RF.

In contrast, we found that ABC-RF outperformed ABC-LDA when using similarly small reference tables for both types of analyses (10,000 simulated datasets per scenario). ABC-LDA was also unstable, choosing different scenarios in different replicate analyses when using small reference tables. These empirical findings correspond well with those of Pudlo et al. (2016), who used simple models for which the true posterior probabilities could be calculated. Pudlo et al. (2016) demonstrated that ABC-RF provided a more reliable posterior probability of the best (true) scenario than standard ABC methods when using the same (small) number of simulated datasets in the reference table (see supplementary fig. S3 in Pudlo et al. 2016). In our work, the difference in performance between the two methods was most pronounced for complex analyses (e.g., when comparing more than 10 scenarios using more than 100 summary statistics). Thus, when analyses are not simple or computational resources are finite, ABC-RF provides more robust inferences of invasion scenarios than ABC-LDA.

The lower computational cost of ABC-RF was also evident. For a given observed dataset, selecting the best-supported scenario and computing its posterior probability was ca. 50 times faster than with an ABC-LDA treatment using a traditional large reference table. Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time, whereas it lasted several hours to several days with ABC-LDA. The benefit of low computational cost is not simply a saving in computer time. Rather, the efficiency of ABC-RF analyses allowed us to run replicate analyses on various sample sets, even for the most complex D. suzukii invasion scenarios. This replication is critical in evaluating the robustness of our statistical inferences.
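To give a sense of scale, the short sketch below tallies how many simulated datasets each kind of reference table requires, using the numbers of compared scenarios from table 1 and the per-scenario table sizes given in the text. The variable and function names are assumptions of this sketch. The ratio of table sizes happens to equal the reported speed-up, but run time also depends on steps not accounted for here, so the match should not be read as an explanation of it.

```python
# Simulations per scenario in each kind of reference table (values from the text).
SMALL_TABLE = 10_000    # used for ABC-RF
LARGE_TABLE = 500_000   # used for standard ABC-LDA

# Number of compared scenarios per analysis, copied from table 1.
SCENARIOS = {"1a": 3, "1b": 3, "1c": 3, "1d": 7, "2a": 6, "2b": 6,
             "3a": 10, "3b": 10, "4": 2, "5a": 21, "5b": 21}

def reference_table_size(n_scenarios, sims_per_scenario):
    """Total number of simulated datasets in one reference table."""
    return n_scenarios * sims_per_scenario

small_total = sum(reference_table_size(n, SMALL_TABLE) for n in SCENARIOS.values())
large_total = sum(reference_table_size(n, LARGE_TABLE) for n in SCENARIOS.values())

print(f"analysis 5a, small table: {reference_table_size(21, SMALL_TABLE):,}")
print(f"analysis 5a, large table: {reference_table_size(21, LARGE_TABLE):,}")
print(f"all 11 analyses: {small_total:,} versus {large_total:,} simulated datasets")
```

For a 21-scenario analysis such as 5a, a large reference table amounts to more than ten million simulated datasets, which is what makes replicating that analysis over many sample sets costly with the standard approach.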

We found that, at least for some ABC-RF analyses, the posterior probability of the best scenario and the global statistical power to choose among alternative invasion scenarios (as measured by the prior error rate) were relatively low. For instance, in three of the 11 analyses (analyses 3b, 5a, and 5b), the best scenarios had relatively low probabilities (i.e., ranging from 0.500 to 0.630) and relatively high prior error rates (i.e., ranging from 0.300 to 0.400). Although replicate analyses carried out on different sample sets and prior sets pointed to the same best scenario for these three analyses, such low probability values and high prior error rates suggest only a moderate level of confidence in the choice of model in these cases. Similarly, we found that the scenario choice leading to the conclusion of the presence versus absence of admixed genes from the eastern US in various European sample sites relied on probability values that were sometimes as low as 0.500. Moreover, in a minority of cases, the best scenario was different depending on the sample site considered as representative of the eastern US or the native area. The observed pattern of an uneven spatial distribution of genes of eastern US origin in Europe, following roughly a north-to-south gradient, should hence be interpreted cautiously. Analyses of additional European sample sites are needed to clarify this issue. The above inferential uncertainties also most likely stem from the relatively small number of markers genotyped relative to the number, complexity, and similarity of some of the competing models. Such results indicate that there is room for improving the robustness of our inferences, at least for a subset of our analyses.

Bottlenecks and Genetic Admixture

We found strong evidence that bottlenecks and admixture both occurred during the course of the invasion of D. suzukii. A first sign of bottlenecks is a reduction in neutral genetic diversity. We found that all invasive D. suzukii populations were characterized by significantly lower genetic variation than native ones, with a loss of 6.4% (US-Col) to 23.3% (US-Haw) of heterozygosity and of 27.3% (US-Wat) to 54.2% (US-Haw) of allelic diversity (supplementary fig. S5, Supplementary Material online). Adrion et al. (2014) found a similar significant loss of diversity in their dataset focused on gene sequences. Dlugosch and Parker (2008) and Uller and Leimu (2011) reviewed studies of neutral genetic diversity in a large number of species of animals, plants, and fungi and compared nuclear molecular diversity within introduced and source populations. Overall, they found that a loss of variation was the most frequent feature in invasive populations. However, reductions in genetic variation were on average modest (e.g., average loss of 18.7% and 15.5% of heterozygosity and allelic diversity, respectively; Dlugosch and Parker 2008). The reductions observed in our study in invasive D. suzukii populations were thus in the same range for heterozygosity but higher for allelic diversity (cf. average loss of 11.1% and 34.5% of heterozygosity and allelic diversity, respectively). A larger loss of allelic diversity than of heterozygosity after a bottleneck is nevertheless expected by theory (Nei et al. 1975).
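The theoretical expectation that a bottleneck removes proportionally more alleles than heterozygosity can be illustrated with two textbook results for a single generation of random sampling of 2N gene copies: the expected heterozygosity is reduced by a factor 1 − 1/(2N), and an allele of frequency p is retained with probability 1 − (1 − p)^(2N). The sketch below applies both to a hypothetical locus with two common and eight rare alleles. The allele frequencies, the founder number, and the function names are assumptions for illustration only; this is not the multi-generation bottleneck model fitted in this study.

```python
def retained_heterozygosity(n_founders):
    """Expected fraction of heterozygosity kept after one generation at size N."""
    return 1.0 - 1.0 / (2 * n_founders)

def expected_alleles(freqs, n_founders):
    """Expected number of alleles present among the 2N founding gene copies."""
    return sum(1.0 - (1.0 - p) ** (2 * n_founders) for p in freqs)

# Hypothetical locus: two common alleles and eight rare ones (sum to 1).
freqs = [0.45, 0.35] + [0.025] * 8
n = 10  # hypothetical number of diploid founders

het_kept = retained_heterozygosity(n)
alleles_kept = expected_alleles(freqs, n) / len(freqs)

print(f"heterozygosity retained: {het_kept:.1%}")
print(f"alleles retained:        {alleles_kept:.1%}")
```

Rare alleles contribute little to heterozygosity but count fully towards allelic diversity, so their loss during a founding event depresses the second measure far more than the first.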

The type of introduction pathway explains, at least partly, the severity of the reduction in genetic diversity. Bottlenecks tended to be less severe for populations founded by individuals from the same continent than for populations founded by individuals originating from a different continent. A simple explanation for this pattern is that, owing to greater proximity, populations originating from the same continent are likely to be initially founded by a larger number of individuals and also to experience recurrent gene flow. Populations founded by individuals from both the same and different continents experienced weak or moderate reductions in diversity, suggesting that multiple introduction pathways restore diversity.

In agreement with previous studies on invasions at similarly large geographical scales (e.g., Lombaert et al. 2014 and references therein), we found that admixture events were frequent in the worldwide invasion history of D. suzukii. We identified at least five admixture events for which sources were native and/or invasive populations (i.e., events A1–A5 in fig. 1). The genetic contributions of the sources were unbalanced, except for the populations from Brazil and La Réunion, but genetic information regarding admixture was considerably less accurate in the latter cases, which involved weakly differentiated source populations.

We have provided here the information necessary to evaluate whether bottlenecks and admixture have influenced outcomes in this invasion. Quantitative genetics studies in the lab focusing on the key source, bottlenecked, and admixed populations could now examine the fitness and adaptive potential of those groups (Lavergne and Molofsky 2007; Facon et al. 2008; Turgeon et al. 2011).

Conclusions and Perspectives

Understanding invasion pathways provides key information for understanding the role of evolutionary processes in biological invasions. For example, our research reveals the sources of invasive populations, which enables further relevant comparisons using quantitative genetics studies in the lab. Additionally, we found that the invasion of Europe by D. suzukii was distinct from that of North America, with limited and asymmetrical gene flow between these two main invaded areas. This situation provides the opportunity to evaluate evolutionary trajectories in replicate in two separate temperate regions, similarly to research by Gilchrist et al. (2001, 2004) on the pace of clinal evolution in the invasive fruit fly Drosophila subobscura. There are also distinct invasion pathways for areas with warmer climates (i.e., Hawaii, La Réunion, and Brazil) that will make interesting comparisons.

Retracing invasion routes and making inferences about demographic processes is only possible if there is adequate polymorphism within populations and significant genetic differentiation among them. As is evident here, microsatellite data can provide enough variation to reveal important pathways. We concur with Adrion et al. (2014), however, that genome-wide data (i.e., next-generation sequencing approaches) will be tremendously powerful in further discriminating among complex invasion scenarios (see also Cristescu 2015; Estoup et al. 2016). Because of the reduced computational resources demanded by ABC-RF, this method will be particularly useful for the analysis of massive single nucleotide polymorphism datasets (Pudlo et al. 2016). In addition to the selectively neutral demographic inference leading to the reconstruction of routes of invasion, population (and quantitative) next-generation sequencing approaches are quite promising for studying the evolution of phenotypic traits in natural populations (Wray 2013; Gautier 2015). Specifically, they can be used to better understand the genetic architecture of traits underlying invasion success (Bock et al. 2015). Drosophila suzukii is a good species for such research given (i) its short generation time and viability in laboratory conditions, which facilitate experimental approaches to study quantitative traits of interest (Asplen et al. 2015), and (ii) the availability of annotated genome assemblies for this species (Chiu et al. 2013; Ometto et al. 2013), along with the huge amount of genomic resources available for its close relative D. melanogaster (Groen and Whiteman 2016).

Materials and Methods

Sampling and Genotyping

Adult D. suzukii were sampled from the field at a total of 23 localities (hereafter termed sample sites) distributed throughout most of the native and invasive range of the species (supplementary table S1, Supplementary Material online, and fig. 1). Samples were collected between 2013 and 2015 using baited traps and sweep nets, and stored in ethanol. Native Asian samples consisted of a total of six sample sites, including four Chinese and two Japanese localities. Samples from the invasive range were collected in Hawaii (1 sample site), the continental US (7 sites), Europe (7 sites), Brazil (1 site), and the French island of La Réunion (1 site). For each sample site, 15–44 adult flies were genotyped at 25 of the 28 microsatellite loci described in Fraimout et al. (2015), resulting in a total of 685 genotyped individuals. Three microsatellite loci from Fraimout et al. (2015) (i.e., loci DS31, DS42, and DS45) were not included because of the presence of null alleles at frequencies > 10% in some sample sites (results not shown). DNA extraction and PCR amplification, as well as allele scoring, were performed following the methodology detailed in Fraimout et al. (2015).

General Framework for Defining Focal Population Groups and Sets of Scenarios to Be Compared Using ABC

One of the many challenges of ABC analyses is to optimally define the sets of sample sites to be processed chronologically as potential sources and targets, in a way that keeps the computational effort manageable. As stated in Estoup and Guillemaud (2010) and Lombaert et al. (2014), the number and complexity of competing scenarios increase with the number of genetic groups to be accounted for in ABC analyses. Following Lombaert et al. (2014), we used the results obtained from three genetic clustering methods, along with historical and geographical information, to define genetic groups that could include several sample sites (supplementary appendix S1, Supplementary Material online). We defined seven main genetic groups from our set of 23 sample sites: Asia, Hawaii, western US, eastern US, Europe, Brazil, and La Réunion. To make the computations less intensive with respect to computer resources, ABC treatments were carried out using one representative sample site for each genetic group per replicate analysis. In other words, when a genetic group included more than two sample sites, we considered the two most differentiated sample sites (i.e., those with the highest FST values in supplementary table S3, Supplementary Material online) as representative of the group: CN-Shi and JP-Tok for the group Asia; all three sample sites for the genetically heterogeneous western US group; US-NC and US-Wis for the group eastern US; and GE-Dos and IT-Tre for the group Europe. Each of the sample sites retained (i.e., the representative sample sites) was included alternately in the sample set considered for ABC treatments. This makes it possible to replicate analyses of scenario choice on different sample sets representing most of the genetic variation within and between genetic groups. All ABC analyses described in table 1 were thus replicated with all possible combinations of representative sample sites available for each genetic group. This allowed us to test the robustness of scenario choices with respect to the sample sets considered as representative of genetic groups.
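The selection and replication logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the pairwise FST values are invented, and the three groups combined here are an example rather than the composition of any particular analysis in table 1.

```python
from itertools import product


def most_differentiated_pair(pairwise_fst):
    """Return the pair of sample sites with the highest pairwise FST."""
    return max(pairwise_fst, key=pairwise_fst.get)


def replicate_sample_sets(representatives):
    """Enumerate every combination of one representative site per genetic group."""
    groups = list(representatives)
    return [dict(zip(groups, combo))
            for combo in product(*(representatives[g] for g in groups))]


# Hypothetical pairwise FST values within one genetic group (invented numbers).
toy_fst = {("site_1", "site_2"): 0.010,
           ("site_1", "site_3"): 0.035,
           ("site_2", "site_3"): 0.020}
pair = most_differentiated_pair(toy_fst)

# Representative sites named in the text for three of the genetic groups.
representatives = {
    "Asia": ["CN-Shi", "JP-Tok"],
    "western US": ["US-Wat", "US-Sok", "US-SD"],
    "eastern US": ["US-NC", "US-Wis"],
}
sample_sets = replicate_sample_sets(representatives)
```

Each element of `sample_sets` is one replicate sample set. The number of replicates is the product of the numbers of representative sites per group, which is why restricting each group to two or three representatives keeps the computations tractable.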

Nested ABC Analyses Processed Sequentially

Following Lombaert et al. (2014), we used historical information (i.e., dates of first observation of each invasive population; supplementary table S1, Supplementary Material online) to define 11 sets of competing introduction scenarios that were analyzed sequentially. The first set of compared scenarios considers the second oldest invasive population as target (the oldest one necessarily originating from the native range) and determines its introduction history. Step by step, subsequent analyses use the results obtained from the previous analyses, until the most recent invasive populations are considered. Each of the 11 analyses included a set of compared introduction scenarios in which the target invasive population (i.e., the focal population) can either derive directly by a single split (i.e., introduction event) from one of the historically compatible source populations, or be the result of an admixture event between one of all possible pairs of sources. For instance, in a model with one target and two possible sources, three scenarios can be formalized, with the target population being derived from one source population, from the other possible source, or from an admixture between the two sources (see supplementary fig. S6 for an illustration and supplementary table S2, Supplementary Material online, for details).
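The way a set of competing scenarios is built for one target population can be sketched as follows. The function name and the tuple representation of a scenario are assumptions made for illustration; the scenarios actually simulated also carry demographic parameters that are not shown here.

```python
from itertools import combinations


def introduction_scenarios(target, sources):
    """List single-source and pairwise-admixture scenarios for one target population."""
    single_source = [(target, (source,)) for source in sources]
    admixture = [(target, pair) for pair in combinations(sources, 2)]
    return single_source + admixture


# One target and two candidate sources: three scenarios, as in the example of the text.
two_sources = introduction_scenarios("target", ["source_1", "source_2"])

# Three candidate sources give three single-source and three admixture scenarios.
three_sources = introduction_scenarios("eastern US", ["Asia", "Hawaii", "western US"])
```

With k historically compatible sources, this enumeration yields k + k(k − 1)/2 scenarios, so the size of each analysis grows quickly as older populations accumulate as candidate sources.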

The first set of scenario choice analyses (1a–1d) aimed at making inferences about the introduction pathways for the three sample sites in the western US (table 1). We first evaluated the Asian, Hawaiian, or admixed Asian + Hawaiian origin of each of the three western US sample sites. We then refined our modeling by formalizing seven competing scenarios describing the possible relationships among the three western US sample sites and their extra-continental sources (i.e., Asia and Hawaii; analysis 1d). Once the invasion history was resolved for the western US group, we sought to identify the source population(s) of the eastern US and European genetic groups independently. Dates of first records in these two groups widely overlap, and the possibility that the western US was a common source for both areas could not be ruled out. We therefore compared, in analysis 2a, a set of six scenarios to assess whether the eastern US group derived directly from Asia, Hawaii, or the western US, or whether it resulted from an admixture event between one pair of such sources. A similar set of six scenarios was used to assess the most likely origin of European populations independently from the eastern US group (analysis 2b).

Given the results of analyses 2a and 2b, and the overlap in first record dates for the eastern US and Europe, we then performed two analyses to investigate the possibility of gene flow (i.e., admixture) between the eastern US and European groups, considering the eastern US as target population with Europe as potential source and vice versa (analyses 3a and 3b, respectively). Analysis 4 aimed at clarifying the admixture in northern European sample sites identified in analysis 3b. We tested whether this admixture occurred between the eastern US and a secondary introduction from Asia, or between individuals from the eastern US and southern Europe. Finally, we investigated the origins of the most recent invasive populations, in Brazil and La Réunion, separately, with all older groups and pairwise admixtures as potential sources (analyses 5a and 5b). Note that, in agreement with the genetic clustering results (fig. 1; supplementary appendix S1 and figs. S1 and S2, Supplementary Material online), we did not test for the possibility of Brazil being the source of La Réunion or vice versa.

Historical, Demographic, and Mutational Parameters

For all scenarios formalized for ABC analyses, we modeled an introduction event as a divergence without subsequent gene flow from the source population (or two source populations in the case of admixture) at a time corresponding to the date of first record in the invaded area, translated into a number of generations (assuming 12 generations per year; Lin et al. 2014). The divergence event was immediately followed by a bottleneck period characterized by a lower effective population size, and then by an immediate return to a stable (large) effective population size. We modeled our uncertainty regarding the identity of the source populations in the slightly structured native Asian area by including native "ghost populations" (i.e., unsampled populations) in our scenarios. Such unsampled native populations were modeled as a single split from an ancestral Asian population (without any change in effective population size) at a time defined by a loose flat prior which includes zero at the lower bound (supplementary table S8, Supplementary Material online). See Lombaert et al. (2014) and references therein for justifications of using unsampled populations when modeling invasion scenarios. A detailed illustration of the above modeling design is provided in supplementary fig. S6, Supplementary Material online.
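The conversion of a date of first record into a divergence time expressed in generations is simple arithmetic, sketched below. The function name is hypothetical and the years in the example are chosen only for illustration; the default of 12 generations per year is the value stated in the text.

```python
def generations_since_introduction(first_record_year, sampling_year,
                                   generations_per_year=12):
    """Translate the time elapsed since the first record into a number of generations."""
    if sampling_year < first_record_year:
        raise ValueError("sampling year precedes the first record")
    return (sampling_year - first_record_year) * generations_per_year


# Hypothetical years chosen only for illustration.
n_generations = generations_since_introduction(2008, 2013)
```

Whether such a value is used as a fixed time or as the center of a prior interval is a modeling decision; the priors actually used are those listed in supplementary table S8.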

Prior distributions for historical, demographic, and mutational parameters were defined taking into account the historical and demographic parameter values available from empirical studies on D. suzukii (Hauser 2011; Lin et al. 2014; Asplen et al. 2015) and microsatellite mutational data on D. melanogaster (Schug et al. 1998). We considered a first set of prior distributions (hereafter termed prior set 1), which was composed of rectangular (i.e., bounded uniform) distributions (see supplementary table S8, Supplementary Material online). To evaluate the robustness of our ABC inferences to prior choice, we also considered a second set of more peaked prior distributions (hereafter termed prior set 2; supplementary table S8, Supplementary Material online).

Model Choice Using ABC-RF and ABC-LDA

For all ABC-RF and ABC-LDA analyses, we used the software DIYABC v2.1.0 (Cornuet et al. 2014) to simulate the datasets constituting the reference tables. A reference table includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics.

Following Pudlo et al. (2016), ABC-RF treatments were processed on reference tables including 10,000 simulated datasets per scenario. Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014) for microsatellite markers, describing genetic variation per population (e.g., number of alleles), per pair (e.g., genetic distance), or per triplet (e.g., admixture rate) of populations, averaged over the 25 loci (see supplementary table S9, Supplementary Material online, for details about these statistics), plus the linear discriminant analysis (LDA) axes as additional summary statistics. The total number of summary statistics ranged from 39 to 424 depending on the analysis (table 2). When the number of scenarios in an analysis exceeded ten, we processed ABC-RF treatments on reference tables including 100,000 simulated datasets, to avoid the computer memory issues associated with the sub-bootstrapping procedure applied to reference tables with more than 100,000 simulated datasets (see the section "Practical recommendations regarding the implementation of the algorithms" in Pudlo et al. 2016). We checked that this number was sufficient by evaluating the stability of prior error rates (i.e., the probability of choosing a wrong model when drawing the model index and parameter values from the priors) and of posterior probability estimates on 80,000, 90,000, and 100,000 simulated datasets. The number of trees in the constructed random forests was fixed to n = 500; see Breiman (2001) and Pudlo et al. (2016) for justifications of considering a forest of 500 trees, and supplementary fig. S7, Supplementary Material online, for an illustration. For each ABC-RF analysis, we predicted the best scenario and estimated its posterior probability and the prior error rate over 10 replicate runs on the same reference table. We used the abcrf R package (v1.1.0; Pudlo et al. 2016) to perform all ABC-RF analyses. In supplementary appendix S2, Supplementary Material online, we provide several scripts written in the R programming language (R Development Core Team 2008) for computing random forest analyses in R with the abcrf package when starting from simulated datasets generated with DIYABC v2.1.0.
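The bookkeeping around an ABC-RF analysis can be sketched in Python as follows. This is not the abcrf implementation and it does not train a forest: it only illustrates, under stated assumptions, the reference-table sizing rule as read from the paragraph above, the majority vote by which a classification forest designates the best scenario, and the prior error rate as a misclassification frequency. All function names and the vote counts are hypothetical. Note that the fraction of trees voting for a scenario is not its posterior probability, which ABC-RF estimates in a separate step (Pudlo et al. 2016).

```python
from collections import Counter


def reference_table_size(n_scenarios, per_scenario=10_000, cap=100_000):
    """Total number of simulated datasets under the sizing rule read from the text."""
    if n_scenarios > 10:
        return cap
    return n_scenarios * per_scenario


def forest_vote(tree_predictions):
    """Majority vote over the per-tree scenario predictions of a random forest."""
    counts = Counter(tree_predictions)
    best_scenario, _ = counts.most_common(1)[0]
    fractions = {s: n / len(tree_predictions) for s, n in counts.items()}
    return best_scenario, fractions


def prior_error_rate(true_scenarios, predicted_scenarios):
    """Fraction of prior-simulated datasets assigned to the wrong scenario."""
    wrong = sum(t != p for t, p in zip(true_scenarios, predicted_scenarios))
    return wrong / len(true_scenarios)


# Hypothetical votes from a 500-tree forest (invented for illustration).
votes = ["scenario_3"] * 320 + ["scenario_1"] * 130 + ["scenario_2"] * 50
best, vote_fractions = forest_vote(votes)
```

Repeating the whole procedure on the same reference table, as done with 10 replicate runs in the study, gives a sense of how much the selected scenario and the error estimates vary with the randomness of the forest alone.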

For the sake of methodological comparison, we also duplicated a subset of analyses using a more standard ABC method (Beaumont et al. 2002; Cornuet et al. 2008), improved using the regression algorithms from Estoup et al. (2012) based on linear discriminant analysis (LDA) to avoid the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step. This method is called ABC-LDA. Following previous analyses of this type (Lombaert et al. 2014), ABC-LDA treatments were processed on reference tables including 500,000 simulated datasets per scenario. As for ABC-RF, datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). Posterior probabilities associated with each scenario were estimated by polychotomous logistic regression (Cornuet et al. 2008), modified following Estoup et al. (2012), on the 1% of the simulated datasets closest to the observed dataset. We also estimated, for each analysis, a prior error rate over 500 simulated datasets drawn from the priors, using reference tables including 500,000 simulated datasets per scenario. To provide a fair comparison of classification error with respect to the computational effort, prior error rates were also estimated using reference tables including the same number of simulated datasets per scenario as for ABC-RF treatments (10,000 per scenario).
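The step of retaining the simulated datasets closest to the observed one can be sketched as follows. This is a simplified stand-in, not the ABC-LDA procedure: it uses a raw Euclidean distance on unscaled statistics and the direct estimate of scenario probabilities (the share of each scenario among the retained datasets), whereas the study applies polychotomous logistic regression on LDA-transformed statistics. Function names and the toy reference table are hypothetical.

```python
import math
from collections import Counter


def closest_fraction(reference_table, observed, fraction=0.01):
    """Keep the given fraction of simulated datasets closest to the observed statistics.

    reference_table is a list of (scenario_label, summary_statistics) pairs.
    """
    ranked = sorted(reference_table, key=lambda row: math.dist(row[1], observed))
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[:n_keep]


def direct_scenario_frequencies(accepted):
    """Share of each scenario among the retained datasets (direct estimate)."""
    counts = Counter(label for label, _ in accepted)
    return {label: n / len(accepted) for label, n in counts.items()}


# Toy reference table: scenario A simulations lie near 0, scenario B near 5.
toy_table = ([("A", [i / 1000]) for i in range(100)]
             + [("B", [5.0 + i / 1000]) for i in range(100)])
accepted = closest_fraction(toy_table, observed=[0.0], fraction=0.01)
frequencies = direct_scenario_frequencies(accepted)
```

In practice the statistics would need to be put on a common scale before computing distances, and the regression step then corrects for the residual distance between retained simulations and the observed dataset.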

Refining the Origins of the Primary Introduction Events from the Native Area

We refined our worldwide invasion scenario by assessing the most probable origin of the primary introductions from the Asian native range. To this aim, we carried out additional ABC-RF treatments to choose which geographical area in the native range was the most likely origin of each of the three primary introductions from Asia identified in the final invasion scenario, i.e., in Hawaii, the western US (sample site US-Wat), and Europe (fig. 1). Capitalizing on the observed pattern of genetic differentiation among all Asian sample sites, we considered four possible geographical origins: O1 = Japan (represented by the sample sites JP-Sap + JP-Tok, which were pooled as they did not show any significant differentiation), O2 = South-East China (represented by the sample site CN-Nin), O3 = North-West China (represented by the sample sites CN-Lan + CN-Lia, which were pooled as they did not show any significant differentiation), and O4 = South-West China (represented by the sample site CN-Shi). In each of these three model choice analyses (one analysis per primary introduction), we considered the previously inferred introduction histories and compared four scenarios corresponding to variations of those histories that differed only by the geographical origin of the primary Asian introduction (models O1, O2, O3, and O4). We used the ABC-RF method with 10,000 datasets per model, simulated using prior set 1 or 2, and with three replications of the RF process. For inferences regarding Europe, we carried out ABC-RF treatments on various sets of sample sites from Europe and the eastern US to further evaluate the robustness of our results.

Parameter Estimation

For ABC parameter estimation, we considered a set of 12 key sample sites representative of the inferred final invasion history (fig. 1). This set of sample sites included the three native origins of primary introductions from the native range (O1, O2, and O3), the Hawaiian sample (US-Haw), the three western US samples (US-Wat, US-Sok, and US-SD), the eastern US sample US-NC, two European samples (IT-Tre and GE-Dos), the Brazilian sample (BR-PA), and the sample from La Réunion (FR-Reu). To evaluate the robustness of our parameter estimations with respect to sample sites, we carried out a second analysis replacing the sample sites US-NC, IT-Tre, and GE-Dos by US-Wis, SP-Bar, and FR-Par, respectively. We also replicated parameter estimations using prior set 1 and prior set 2.

ABC-RF was developed to tackle the inferential issue of model choice. So far, no RF-based solution exists to estimate parameter distributions under a given model (Pudlo et al. 2016). Standard ABC algorithms for parameter estimation may suffer from the curse of dimensionality and from correlation among explanatory variables (i.e., multicollinearity) during the regression step, and hence yield poor results when the number of statistics is too large (reviewed in Blum et al. 2013). To avoid such potential problems, as well as for the sake of simplicity and computational efficiency (i.e., statistical techniques for choosing summary statistics have not been implemented in DIYABC; Blum et al. 2013), we used a standard ABC methodological framework (Beaumont et al. 2002; Cornuet et al. 2008) applied to a subset of "expert-chosen" statistics proposed in DIYABC (Brouat et al. 2014; Lombaert et al. 2014) to estimate posterior parameter distributions under the final invasion scenario. We hence considered 95 summary statistics in total: the mean number of alleles and the mean genetic diversity (per locus and population sample), all pairwise FST values, and some crude estimates of admixture rates based on the five population triplets for which admixture events have been inferred (i.e., events A1–A5 described in fig. 1, corresponding to five AML statistics; Choisy et al. 2004; see the DIYABC v2.1.0 user manual, p. 16, for details about such statistics). Using parameter values drawn from prior set 1 or 2, we produced a reference table containing 10 million simulated datasets. Following Beaumont et al. (2002), we then used a local linear regression to estimate the parameter posterior distributions. We took the 10,000 (1‰) simulated datasets closest to our observed dataset for the regression, after applying a logit transformation to parameter values. We found that considering instead the 1% closest simulated datasets and/or a larger set of summary statistics had only weak effects on the estimated parameter distributions, at least for the subset of parameters we were interested in (i.e., admixture rates and bottleneck severity) (results not shown).
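The logit transformation applied to parameter values before the regression can be sketched as follows. The assumption made here is that each parameter is rescaled between the bounds of its prior before taking the logit, so that regression-adjusted values map back inside those bounds; the function names and the bounds used in the example are hypothetical, and the regression step itself is not shown.

```python
import math


def logit_transform(value, lower, upper):
    """Map a parameter value lying strictly between its bounds onto the real line."""
    scaled = (value - lower) / (upper - lower)
    return math.log(scaled / (1.0 - scaled))


def inverse_logit_transform(z, lower, upper):
    """Map a real number back into the open interval (lower, upper)."""
    return lower + (upper - lower) / (1.0 + math.exp(-z))


# Hypothetical admixture rate with bounds 0 and 1.
z = logit_transform(0.25, 0.0, 1.0)
recovered = inverse_logit_transform(z, 0.0, 1.0)

# Fraction of the reference table retained for the regression, from the text.
retained_fraction = 10_000 / 10_000_000
```

Because the inverse transformation always returns a value inside the bounds, a linear adjustment performed on the transformed scale cannot produce, for instance, an admixture rate below 0 or above 1.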

We focused our investigations on the parameters related to two types of demographic events that are considered to be potentially important drivers of invasion success: the severity of the bottleneck associated with the foundation of new populations, and genetic admixture (Estoup et al. 2016). Bottleneck severity was defined as the ratio between the duration of the bottleneck and the number of effective individuals during this period (Pascual et al. 2007). Following the final invasion scenario detailed in figure 1, we defined three categories of introduction situations: bottleneck severity for populations founded only by individuals originating from a different continent (e.g., sample sites US-Haw, US-Wat, IT-Tre, BR-PA, and FR-Reu), bottleneck severity for populations founded only by individuals originating from the same continent (e.g., sample sites US-SD and US-NC), and bottleneck severity for populations founded by a combination of the two types of sources (e.g., sample sites US-Sok and GE-Dos). Genetic admixture is evident when a population is inferred to have more than one distinct source (events A1–A5 in fig. 1).
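The definition of bottleneck severity given above translates directly into code. The function name and the numerical values below are hypothetical and serve only to show how the ratio behaves.

```python
def bottleneck_severity(duration_generations, effective_size):
    """Bottleneck severity: duration of the bottleneck divided by its effective size."""
    if effective_size <= 0:
        raise ValueError("effective size must be positive")
    return duration_generations / effective_size


# Hypothetical bottlenecks of equal duration but different effective sizes.
strong = bottleneck_severity(duration_generations=10, effective_size=50)
weak = bottleneck_severity(duration_generations=10, effective_size=500)
```

A long bottleneck with many effective individuals and a short one with few can therefore have the same severity, which is why this composite parameter is reported rather than duration and effective size separately.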

Model-Posterior Checking

We used the ABC model-posterior checking method implemented in DIYABC (Cornuet et al. 2010) to evaluate how well the final worldwide invasion scenario, with its associated parameter posterior distributions, matches the observed dataset. This method was largely inspired by the statistical frameworks detailed in Gelman et al. (2003) and Cook et al. (2006). The principle is as follows: if a scenario-posterior combination fits the observed data correctly, then data simulated under this scenario with parameters drawn from the associated posterior distributions should be close to the observed data. The lack of fit of the model to the data can be measured by determining the frequency at which test quantities measured on the observed dataset are extreme with respect to the distributions of the same test quantities computed from the simulated datasets (i.e., the posterior predictive distributions). In practice, the test quantities are chosen among the large set of ABC summary statistics proposed in the program DIYABC. For each test quantity (q), a lack of fit of the observed data with respect to the posterior predictive distribution can be measured by the cumulative distribution function value, defined as Prob(q_simulated < q_observed). The tail-area probability can be easily computed for each test quantity as Prob(q_simulated < q_observed) and 1.0 − Prob(q_simulated < q_observed) for Prob(q_simulated < q_observed) ≤ 0.5 and > 0.5, respectively. Such tail-area probabilities, also named posterior predictive P-values (i.e., ppp-values; Meng 1994), represent the probabilities that the replicated data (simulated ABC summary statistics) could be more extreme than the observed data (observed ABC summary statistics). In practice, ppp-values can be interpreted as a guideline for tracking model-posterior misfit: a few ppp-values around 0.05 are not necessarily a problem, whereas the presence of ppp-values around 0.001 is rather suspect. Hence, too many observed summary statistics falling in the tails of distributions, especially if some of those ppp-values are small (<0.001), cast serious doubts on the adequacy of the model-posterior combination to the observed dataset. Finally, because ppp-values are computed for a number of (often non-independent) test statistics, a method such as that of Benjamini and Hochberg (1995) can be used to control the false discovery rate (Cornuet et al. 2010).
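The tail-area computation and the false discovery rate control described above can be sketched in Python as follows. This is a minimal illustration, not the DIYABC implementation: the function names are hypothetical, the probability is estimated as the plain proportion of simulated values below the observed one, and the Benjamini and Hochberg (1995) step-up procedure is written in its textbook form.

```python
def ppp_value(simulated, observed):
    """Tail-area probability of an observed test quantity among simulated values."""
    below = sum(1 for q in simulated if q < observed) / len(simulated)
    return below if below <= 0.5 else 1.0 - below


def benjamini_hochberg(p_values, fdr=0.05):
    """Flag the p-values declared significant by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= fdr * rank / m:
            largest_rank = rank  # largest rank meeting its threshold
    flagged = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            flagged[index] = True
    return flagged


# Hypothetical posterior predictive values of one test quantity.
simulated_values = [1.0, 2.0, 3.0, 4.0]
low_tail = ppp_value(simulated_values, observed=1.5)
high_tail = ppp_value(simulated_values, observed=3.5)

# Hypothetical ppp-values for four test quantities.
flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], fdr=0.05)
```

A test quantity flagged after correction points to an aspect of the data that the model-posterior combination reproduces poorly, while unflagged quantities give no evidence of misfit at the chosen false discovery rate.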

We carried out the ABC model-posterior checking analysis on our observed microsatellite dataset as follows. From 10^7 datasets simulated under the final invasion scenario detailed in figure 1, and using the same subset of 95 statistics previously retained for parameter estimation, we obtained a posterior sample of 10^4 values from the posterior distributions of parameters, as described in the previous section, Parameter Estimation. We then simulated 10^4 datasets with parameter values drawn with replacement from this posterior sample. As underlined in many statistics textbooks (Gelman et al. 2003), performing model checking using information that has already been used for training (i.e., model fitting) is inadvisable (see also Cornuet et al. 2010 for illustrations on simulated datasets). Optimally, model-posterior checking should be based on test quantities that do not correspond to the summary statistics used for obtaining the parameter posterior distributions. This is naturally possible with DIYABC, as the package proposes a large choice of summary statistics. Our set of test statistics therefore included the 1,141 remaining single-population, pairwise, and three-population summary statistics available in DIYABC that were not used for the previous ABC parameter estimation.
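Two ingredients of this checking design can be sketched in Python: drawing parameter vectors with replacement from a posterior sample, and restricting the test quantities to statistics that were not used for estimation. The function names, the toy posterior sample, and the statistic labels are hypothetical; simulating datasets from the drawn parameters is the task of DIYABC and is not shown.

```python
import random


def resample_posterior(posterior_sample, n_draws, seed=None):
    """Draw parameter vectors with replacement from a posterior sample."""
    rng = random.Random(seed)
    return rng.choices(posterior_sample, k=n_draws)


def held_out_statistics(all_statistics, estimation_statistics):
    """Keep only the statistics that were not used for parameter estimation."""
    used = set(estimation_statistics)
    return [s for s in all_statistics if s not in used]


# Hypothetical posterior sample of (admixture rate, bottleneck severity) pairs.
posterior = [(0.2, 0.05), (0.3, 0.04), (0.25, 0.06)]
draws = resample_posterior(posterior, n_draws=10, seed=1)

# Hypothetical statistic labels; the labels used by DIYABC differ.
all_stats = ["stat_1", "stat_2", "stat_3", "stat_4"]
test_stats = held_out_statistics(all_stats, ["stat_2", "stat_4"])
```

Keeping the two sets of statistics disjoint means the check asks whether the fitted model predicts features of the data it was not tuned to reproduce.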

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

AF was supported by state funding by the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program "Investissements d'avenir" (ANR-11-IDEX-0004-02). AE and RAH were supported by the Languedoc Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE 2016), and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP "International Mobility" (CfP 2015-02). GL acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds, project 2012-13-195. XC, MK, IM, and JZ were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). MD acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). MP was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.

References

Adrion JR, Kousathanas A, Pascual M, Burrack HJ, Haddad NM, Bergland AO, Machado H, Sackton TB, Schlenke TA, Watada M, et al. 2014. Drosophila suzukii: the genetic footprint of a recent, worldwide invasion. Mol Biol Evol. 31:3148–3163.

Asplen MK, Anfora G, Biondi A, Choi DS, Chu D, Daane KM, Gibert P, Gutierrez AP, Hoelmer KA, Hutchison WD, et al. 2015. Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci. 88:469–494.

Atallah J, Teixeira L, Salazar R, Zaragoza G, Kopp A. 2014. The making of a pest: the evolution of a fruit-penetrating ovipositor in Drosophila suzukii and related species. Proc R Soc B. 281:20132840.

Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025–2035.

Beaumont MA. 2010. Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst. 41:379–406.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 57:289–300.

Bertorelle G, Benazzo A, Mona S. 2010. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol Ecol. 19:2609–2625.

Blum M, Nunes M, Prangle D, Sisson S. 2013. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci. 28:189–208.

Bock DG, Caseys C, Cousens RD, Hahn MA, Heredia SM, Hübner S, Turner KG, Whitney KD, Rieseberg LH. 2015. What we still don't know about invasion genetics. Mol Ecol. 24:2277–2297.

Boissin E, Hurley B, Wingfield MJ, Vasaitis R, Stenlid J, Davis C, de Groot P, Ahumada R, Carnegie A, Goldarazena A, et al. 2012. Retracing the routes of introduction of invasive species: the case of the Sirex noctilio woodwasp. Mol Ecol. 21:5728–5744.

Breiman L. 2001. Random forests. Mach Learn. 45:5–32.

Benazzo A, Ghirotto S, Torres Vilaça ST, Hoban S. 2015. Using ABC and microsatellite data to detect multiple introductions of invasive species from a single source. Heredity 115:262–272.

Brouat C, Tollenaere C, Estoup A, Loiseau A, Sommer S, Soanandrasana R, Rahalison L, Rajerison M, Piry S, Goodman SM, Duplantier JM. 2014. Invasion genetics of a human commensal rodent: the black rat Rattus rattus in Madagascar. Mol Ecol. 23(16):4153–4167.

Caprio M, Tabashnik BE. 1992. Gene flow accelerates local adaptation among finite populations: simulating the evolution of insecticide resistance. J Econ Entomol. 85:611–620.

Calabria G, Máca J, Bächli G, Serra L, Pascual M. 2012. First records of the potential pest species Drosophila suzukii (Diptera: Drosophilidae) in Europe. J Appl Entomol. 136:139–147.

Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang G, Zalom FG. 2013. Genome of Drosophila suzukii, the spotted wing drosophila. G3 3:2257–2271.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Mol Ecol. 13:955–968.

Cook S, Gelman A, Rubin DB. 2006. Validation of software for Bayesian models using posterior quantiles. J Comput Graph Stat. 15:675–692.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24:2713–2719.

Cornuet JM, Ravigné V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11:401.

Cornuet JM Pudlo P Veyssier J Dehne-Garcia A Gautier M Leblois RMarin JM Estoup A 2014 DIYABC v20 a software to make approx-imate Bayesian computation inferences about population historyusing single nucleotide polymorphism DNA sequence and micro-satellite data Bioinformatics 301187ndash1189

Cristescu ME 2015 Genetic reconstructions of invasion history Mol Ecol242212ndash2225

Depra M Poppe JL Schmitz HJ De Toni DC Valente VLS 2014 The firstrecords of the invasive pest Drosophila suzukii in the SouthAmerican continent J Pest Sci 87379ndash383

Dlugosch KM Parker IM 2008 Founding events in species invasionsgenetic variation adaptive evolution and the role of multiple intro-ductions Mol Ecol 17431ndash449

Dlugosch KM Anderson SR Braasch J Alice Cang F Gillette H 2015 Thedevil is in the details genetic variation in introduced populationsand its contributions to invasion Mol Ecol 242095ndash2111

Edmonds CA Lillie AS Cavalli-Sforza LL 2004 Mutations arising in thewave front of an expanding population Proc Natl Acad Sci USA101975ndash979

Ellstrand NC Schierenbeck KA 2000 Hybridization as a stimulus for theevolution of invasiveness in plants Proc Nat Acad Sci USA977043ndash7050

Estoup A Guillemaud T 2010 Reconstructing routes of invasion usinggenetic data why how and so what Mol Ecol 194113ndash4130

Estoup A Lombaert E Marin JM Guillemaud T Pudlo P Robert CPCornuet JM 2012 Estimation of demo-genetic model probabilitieswith Approximate Bayesian Computation using linear discriminantanalysis on summary statistics Mol Ecol Res 12(5)846ndash855

Estoup A Ravigne V Hufbauer RA Vitalis R Gautier M Facon B 2016 Isthere a genetic paradox of biological invasion Annu Rev Ecol EvolSyst 4751ndash72

Deciphering Invasion Routes Using ABC Random Forest doi101093molbevmsx050 MBE


Facon B, Genton B, Shykoff J, Jarne P, Estoup A, David P. 2006. A general eco-evolutionary framework for understanding bioinvasions. Trends Ecol Evol. 21:130–135.

Facon B, Pointier JP, Jarne P, Sarda V, David P. 2008. High genetic variance in life-history strategies within invasive populations by way of multiple introductions. Curr Biol. 18:363–367.

Fraimout A, Loiseau A, Price DK, Xuereb A, Martin JF, Vitalis R, Fellous S, Debat V, Estoup A. 2015. New set of microsatellite markers for the spotted-wing Drosophila suzukii (Diptera: Drosophilidae): a promising molecular tool for inferring the invasion history of this major insect pest. Eur J Entomol. 112(4):855.

Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201:1555–1579.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis. London: Chapman & Hall/CRC.

Gilchrist GW, Huey RB, Serra L. 2001. Rapid evolution of wing size clines in Drosophila subobscura. Genetica 112:273–286.

Gilchrist GW, Huey RB, Balanya J, Pascual M, Serra L, Noor M. 2004. A time series of evolution in action: a latitudinal cline in wing size in South American Drosophila subobscura. Evolution 58:768–780.

Groen SC, Whiteman NK. 2016. Using Drosophila to study the evolution of herbivory and diet specialization. Curr Opin Insect Sci. 14:66–72.

Hauser M. 2011. A historic account of the invasion of Drosophila suzukii (Matsumura) (Diptera: Drosophilidae) in the continental United States, with remarks on their identification. Pest Manag Sci. 67:1352–1357.

Hufbauer RA, Facon B, Ravigne V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A. 2012. Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl. 5:89–101.

Kaneshiro KY. 1983. Drosophila (Sophophora) suzukii (Matsumura). Proc Hawaiian Entomol Soc. 24:179.

Keller SR, Taylor DR. 2010. Genomic admixture increases fitness during a biological invasion: admixture increases fitness during invasion. J Evol Biol. 23:1720–1731.

Kolbe JJ, Glor RE, Schettino LR, Lara AC, Larson A, Losos JB. 2004. Genetic variation increases during biological invasion by a Cuban lizard. Nature 431:177–181.

Lavergne S, Molofsky J. 2007. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci USA. 104:3883–3888.

Lee C. 2002. Evolutionary genetics of invasive species. Trends Ecol Evol. 17:386–391.

Lee JC, Bruck DJ, Curry H, Edwards D, Haviland DR, Van Steenwyk RA, Yorgey BM. 2011. The susceptibility of small fruits and cherries to the spotted-wing drosophila, Drosophila suzukii. Pest Manag Sci. 67:1358–1367.

Lenormand T. 2002. Gene flow and the limits of natural selection. Trends Ecol Evol. 17:183–189.

Lin QC, Zhai YF, Zhang AS, Men XY, Zhang XY, Zalom FG, Zhou CG, Yu Y. 2014. Comparative Developmental Times and Laboratory Life tables for Drosophlia suzukii and Drosophila melanogaster (Diptera: Drosophilidae). Fla Entomol. 97:1434–1442.

Lombaert E, Guillemaud T, Lundgren J, Koch R, Facon B, Grez A, Loomans A, Malausa T, Nedved O, Rhule E, et al. 2014. Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol. 23:5979–5997.

Matsumura S. 1931. 6000 illustrated insects of Japan-empire (in Japanese). Toko shoin, Tokyo. 1497 pp.

Meng XL. 1994. Posterior predictive p-values. Ann Stat. 22:1142–1160.

Nei M, Maruyama T, Chakraborty R. 1975. The bottleneck effect and genetic variability in populations. Evolution 29:1–10.

Ometto L, Cestaro A, Ramasamy S, Grassi A, Revadi S, Siozios S, Moretto M, Fontana P, Varotto C, Pisani D, et al. 2013. Linking genomics and ecology to investigate the complex evolution of an invasive drosophila pest. Genome Biol Evol. 5:745–757.

Pascual M, Chapuis MP, Mestres F, Balanya J, Huey RB, Gilchrist GW, Serra L, Estoup A. 2007. Introduction history of Drosophila subobscura in the New World: a microsatellite based survey using ABC methods. Mol Ecol. 16:3069–3083.

Peischl S, Excoffier L. 2015. Expansion load: recessive mutations and the role of standing genetic variation. Mol Ecol. 24:2084–2094.

Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. 2016. Reliable ABC model choice via random forests. Bioinformatics 32:859–866.

R Development Core Team. 2008. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org

Robert CP, Cornuet JM, Marin JM, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA. 108:15112–15117.

Rius M, Darling JA. 2014. How important is intraspecific genetic admixture to the success of colonising populations? Trends Ecol Evol. 29:233–242.

Sakai AK, Allendorf FW, Holt JS, Lodge DM, Molofsky J, With KA, Baughman S, Cabin RJ, Cohen JE, Ellstrand NC, McCauley DE. 2001. The population biology of invasive species. Annu Rev Ecol Syst. 305–332.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. In: Mutation and Evolution. Springer. pp. 359–367.

Simberloff D. 2013. Biological invasions: much progress plus several controversies. Contrib Sci. 9:7–16.

Turgeon J, Tayeh A, Facon B, Lombaert E, De Clercq, Berkvens N, Lundgren JG, Estoup A. 2011. Experimental evidence for the phenotypic impact of admixture between wild and biocontrol Asian ladybird (Harmonia axyridis) involved in the European invasion. J Evol Biol. 24:1044–1052.

Uller T, Leimu R. 2011. Founder events predict changes in genetic diversity during human-mediated range expansions. Glob Change Biol. 17:3478–3485.

Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Thery S, Froment A, Le Bomin S, Gessain A, Hombert JM, et al. 2009. Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol. 19:312–318.

Wray G. 2013. Genomics and the evolution of phenotypic Traits. Annu Rev Ecol Evol Syst. 44:51–72.

Fraimout et al doi101093molbevmsx050 MBE


Page 3: Deciphering the Routes of invasion of Drosophila suzukii

datasets, but that ABC-RF outperforms ABC-LDA when using a comparable and more manageable number of simulated datasets, especially when analyzing complex introduction scenarios.

Key words: Drosophila suzukii, invasion routes, random forest, approximate Bayesian computation, population genetics.

Introduction

Biological invasions are a component of global change, and their impact on the communities and ecosystems they invade is substantial (Simberloff 2013). Invasive populations evolve rapidly via both neutral and selective evolutionary processes as they undergo dramatic range expansion and demographic reshuffling, and experience new selection regimes (Keller and Taylor 2010; Dlugosch et al. 2015). While an extensive literature has laid the foundations of an evolutionary framework of biological invasions (Sakai et al. 2001; Lee 2002; Facon et al. 2006; Dlugosch et al. 2015; Estoup et al. 2016), whether evolutionary processes are drivers of invasion success or outcomes of introduction processes remains unclear.

Two of the evolutionary processes hypothesized to play a role in invasions are demographic bottlenecks and genetic admixture (Ellstrand and Schierenbeck 2000; Dlugosch and Parker 2008; Rius and Darling 2014). Demographic bottlenecks, which correspond to transitory reductions in population size associated with invasions, can reduce genetic variation and thus may constrain invasion success (Edmonds et al. 2004; Dlugosch and Parker 2008; Peischl and Excoffier 2015). A diverse array of mechanisms allows bottlenecked populations to overcome the deleterious consequences of low genetic variation and adapt to their novel environments [reviewed in Estoup et al. (2016)]. Genetic admixture occurs when multiple introduction events derive from genetically differentiated native or invasive populations. Admixture can increase genetic diversity in introduced populations, potentially enhancing invasion success by increasing the genetic variation on which selection can act (Kolbe et al. 2004; Lavergne and Molofsky 2007) or by producing entirely novel genotypes that may facilitate colonization of novel habitats (Dlugosch and Parker 2008; Rius and Darling 2014). It remains unclear, however, how often genetic admixture occurs in invasions and whether it acts as a true driver of invasion success (Uller and Leimu 2011; Rius and Darling 2014).

To evaluate hypotheses regarding bottlenecks and admixture, as well as adaptation or other processes that may occur during and after introductions, we must first identify the original native or invasive source(s) of the introductions. By comparing introduced populations to likely source populations, one can infer whether populations diverged during invasion and then start to distinguish among the evolutionary processes that drive observed differences (Dlugosch and Parker 2008; Keller and Taylor 2010). Typically, accurate historical data on introduction pathways are limited (Estoup and Guillemaud 2010). Traditional population genetic approaches can describe the genetic structure and relationships among sampled native and invasive populations (Boissin et al. 2012), but determining the origins of introduced species from population genetic data is challenging. This challenge prompted the development of statistical methods such as approximate Bayesian computation (ABC; Beaumont et al. 2002). In an ABC analysis, different introduction scenarios that describe possible introduction pathways are compared quantitatively, taking a model-based Bayesian approach. Datasets matching different scenarios are generated by simulation, and by comparing these simulated datasets to the observed dataset, approximate posterior probabilities of the scenarios are estimated (Bertorelle et al. 2010; Beaumont 2010). A crucial step in determining the most probable introduction pathway is to formally describe a finite set of possible introduction scenarios as models (Estoup and Guillemaud 2010). These introduction pathways should be based on temporal/historical information about invasions where feasible, and they can be realistically complex. For example, rather than moving directly from the native to the invasive range, a species might first pass through another region, experiencing bottlenecks and/or admixture along the way. ABC has been applied successfully to a wide range of case studies, from pest invasion (e.g., Harmonia axyridis; Lombaert et al. 2014) to anthropological research (e.g., Homo sapiens; Verdu et al. 2009), and is now widely used by the population genetics community. However, despite its many advantages, ABC relies on massive simulations that can render it computationally costly. When multiple complex introduction scenarios are considered, the time and computational resources required to choose among them can become prohibitive (Lombaert et al. 2014). To overcome this constraint, a new algorithm called ABC random forest (ABC-RF) has been developed (Pudlo et al. 2016). Simulations show that ABC-RF discriminates among scenarios more efficiently than traditional ABC methods (Pudlo et al. 2016), and thus it appears particularly suitable for distinguishing among complex invasion pathways (see the New Approaches section below).
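The simulate-and-compare logic of a standard ABC model choice can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the analysis performed in this study: the two scenario names, the `simulate` function, the single summary statistic, and all numerical values are hypothetical stand-ins for a real coalescent simulator and a real set of summary statistics.

```python
import random

def simulate(scenario, rng):
    """Toy simulator returning one summary statistic (a stand-in for, e.g., mean heterozygosity).
    Both scenarios and their parameter ranges are hypothetical."""
    if scenario == "direct":
        # assumed scenario 1: mild bottleneck, diversity stays high
        diversity = rng.uniform(0.6, 0.9)
    else:
        # assumed scenario 2: severe bottleneck, diversity drops
        diversity = rng.uniform(0.2, 0.7)
    return diversity + rng.gauss(0.0, 0.02)  # sampling noise

def abc_model_choice(observed, scenarios, n_sims, n_keep, rng):
    """Rejection ABC: keep the n_keep simulations closest to the observed statistic and
    use scenario frequencies among them as approximate posterior probabilities."""
    table = []
    for _ in range(n_sims):
        scenario = rng.choice(scenarios)  # uniform prior on the scenario index
        distance = abs(simulate(scenario, rng) - observed)
        table.append((distance, scenario))
    table.sort(key=lambda row: row[0])
    kept = [scenario for _, scenario in table[:n_keep]]
    return {s: kept.count(s) / n_keep for s in scenarios}

rng = random.Random(1)
posterior = abc_model_choice(0.85, ["direct", "bridgehead"], 20000, 200, rng)
print(posterior)
```

The cost of this approach grows with the number of simulations needed per scenario, which is the constraint that motivates ABC-RF.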

Here, we evaluate the origins of invasive populations of the spotted-wing Drosophila suzukii (Matsumura 1931), including where and whether they passed through bottlenecks in population size or experienced admixture between genetically distinct groups. We use this invasion to illustrate the use of ABC-RF and as a case study to compare it to a more standard ABC method. Drosophila suzukii is a representative of the melanogaster group and is historically distributed in Southeast Asia, covering a large portion of China, Japan, Thailand, and neighboring countries (Asplen et al. 2015). Unlike most drosophilid species, D. suzukii females display a large serrated ovipositor (Atallah et al. 2014) allowing them to lay eggs in ripening fruits, and they are thus a major threat to the agricultural production of stone fruits and berries (Lee et al. 2011). The presence of the species outside of its native range was first recorded in the Hawaiian archipelago in the early 1980s (Kaneshiro 1983), but no further spread was observed until the late 2000s, when D. suzukii was almost synchronously recorded in the southwest of the USA and in southern Europe (Hauser 2011; Calabria et al. 2012). By 2015, the species had spread throughout most of the North American and European continents and reached the southern hemisphere in Brazil (Depra et al. 2014). Drosophila suzukii represents a particularly appealing biological model for gaining further insights into the evolutionary factors associated with invasion success. Because it is a close relative of the model insect Drosophila melanogaster, its short generation time and viability in laboratory conditions facilitate experimental approaches to study evolutionary events.

ABC-based analyses of DNA sequence data obtained for six X-linked gene fragments suggest that North American and European populations represent separate invasion events (Adrion et al. 2014). The sources of these invasions and potential admixture among different regions remain unclear. The wide native and introduced distributions of D. suzukii imply a potentially large number of possible sources and hence a large number of possible introduction scenarios. By first grouping sample sites into genetic clusters, the number of populations treated independently can be reduced somewhat (Lombaert et al. 2014), but not fully. Additionally, the temporal proximity of the invasions in the US and Europe makes it difficult to favor certain introduction scenarios over others a priori. Together, these features of the D. suzukii invasion mean that a large number of scenarios will need to be compared, dramatically increasing computational requirements for traditional ABC methods and hence making ABC-RF methods particularly appealing.

New Approaches

The formal description of a finite set of possible introduction scenarios as models lays the foundation for ABC analyses and is outlined in Estoup and Guillemaud (2010). Choosing among the formulated models is the central statistical problem to be overcome when reconstructing routes of invasion from molecular data. Both theoretical arguments and simulation experiments indicate that approximate posterior probabilities estimated from ABC analyses for the modeled introduction scenarios can be inaccurate, even though the models being compared can still be ranked appropriately using numerical approximation (Robert et al. 2011). To overcome this problem, Pudlo et al. (2016) developed a novel approach based on a machine learning tool named "random forests" (RF; Breiman 2001), which selects among the complex introduction models covered by ABC algorithms. This approach enables efficient discrimination among models and estimation of the posterior probability of the best model, while being computationally less intensive.

We invite readers to consult Pudlo et al. (2016) for detailed statistical descriptions and testing of the ABC random forest (ABC-RF) method. Briefly, random forest (RF) is an algorithm that learns from a database how to predict a variable, called the output, from a possibly large set of covariates. In our context, the database is the reference table, which includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics (i.e., the covariates). RF aggregates the predictions of a collection of classification or regression trees (depending on whether the output is categorical, here the scenario identity, or quantitative, here the posterior probability of the best scenario). Each tree is built using the information provided by a bootstrap sample of the database and manages to capture one part of the dependency between the output and the covariates. Individually, these trees are poor predictors of the output, but an ensemble learning technique such as RF aggregates their predictions to increase predictive performance to a high level of accuracy in favorable contexts (Breiman 2001). RF is currently considered one of the major state-of-the-art algorithms for classification or regression.
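The aggregation of weak trees built on bootstrap samples can be illustrated with a minimal Python sketch. It is a deliberate simplification and an assumption on our part: each "tree" is a single random split (a stump), not the CART trees used by actual random forest implementations, and the two-scenario reference table with two summary statistics is simulated toy data.

```python
import random
from collections import Counter

def fit_stump(rows, labels, rng):
    """Fit a one-split tree on a random covariate with a random threshold (a very weak learner)."""
    j = rng.randrange(len(rows[0]))      # random covariate (summary statistic)
    threshold = rng.choice(rows)[j]      # random split point taken from the data
    left = [y for x, y in zip(rows, labels) if x[j] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[j] > threshold]
    majority = Counter(labels).most_common(1)[0][0]
    left_label = Counter(left).most_common(1)[0][0] if left else majority
    right_label = Counter(right).most_common(1)[0][0] if right else majority
    return j, threshold, left_label, right_label

def predict_stump(stump, x):
    j, threshold, left_label, right_label = stump
    return left_label if x[j] <= threshold else right_label

def fit_forest(rows, labels, n_trees, rng):
    """Build each stump on its own bootstrap sample of the reference table."""
    forest, n = [], len(rows)
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def predict_forest(forest, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0], votes

# Toy reference table: two hypothetical scenarios, two summary statistics each.
rng = random.Random(0)
rows, labels = [], []
for label, centre in (("A", 0.0), ("B", 3.0)):
    for _ in range(200):
        rows.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
        labels.append(label)

forest = fit_forest(rows, labels, 200, rng)
label_low, votes_low = predict_forest(forest, (-1.0, -1.0))
label_high, votes_high = predict_forest(forest, (4.0, 4.0))
print(label_low, dict(votes_low), label_high, dict(votes_high))
```

Each stump alone misclassifies a sizeable share of points, yet the vote over 200 stumps separates the two toy scenarios, which is the ensemble effect described above.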

In ABC random forest (ABC-RF), Pudlo et al. (2016) make two important advances regarding the use of summary statistics and the identification of the most probable model. For the summary statistics, given a pool of different metrics available, ABC-RF extracts the maximum of information from the entire set of proposed statistics. This avoids the arbitrary choice of a subset of statistics, which is often applied in ABC analyses, and also avoids what in the statistical sciences is called "the curse of dimensionality" [see Blum et al. (2013) for a comparative review of dimension reduction methods in ABC]. With respect to identifying the most probable model, ABC-RF uses a classification vote system rather than the posterior probabilities traditionally used in ABC analysis. The first outcome of an ABC-RF analysis applied to a given target dataset is hence a classification vote for each competing model, which represents the number of times a model is selected in a forest of n classification trees. The model with the highest number of classification votes corresponds to the model best suited to the target dataset among the set of competing models. As a byproduct, this step also provides a measure of the classification error called the prior error rate. The prior error rate is calculated as the probability of choosing a wrong model when drawing the model index and parameter values from the priors. The second outcome of ABC-RF is an estimate of the posterior probability of the best model that has been selected, obtained using a secondary random forest that regresses the model selection error of the first-step random forest over the available summary statistics.
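The two quantities produced by the first step, classification votes and the prior error rate, can be made concrete with a small Python sketch. The function names and the hand-written tree predictions are hypothetical, and we assume here that the prior error rate is estimated from out-of-bag predictions (each simulated dataset is classified only by the trees whose bootstrap sample did not contain it), which is the usual way a random forest estimates its own error.

```python
from collections import Counter

def classification_votes(tree_predictions):
    """Count how many trees selected each model for one target dataset."""
    return Counter(tree_predictions)

def prior_error_rate(true_models, oob_predictions):
    """Fraction of simulated datasets whose out-of-bag majority vote is wrong.
    oob_predictions[i] lists the predictions of the trees that did not see dataset i."""
    wrong = 0
    for truth, predictions in zip(true_models, oob_predictions):
        winner = Counter(predictions).most_common(1)[0][0]
        wrong += winner != truth
    return wrong / len(true_models)

# Hypothetical predictions of a five-tree forest for one observed dataset.
votes = classification_votes(["S1", "S2", "S1", "S1", "S3"])
best_model = votes.most_common(1)[0][0]

# Hypothetical out-of-bag predictions for four simulated datasets of known origin.
true_models = ["S1", "S2", "S3", "S1"]
oob = [["S1", "S1", "S2"], ["S2", "S2"], ["S1", "S3", "S1"], ["S1"]]
error = prior_error_rate(true_models, oob)
print(best_model, votes[best_model], error)  # S1 3 0.25
```

Note that the vote count itself is not a posterior probability, which is why the method adds the second, regression step described above.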

Pudlo et al. (2016) show that, as compared to previous ABC methods, ABC-RF offers at least four advantages: (i) it significantly reduces the model classification error as measured by the prior error rate; (ii) it is robust to the number and choice of summary statistics, as RF can handle many superfluous and/or strongly correlated statistics with no impact on the performance of the method [see Blum et al. (2013) for alternative methods of dimension reduction]; (iii) the computing effort is considerably reduced, as RF requires a much smaller reference table compared with alternative methods (i.e., a few thousand simulated datasets versus hundreds of thousands to millions of simulations per compared model); and (iv) it provides a more reliable estimate of the posterior probability of the selected model (i.e., the model that best fits the observed dataset).

To the best of our knowledge, the present study is the first to use the ABC-RF method for an ensemble of model choice analyses with different levels of complexity (i.e., various numbers of compared models, including various and sometimes large numbers of parameters, sampled populations, and hence summary statistics) on real multi-locus microsatellite datasets. Specifically, we analyze molecular data at 25 microsatellite loci from 23 D. suzukii sample sites located across most of its native and introduced range. We conducted our population genetics study in five steps. (1) We defined focal genetic groups and used this information to formalize 11 sets of competing introduction scenarios. We then chose among them in a sequential ABC-RF analysis. (2) We compared the performance of ABC-RF to a more standard ABC method for a subset of analyses [ABC-LDA; Estoup et al. (2012); see "Materials and Methods" section]. We chose ABC-LDA for our comparative study because it is considered to be one of the most efficient standard ABC methods to discriminate among models (Pudlo et al. 2016) and it is widely used in population genetics studies (Lombaert et al. 2014 and references therein). (3) We refined the inferred worldwide invasion scenario by assessing the most likely origins of the primary introductions from the native area. (4) We estimated the posterior distributions, under the final invasion scenario, of demographic parameters associated with bottleneck and genetic admixture events. (5) Finally, we performed model-posterior checking analyses to ensure that the final worldwide invasion scenario displayed a reasonable match to the observed dataset.

Results

Origins of Invasive Genetic Groups

Prior to conducting any type of ABC analysis, focal genetic groups must be defined by characterizing genetic diversity and structure within and between all genotyped sample sites using traditional statistics and clustering methods. This is described in detail in supplementary appendix S1, Supplementary Material online. We defined seven main genetic groups from our initial set of 23 sample sites (supplementary table S1, Supplementary Material online): Asia (sample sites CN-Lan, CN-Lia, CN-Nin, CN-Shi, JP-Tok, and JP-Sap), Hawaii (sample site US-Haw), western US (sample sites US-Wat, US-Sok, and US-SD), eastern US (sample sites US-Col, US-NC, US-Wis, and US-Gen), Europe (sample sites GE-Dos, FR-Par, FR-Bor, FR-Mon, SW-Del, SP-Bar, and IT-Tre), Brazil (BR-PA), and La Reunion (FR-Reu).

This genetic grouping, along with historical and geographical information (supplementary table S1, Supplementary Material online), allowed us to formalize 11 nested sets of competing invasion scenarios that we analyzed sequentially using ABC model choice methodologies. The date on which D. suzukii was first recorded in a location was used to help formulate the scenarios. For example, D. suzukii was first recorded in the continental US in California (US-Wat) in 2008 and was not recorded in the central state of Colorado (US-Col) until 2012. US-Wat could therefore serve as a source for US-Col, but not vice versa. We present the main analyses in table 1. A more detailed description of each competing scenario considered for each analysis is provided in supplementary table S2, Supplementary Material online. The 11 sequential ABC analyses permit a step-by-step reconstruction of the introduction history of D. suzukii worldwide. Results for each analysis, based on prior set 1 (which used bounded uniform distributions of parameters; see "Materials and Methods" section) and a single set of sample sites representative of the genetic groups defined above, are given in table 2. For all 11 analyses, similar ABC-RF results were obtained using prior set 2 (which used more peaked distributions; supplementary table S3, Supplementary Material online) and considering various sets of representative sample sites (supplementary table S4, Supplementary Material online).
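The way first-record dates restrict which populations may act as sources can be sketched as a simple filter in Python. The function name is hypothetical, only the three dates mentioned in the text are used (1980 for Hawaii, 2008 for US-Wat, 2012 for US-Col), and the strict "earlier than" rule is our assumption about how such a constraint might be encoded, not a description of the actual scenario-building procedure.

```python
# First-record years quoted in the text; other sites are omitted on purpose.
first_record = {"US-Haw": 1980, "US-Wat": 2008, "US-Col": 2012}

def admissible_sources(target, first_record):
    """Return the sites recorded strictly earlier than the target, in alphabetical order."""
    return sorted(
        site
        for site, year in first_record.items()
        if site != target and year < first_record[target]
    )

print(admissible_sources("US-Col", first_record))  # ['US-Haw', 'US-Wat']
print(admissible_sources("US-Wat", first_record))  # ['US-Haw']
```

Under this rule US-Wat appears among the candidate sources of US-Col, while US-Col never appears among those of US-Wat, matching the example above.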

By choosing among different competing scenarios, each analysis identifies the most probable source(s) for a given introduced population. The results come in the form of differing levels of support for the competing introduction models. It is thus worth stressing here that the identification of the most probable source population X (from among a finite set of possible source populations) does not necessarily mean that the introduced population Y originated from exactly location X, but rather that the most probable origin of the founders of the invasive population sampled at site Y is a population genetically similar to the source population sampled at site X. However, for the sake of concision, we will hereafter use the simplified terminology that the most probable origin of Y is X.

After the early invasion of Hawaii (1980), the first recorded introduction was to the genetically heterogeneous western US, and thus that is where our main analyses start (analyses 1a–1d, table 2). We considered the three western US sample sites in separate analyses, and we found in each case that the best scenario included genetic admixture between Asia and Hawaii as sources (table 2, analyses 1a–1c; mean posterior probability of the best scenario, P = 0.999). We subsequently tested whether this common admixture pattern resulted from a single or multiple introduction events from Asia and/or Hawaii, including the possibility of local colonization events among western US sites (analysis 1d). The best invasion scenario for the western US included an initial admixture event with Asia and Hawaii as probable sources in northern California (site US-Wat; admixture event A1 in fig. 1), followed by (i) a local colonization southward into San Diego (US-SD) and northward into Oregon state (US-Sok) and (ii) a secondary introduction event from Hawaii to Oregon (US-Sok) (A2 event in fig. 1; P = 0.690). This result is supported by various genetic clustering results (fig. 1; supplementary figs. S1 and S2, Supplementary Material online) as well as raw F-statistics values, which indicate a high level of differentiation between US-Sok and other continental US sample sites and a substantially lower FST between US-Sok and Hawaii (supplementary table S5, Supplementary Material online). Once the invasion history was resolved for the western US group, we inferred the most probable source(s) of the invasive eastern US and European genetic groups using two independent analyses (analyses 2a and 2b; table 1 and supplementary table S2, Supplementary Material online). We found evidence for a single origin for the eastern US group, corresponding to an intra-continental spread from the San Diego area (P = 0.779, table 2).

The most probable origin for Europe was an independent introduction from the Asian native range (P = 0.716, table 2). The European and North American invasions occurred nearly simultaneously, and thus we also explored gene flow between the two regions. Europe was never selected as a source for the eastern US (analysis 3a, P = 0.744), no matter which sets of sample sites were considered. In contrast, we found some evidence of asymmetrical gene flow from the eastern US into at least some European locations. When considering the northern European sample site from Germany (GE-Dos) as a target, we found that the best scenario included an admixed origin with Asia and the eastern US (supplementary table S4, Supplementary Material online; analysis 3b, P = 0.488 and P = 0.501 for prior sets 1 and 2, respectively). In contrast, the best scenario did not include such an admixture event when considering the southern European sample site from Italy (IT-Tre) as a target (table 2, analysis 3b, P = 0.510). We therefore replicated analysis 3b on all possible combinations of European and eastern US sample sites to assess the robustness of this result and evaluate the possibility of a geographic pattern for admixture events. For each of the European sample sites, figure 2 summarizes which replicate analyses selected a scenario including an admixture between eastern US and Asia or a scenario of a single introduction from Asia (see supplementary fig. S3, Supplementary Material online, for results using an alternative representative sample site for the native range). Results tended to indicate an uneven spatial distribution of the presence of genes of eastern US origin in Europe, following roughly a north-south gradient. More specifically, eastern US alleles were present in northern Europe and absent, or at least not detected by our model choice method, in southern Europe. We then further tested whether the observed admixture in the north corresponded to the mixing of eastern US genes with southern European genes originating from the initial introduction event from Asia, versus the mixing of eastern US genes with Asian genes introduced through a separate secondary introduction event from Asia into northern Europe (analysis 4). The best scenario was clearly that of an admixture between eastern US genes and southern European genes, which themselves had a probable origin as part of the initial introduction event from Asia (P = 0.998; A3 event in fig. 1).

Finally, we found that the sampled population from Brazil originated from North America, with genetic admixture between D. suzukii individuals from the southwest and eastern regions of the US (analysis 5a, P = 0.631; A4 event in fig. 1). Regarding the most recent invasive population, from La Reunion, we found that this island population originated from Europe, with admixture between individuals from northern and southern Europe (analysis 5b, P = 0.500; A5 event in fig. 1).

Comparison of Model Choice Analyses Using ABC-RF versus ABC-LDA

For all 11 analyses, ABC-LDA performed well when using large reference tables (i.e., 500,000 simulations per scenario; table 2 and supplementary table S3, Supplementary Material online). Indeed, the prior error rates for ABC-LDA with large reference tables were smaller than the prior error rates for ABC-RF with

Table 1. Formulation of the Model Choice Analyses that were Carried Out Successively to Reconstruct Invasion Routes of D. suzukii Using ABC.

| Model Choice Analysis | Number of Compared Scenarios | Tackled Question | Potential source genetic group | Focal populations |
| --- | --- | --- | --- | --- |
| 1a | 3 | What are the origins of western US populations? | Asia, Hawaii | US-Wat |
| 1b | 3 | What are the origins of western US populations? | Asia, Hawaii | US-Sok |
| 1c | 3 | What are the origins of western US populations? | Asia, Hawaii | US-SD |
| 1d | 7 | What are the relations among western US populations and their extra-continental sources? | Asia, Hawaii, US-Wat, US-Sok, US-SD | US-Wat + US-Sok + US-SD |
| 2a | 6 | What are the origins of eastern US populations? | Asia, Hawaii, western US | eastern US |
| 2b | 6 | What are the origins of European populations? | Asia, Hawaii, western US | Europe |
| 3a | 10 | Is there asymmetrical gene flow from Europe to eastern US? | Asia, Hawaii, western US, Europe | eastern US |
| 3b | 10 | Is there asymmetrical gene flow from eastern US to Europe? | Asia, Hawaii, western US, eastern US | Europe |
| 4 | 2 | Does the admixture with eastern US genes in northern Europe result from a secondary Asian introduction? | Asia, Hawaii, western US, eastern US, southern Europe | northern Europe |
| 5a | 21 | What are the origins of the Brazilian population? | Asia, Hawaii, western US, eastern US, southern Europe, northern Europe | Brazil |
| 5b | 21 | What are the origins of La Reunion population? | Asia, Hawaii, western US, eastern US, southern Europe, northern Europe | La Reunion |

NOTE. Each numbered analysis is a comparison of a certain number of scenarios by ABC model choice. We summarize each analysis by stating the question that it addressed. A detailed verbal description of each compared scenario is given in supplementary table S2, Supplementary Material online. The 11 analyses are nested in the sense that each subsequent analysis uses the result obtained from the previous one. For example, analysis 2a ("What are the origins of eastern US populations?") capitalizes on the history inferred for western US populations from analysis 1d. "Potential source genetic group" indicates all the potential source populations considered in the analyses for which one wants to identify the origin of the focal population (i.e., the target).

Fraimout et al. · doi:10.1093/molbev/msx050 MBE


Table 2. Results of Model Choice Analyses Using ABC-RF and ABC-LDA.

| Analysis | Total number of scenarios | Number of SumStats | Prior error rate: ABC-RF (SD) | Prior error rate: ABC-LDA (large reference table) | Prior error rate: ABC-LDA (small reference table) | Posterior probability of the best model: ABC-RF (SD) | Posterior probability of the best model: ABC-LDA (large reference table) [CI] | Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model) |
|---|---|---|---|---|---|---|---|---|
| 1a | 3 | 39 | 0.100 (±0.002) | 0.075 | 0.085 | 1.000 (±0.001) | 1.000 [1.000, 1.000] | Asia + Hawaii |
| 1b | 3 | 39 | 0.094 (±0.001) | 0.070 | 0.091 | 0.989 (±0.005) | 0.999 [0.999, 0.999] | Asia + Hawaii |
| 1c | 3 | 39 | 0.096 (±0.001) | 0.079 | 0.087 | 0.998 (±0.002) | 0.999 [0.999, 0.999] | Asia + Hawaii |
| 1d | 7 | 130 | 0.327 (±0.001) | 0.274 | 0.368 | 0.690 (±0.018) | 0.741 [0.714, 0.767] | Asia + Hawaii |
| 2a | 6 | 130 | 0.232 (±0.001) | 0.203 | 0.241 | 0.779 (±0.019) | 0.788 [0.764, 0.813] | Western US |
| 2b | 6 | 130 | 0.232 (±0.001) | 0.179 | 0.393 | 0.716 (±0.013) | 0.604 [0.592, 0.616] | Asia |
| 3a | 10 | 204 | 0.328 (±0.001) | 0.265 | 0.364 | 0.744 (±0.020) | 0.844 [0.820, 0.868] | Western US |
| 3b | 10 | 204 | 0.408 (±0.001) | 0.358 | 0.423 | 0.510 (±0.014) | 0.409 [0.391, 0.426] | Asia |
| 4 | 2 | 301 | 0.121 (±0.001) | 0.111 | 0.212 | 1.000 (±0.009) | 0.999 [0.999, 0.999] | Southern Europe + eastern US |
| 5a | 21 | 424 | 0.294 (±0.001) | 0.230 | 0.423 | 0.631 (±0.034) | 0.812 [0.788, 0.837] | Western US + eastern US |
| 5b | 21 | 424 | 0.298 (±0.001) | 0.242 | 0.428 | 0.500 (±0.027) | 0.290 [0.256, 0.323] | Northern Europe + southern Europe |

NOTE: All model choice analyses were carried out using the prior set 1 (supplementary table S8, Supplementary Material online) and a single set of sample sites representative of the pre-defined genetic groups: sample site JP-Tok for the native Asian group; US-Haw for the Hawaiian group; US-Wat, US-Sok, and US-SD for the western US group; US-NC for the eastern US group; IT-Tre for the southern Europe group; GE-Dos for the northern Europe group; BR-PA for the Brazil group; and FR-Reu for the La Reunion group (fig. 1). Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). The total number of summary statistics (Number of SumStats) as well as the total number of compared scenarios (Total number of scenarios) are indicated for each analysis. Prior error rates and posterior probabilities of the best model chosen using ABC-RF were averaged over 10 replicate analyses. ABC-RF and ABC-LDA treatments yielded the same best model choice for all analyses, which is denoted in the column "Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model)". Admixture between two source populations is represented by a "+" sign. ABC-LDA posterior probabilities of the best models were estimated using "large reference tables" with 500,000 simulated datasets per scenario. ABC-LDA prior error rates were computed using reference tables of two different sizes: "large reference table" (i.e., 500,000 simulated datasets per scenario) and "small reference table" (i.e., 10,000 simulated datasets per scenario, as for ABC-RF analyses). SD stands for standard deviation over 10 replicate analyses and CI for 95% confidence interval computed following Cornuet et al. (2008).


small reference tables (paired t-test: t10 = 2.7, P = 0.02). However, ABC-RF performed better than ABC-LDA when using small reference tables (only 10,000 simulations per scenario), having lower prior error rates (paired t-test: t10 = 11.6, P < 0.0001; table 2 and supplementary table S3, Supplementary Material online). The difference in prior error rate was particularly marked for the complex analyses, i.e., those in which many introduction scenarios were compared and many summary statistics were computed. For instance, for analyses 5a and 5b (21 compared scenarios and 424 summary statistics computed), prior error rates were 0.217 and 0.221 for ABC-RF versus 0.417 and 0.425 for ABC-LDA.
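The paired comparison used above can be sketched in a few lines. The following is only an illustration of how a paired t statistic is formed from per-analysis prior error rates: the input values are our transcription of the prior set 1 columns of table 2, the function name `paired_t` is hypothetical, and the resulting statistic need not coincide with the values reported in the text, which may rest on different inputs (for example, those of supplementary table S3).

```python
import math
import statistics

# Prior error rates for analyses 1a to 5b, transcribed from table 2 (prior set 1).
ABC_RF = [0.100, 0.094, 0.096, 0.327, 0.232, 0.232, 0.328, 0.408, 0.121, 0.294, 0.298]
ABC_LDA_SMALL = [0.085, 0.091, 0.087, 0.368, 0.241, 0.393, 0.364, 0.423, 0.212, 0.423, 0.428]


def paired_t(x, y):
    """Return (t, df) for a paired t-test on the differences x[i] - y[i]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    n = len(diffs)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# A positive t means higher error rates for ABC-LDA (small table) than for ABC-RF.
t_stat, df = paired_t(ABC_LDA_SMALL, ABC_RF)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

With 11 analyses the test has 10 degrees of freedom, which is the subscript in "t10" above.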

Regarding computational effort, we found that, for a given observed dataset, an ABC-RF treatment to select the best scenario and compute its posterior probability was ca. 50 times faster than an ABC-LDA treatment with a traditional reference table of large size (i.e., 500,000 simulated datasets per scenario). Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time for 10,000 pseudo-observed datasets, whereas with ABC-LDA it lasted several hours to several days (depending on the analysis processed) for 500 pseudo-observed datasets.

The same invasion scenario had the highest probability for all 11 analyses carried out using ABC-RF and ABC-LDA with a large number of simulated datasets (table 2 and supplementary table S3, Supplementary Material online). Posterior probabilities of the best scenario estimated using ABC-RF were not systematically higher (or lower) than those using ABC-LDA with a large number of simulated datasets. In particular, there

FIG. 1. Worldwide invasion scenario of D. suzukii inferred from microsatellite data and date of first observation. Map and schematic showing sample sites and the invasion routes taken by D. suzukii, as reconstructed by ABC-RF (Pudlo et al. 2016) on a total of 685 individuals from 23 geographic locations genotyped at 25 microsatellite loci (see Results and Methods for details). The native range is in dark gray and the invasive range is in light gray (cf. delimitation from fig. 1 in Asplen et al. 2015). The year in which D. suzukii was first observed at each sample site is indicated in italics. The 23 geographical locations that were sampled are represented by circles (native range) and squares, diamonds, and triangles (introduced range). Squares indicate populations that experienced weak bottlenecks (i.e., median value of bottleneck severity < 0.12; see main text and table 3), diamonds indicate moderate bottlenecks (0.12 < bottleneck severity < 0.22), and triangles indicate strong bottlenecks (i.e., bottleneck severity > 0.3). The colors of the symbols for the sample sites and of the arrows between them correspond to the different genetic groups obtained using the clustering method BAPS (supplementary appendix S1, Supplementary Material online). The arrows indicate the most probable invasion pathways. A1–A5 indicate five separate admixture events between different sources. O1–O3 indicate the most probable sources within the native range for the primary introduction events. A1 = Hawaii + southeast China; A2 = Watsonville (western US) + Hawaii; A3 = southern Europe + eastern US; A4 = western US + eastern US; A5 = southern Europe + northern Europe; O1 = Japan; O2 = southeast China; O3 = northeast China.


was no clear trend of overestimation of posterior probability when using ABC-LDA with a large number of simulated datasets versus ABC-RF, a potential bias suggested by Pudlo et al. (2016). On the other hand, we found evidence of instability in the estimation of posterior probabilities, and hence in model selection, when using ABC-LDA with a small number of simulated datasets in the reference table (i.e., 10,000 simulations per scenario, as for ABC-RF). In this case, the scenario with the highest probability was different from that selected using either ABC-LDA with the large reference table or ABC-RF in four and two of the 11 analyses when using the prior sets 1 and 2, respectively (results not shown).

Refining the Origins of the Primary Introduction Events from the Native Area

Additional ABC-RF analyses were run to determine which geographical area in the native range was the most likely origin of each of the three primary introductions (i.e., in Hawaii, western US, and Europe). The most probable origin of the Hawaiian invasive population was Japan, with a mean posterior probability of 0.959 (origin O1 in fig. 1). For Watsonville (western US; sample site US-Wat), the most probable source of the primary introduction from Asia was southeast China (origin O2 in fig. 1; P = 0.860). Finally, for Europe, the most probable source of the primary introduction from Asia was northeast China (origin O3 in fig. 1; P = 0.855). Similar posterior probabilities were obtained using the prior set 1 (values presented above) and the prior set 2 (results not shown). For inferences regarding Europe, similar posterior probabilities were also obtained using various sets of sample sites from Europe and the eastern US (results not shown).

Estimation of Parameter Distributions for Admixture Rate and Bottleneck Severity

We used the most probable worldwide invasion scenario (fig. 1) to estimate the posterior distributions of parameters related to the bottleneck severity associated with the founding of new populations and to the genetic admixture rates between differentiated sources (i.e., A1–A5 events in fig. 1).

FIG. 2. Admixed or non-admixed origin of D. suzukii in Europe inferred for each sample site. Potential source populations include (among others) the Asian native range (represented by the Japanese sample site JP-Tok) and the invasive genetic group from eastern US, represented by one of the four invasive sample sites collected in this area (fig. 1). Four replicate independent ABC-RF treatments corresponding to analysis 3b (table 2) were hence carried out for each targeted European sample site, each using one of the four eastern US sample sites. The treatments labeled 1, 2, 3, and 4 in the pies of the figure were carried out with the sample sites US-NC, US-Wis, US-Gen, and US-Col, respectively. A pie quarter in blue indicates that the best scenario corresponds to a single introduction event from Asia. A pie quarter in red indicates that the best scenario corresponds to an admixture event between Asia and eastern US. Dates in italics correspond to the dates of first record at the European sample sites.


The posterior distributions of parameters relating to bottlenecks and admixture differed substantially from the priors, indicating that the genetic data were informative for these parameters (tables 3 and 4; see supplementary tables S6 and S7, Supplementary Material online, for results using an alternative set of representative sample sites). Regarding bottleneck severity, we found that bottlenecks tended to be more severe for populations founded by individuals originating from a source located on a different continent (extra-continental origin) than for populations founded by individuals originating from a source located on the same continent (intra-continental origin; table 3 and fig. 1). The bottleneck severity for populations founded by individuals corresponding to a combination of the two types of sources (extra- and intra-continental origins) was either weak or moderate. The strongest bottleneck severity was found for the first invasive population in Hawaii, which showed the lowest level of genetic variation among all invasive populations. Bottleneck severity values were, overall, slightly stronger in European than in US populations, suggesting a smaller number of founding individuals and/or a slower demographic recovery in European populations.

Regarding admixture (table 4), we found that (i) for the first recorded invasive population in western US (i.e., Watsonville; sample site US-Wat), the genetic contribution from China was substantially larger than that from Hawaii (median value of 0.759); (ii) for the western US population US-Sok, the secondary genetic contribution from Hawaii was small compared to that from Watsonville (median value of 0.237); and (iii) the sample sites from northern Europe (e.g., GE-Dos in Germany) contained a rather low proportion of genes originating from the eastern US (median value of

Table 3. Bottleneck Severity in Invasive Populations of D. suzukii.

| Introduction type | Sample site | Prior 1: Mean | Median | Mode | q5 | q95 | Prior 2: Mean | Median | Mode | q5 | q95 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Extra-continental | US-Haw | 0.517 | 0.500 | 0.495 | 0.326 | 0.755 | 0.490 | 0.484 | 0.477 | 0.382 | 0.615 |
| Extra-continental | US-Wat | 0.181 | 0.138 | 0.114 | 0.041 | 0.448 | 0.143 | 0.127 | 0.118 | 0.059 | 0.275 |
| Extra-continental | IT-Tre | 0.268 | 0.179 | 0.171 | 0.059 | 0.837 | 0.236 | 0.189 | 0.172 | 0.101 | 0.553 |
| Extra-continental | BR-PA | 0.158 | 0.126 | 0.104 | 0.020 | 0.407 | 0.208 | 0.193 | 0.172 | 0.067 | 0.399 |
| Extra-continental | FR-Reu | 0.259 | 0.215 | 0.186 | 0.051 | 0.626 | 0.245 | 0.214 | 0.190 | 0.069 | 0.534 |
| Intra-continental | US-SD | 0.135 | 0.117 | 0.106 | 0.039 | 0.283 | 0.117 | 0.104 | 0.091 | 0.053 | 0.224 |
| Intra-continental | US-NC | 0.089 | 0.067 | 0.061 | 0.015 | 0.230 | 0.111 | 0.098 | 0.090 | 0.046 | 0.218 |
| Extra + intra-continental | US-Sok | 0.116 | 0.099 | 0.088 | 0.027 | 0.250 | 0.119 | 0.106 | 0.099 | 0.052 | 0.229 |
| Extra + intra-continental | GE-Dos | 0.189 | 0.177 | 0.155 | 0.061 | 0.356 | 0.205 | 0.189 | 0.171 | 0.088 | 0.372 |
| Prior values | | 0.628 | 0.199 | NA | 0.021 | 1.904 | 0.517 | 0.500 | NA | 0.326 | 0.754 |

NOTE: Extra-continental introduction corresponds to a long-distance introduction from a source located on a different continent from the focal population; intra-continental introduction corresponds to an introduction event from a source located on the same continent as the focal population; and extra + intra-continental introduction corresponds to a combination of the two types of sources. Mean, median, and mode estimates as well as bounds of 90% credibility intervals (q5 and q95) are indicated for each bottleneck severity parameter. We roughly classified the estimated bottleneck severity values into three classes (represented in the original table by three shades of gray): weak (i.e., median value of bottleneck severity < 0.12, in light gray), moderate (0.12 < bottleneck severity < 0.22, in gray), and strong (i.e., bottleneck severity > 0.3, in dark gray). The set of sample sites used for the ABC estimations presented here includes (i) for the native area, Japan (JP-Tok + JP-Sap), South-East China (CN-Nin), and North-East China (CN-Lan + CN-Lia), and (ii) for the invaded range, US-Wat, US-Sok, and US-SD for western US, US-NC for eastern US, IT-Tre for southern Europe, GE-Dos for northern Europe, BR-PA for South America (Brazil), and FR-Reu for La Reunion island. Code names of the sample sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online, in which bottleneck severity classes are also given for each sample site. See supplementary table S6, Supplementary Material online, for results on a different set of representative sample sites.
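The rough classification described in the note can be written as a small rule. The sketch below is illustrative only: the function name `bottleneck_class` and the "unclassified" label are our own assumptions, introduced because the note does not say how values between 0.22 and 0.3, or values falling exactly on a threshold, are treated. The medians are those listed under prior set 1.

```python
def bottleneck_class(median_severity):
    """Map a median bottleneck severity to the rough classes of the table 3 note."""
    if median_severity < 0.12:
        return "weak"
    if 0.12 < median_severity < 0.22:
        return "moderate"
    if median_severity > 0.3:
        return "strong"
    # Assumption: the note leaves the 0.22-0.3 range and exact boundaries unassigned.
    return "unclassified"


# Posterior medians under prior set 1, as listed in table 3.
MEDIANS_PRIOR1 = {
    "US-Haw": 0.500, "US-Wat": 0.138, "IT-Tre": 0.179, "BR-PA": 0.126,
    "FR-Reu": 0.215, "US-SD": 0.117, "US-NC": 0.067, "US-Sok": 0.099,
    "GE-Dos": 0.177,
}

classes = {site: bottleneck_class(m) for site, m in MEDIANS_PRIOR1.items()}
for site, label in classes.items():
    print(f"{site}: {label}")
```

Applied to these medians, the rule places the two intra-continental sites in the weak class and US-Haw in the strong class, in line with the pattern described in the main text.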

Table 4. Posterior Distributions of Admixture Rates for the Five Admixture Events Inferred for the Final Worldwide Invasion Scenario Described in figure 1.

| Prior set | Admixture event | Admixture rate (gene fraction from pop x) | Mean | Median | Mode | q5 | q95 |
|---|---|---|---|---|---|---|---|
| Prior 1 | A1 (US-Wat = US-Haw + CN-Nin) | r_US-Wat (China – CN-Nin) | 0.759 | 0.761 | 0.774 | 0.659 | 0.854 |
| Prior 1 | A2 (US-Sok = US-Haw + US-Wat) | r_US-Sok (USA – US-Wat) | 0.237 | 0.230 | 0.227 | 0.092 | 0.400 |
| Prior 1 | A3 (DE-Dos = IT-Tre + US-NC) | r_DE-Dos (USA – US-NC) | 0.286 | 0.278 | 0.266 | 0.112 | 0.481 |
| Prior 1 | A4 (BR-PA = US-NC + US-SD) | r_BR-PA (USA – US-SD) | 0.467 | 0.461 | 0.442 | 0.187 | 0.754 |
| Prior 1 | A5 (FR-Reu = IT-Tre + GE-Dos) | r_FR-Reu (Europe – GE-Dos) | 0.388 | 0.380 | 0.426 | 0.132 | 0.670 |
| Prior 2 | A1 (US-Wat = US-Haw + CN-Nin) | r_US-Wat (China – CN-Nin) | 0.761 | 0.762 | 0.766 | 0.684 | 0.833 |
| Prior 2 | A2 (US-Sok = US-Haw + US-Wat) | r_US-Sok (USA – US-Wat) | 0.224 | 0.219 | 0.199 | 0.114 | 0.350 |
| Prior 2 | A3 (DE-Dos = IT-Tre + US-NC) | r_DE-Dos (USA – US-NC) | 0.312 | 0.306 | 0.284 | 0.152 | 0.487 |
| Prior 2 | A4 (BR-PA = US-NC + US-SD) | r_BR-PA (USA – US-SD) | 0.422 | 0.418 | 0.429 | 0.247 | 0.613 |
| Prior 2 | A5 (FR-Reu = IT-Tre + GE-Dos) | r_FR-Reu (Europe – GE-Dos) | 0.370 | 0.368 | 0.356 | 0.178 | 0.569 |

NOTE: Admixture events are denoted as in figure 1. Each admixture rate parameter r points to the name of the admixed site and corresponds to the fraction of genes originating from the source site in parentheses (1 − r genes originate from the other source site). Mean, median, and mode estimates as well as bounds of 90% credibility intervals (q5 and q95) are indicated for each admixture parameter. Estimations assuming the prior set 1 and the prior set 2 (supplementary table S8, Supplementary Material online) are provided. The set of representative sample sites used for the ABC estimations presented here is the same as in table 3. Code names of the population sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online. See supplementary table S7, Supplementary Material online, for results on a different set of representative sample sites.
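To make the meaning of r concrete: under a simple one-pulse admixture model, the expected frequency of an allele in the admixed population is the r-weighted average of its frequencies in the two sources. The sketch below assumes exactly that model; the function name `admixed_frequency` and the two allele frequencies are hypothetical, and only the value 0.286 is taken from the text (fraction of eastern US genes in northern Europe, event A3).

```python
def admixed_frequency(r, p_source, p_other):
    """Expected allele frequency when a fraction r of genes comes from one
    source and the remaining 1 - r from the other source."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("admixture rate must lie in [0, 1]")
    return r * p_source + (1.0 - r) * p_other


# Admixture rate quoted in the text for event A3 (eastern US genes in GE-Dos).
r_a3 = 0.286
# Hypothetical frequencies of one allele in the two sources (not from the data).
p_us_nc, p_it_tre = 0.60, 0.10

p_admixed = admixed_frequency(r_a3, p_us_nc, p_it_tre)
print(f"expected frequency in the admixed site: {p_admixed:.3f}")  # 0.243
```

The closer r is to 0 or 1, the more the admixed population resembles a single source, which is why weakly differentiated sources (similar p values) leave little information about r.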


0.286). We found more balanced admixture rates for the populations from Brazil (BR-PA) and La Reunion (FR-Reu), but information on this parameter was less accurate in these cases, as indicated by large 90% credibility intervals.

Model-Posterior Checking

Like any model-based method, ABC inference does not reveal the "true" evolutionary history; rather, it allows one to choose the best among a necessarily limited set of compared scenarios and to estimate posterior distributions of parameters under this scenario. How well the inferred scenario-posterior combination matches the observed dataset remains to be evaluated using an ABC model-posterior checking analysis. When applying such an analysis to the final invasion scenario detailed in figure 1, we found that, when considering the prior set 1, only 54 of the 1141 summary statistics used as test quantities had low posterior predictive P-values (i.e., 0.002 < ppp-values < 5%). Moreover, none of those ppp-values remained significant when correcting for multiple comparisons (Benjamini and Hochberg 1995). Similar results were obtained when assuming the prior set 2 and when considering different sets of representative sample sites in our analyses (results not shown). These findings show that the final worldwide invasion scenario (fig. 1, with associated parameter posterior distributions) matches the observed dataset well. In agreement with this, the projections of the simulated datasets on the principal component axes from the final model-posterior combination were well grouped and centered on the target point corresponding to the observed dataset (supplementary fig. S4, Supplementary Material online). Due to the modest size of the dataset (i.e., 25 microsatellite loci), however, only situations of major inadequacy of the model-posterior combination to the observed dataset are likely to be identified.
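The multiple-comparison correction cited above (Benjamini and Hochberg 1995) can be sketched as follows. This is a generic implementation of the step-up procedure, not the authors' code, and the ppp-values fed to it are hypothetical: 54 low values spread between 0.002 and 0.05 plus 1087 unremarkable ones, chosen only to mimic the counts given in the text.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a test is significant at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose sorted p-value is <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff = rank
    significant = [False] * m
    for idx in order[:cutoff]:
        significant[idx] = True
    return significant


# Hypothetical ppp-values for 1141 test quantities (not the study's values).
low = [0.002 + k * 0.048 / 55 for k in range(1, 55)]  # 54 values in (0.002, 0.05)
ppp = low + [0.5] * (1141 - len(low))

n_significant = sum(benjamini_hochberg(ppp))
print(n_significant)  # 0
```

With 1141 tests, the smallest p-value must fall below 0.05/1141 (about 0.00004) to be retained on its own, so a set of values that all exceed 0.002 can easily yield no significant test after correction.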

Discussion

In the present paper, we decipher the routes taken by D. suzukii in its worldwide invasion, evaluate evidence of bottlenecks and genetic admixture associated with different introductions, and quantify the efficiency of ABC-RF relative to the more standard ABC-LDA method for choosing among introduction scenarios.

A Complex Worldwide Invasion History

Prior to this study and that of Adrion et al. (2014), our understanding of the worldwide introduction pathways of D. suzukii was based on historical and observational data, which were incomplete and potentially misleading. Our results indicate three distinct introductions from the native range (to Hawaii, the western side of North America, and western Europe), which accords well with the findings of Adrion et al. (2014) from a different set of markers (X-linked sequence data). Both the western North American introductions and at least some European populations show signs of admixture. Hawaii and China both appear to have contributed to the western North American introductions, which were in turn the most probable source for the introduction into eastern North America. Similarly, China and, to a lesser extent, eastern North America both appear to have contributed to the European introduction, although the eastern North American contribution is found mostly in northern European areas. The almost simultaneous invasion of North America and Europe from Asia could be due to increased trade between these areas, facilitating transport of this species between regions. Alternatively, or in addition, adaptation to human-altered habitats (in this case, changes in agricultural practices) within the native range of the species could have promoted invasion of similarly altered habitats worldwide (i.e., anthropogenically induced adaptation to invade; Hufbauer et al. 2012). However, the two invaded continents were most likely colonized by flies from distinct Chinese geographic areas (fig. 1), requiring that adaptation to agriculture occurred concomitantly at several locations within the native range. This is possible but not evolutionarily parsimonious, and further data would be required to test this explanation. Subsequent introductions from the three primary founder locations to other locations result in a complex history of invasion. Both primary and secondary introductions show signs of demographic bottlenecks, and several invasive populations have multiple sources and thus experienced genetic admixture between differentiated native or invasive populations.

The genetic relationships between worldwide D. suzukii populations define the genetic variation they harbor and may continue to shape further spread of alleles among populations. Gene flow among populations, through continuous dispersal or more punctual admixture events, permits the dissemination of allelic variants and new mutations, with dramatic consequences for evolutionary trajectories (Lenormand 2002). We found evidence for asymmetric genetic admixture from North American towards (northern) European populations. If this signal of admixture reflects recurrent and ongoing dispersal events rather than a single past secondary introduction event, then variants that arise in North America would be more likely to spread to Europe than the reverse. Discriminating between ongoing and past dispersal events is tricky, however, especially for recent invasions (Benazzo et al. 2015). If the occurrence of recurrent gene flow is confirmed, then D. suzukii populations from Europe may not lag behind American ones in evolutionary terms. From an applied perspective, if resistance to control practices emerges in North America, special efforts to stop the importation of D. suzukii to Europe would help prevent the evolution of resistance in Europe and thus reduce damage to crops (Caprio and Tabashnik 1992).

Advances in Methods for Discriminating among Complex Models

To the best of our knowledge, the present study is the first to use the recently developed ABC-RF method (Pudlo et al. 2016) to test competing models characterized by different levels of complexity. To compare ABC-RF to a more standard ABC method, we carried out a subset of model choice analyses using both ABC-RF and ABC-LDA (Estoup et al. 2012). We ran ABC-LDA analyses in two ways: with standard-sized large reference tables (500,000 simulated datasets per scenario) and with small reference tables (10,000 simulated datasets per scenario, the same size as used for ABC-RF). The performance of ABC-LDA (in terms of prior error rate) in choosing among introduction scenarios was slightly better than that of ABC-RF when using large reference tables. Furthermore, for ABC-LDA with large reference tables, confidence in model choice (in terms of posterior probability) was broadly comparable to that of ABC-RF with small reference tables. Thus, when analyses are relatively simple or computational resources are not limiting, ABC-LDA can provide inferences of invasion scenarios that are as robust as those of ABC-RF.

In contrast, we found that ABC-RF outperformed ABC-LDA when using similarly small reference tables for both types of analyses (10,000 simulated datasets per scenario). ABC-LDA was also unstable, choosing different scenarios in different replicate analyses when using small reference tables. These empirical findings correspond well with those of Pudlo et al. (2016), who used simple models for which the true posterior probabilities could be calculated. Pudlo et al. (2016) demonstrated that ABC-RF provided a more reliable posterior probability of the best (true) scenario than standard ABC methods when using the same (small) number of simulated datasets in the reference table (see supplementary fig. S3 in Pudlo et al. 2016). In our work, the difference in performance between the two methods was most pronounced for complex analyses (e.g., when comparing more than 10 scenarios using more than 100 summary statistics). Thus, when analyses are not simple or computational resources are finite, ABC-RF provides more robust inferences of invasion scenarios than ABC-LDA.
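The notion of a prior error rate estimated from a reference table can be illustrated with a toy example. The sketch below is deliberately simplified and makes several assumptions: the two scenarios and their Gaussian summary statistics are invented, and a plain k-nearest-neighbour vote stands in for the classifier, so neither ABC-RF nor ABC-LDA is implemented here. It only shows the mechanics: simulate a reference table, draw pseudo-observed datasets from the prior, and count how often the wrong scenario is chosen.

```python
import random


def simulate(scenario, rng):
    """Toy simulator: two summary statistics whose means depend on the scenario."""
    shift = 0.0 if scenario == 0 else 1.5
    return (rng.gauss(shift, 1.0), rng.gauss(-shift, 1.0))


def build_reference_table(n_per_scenario, rng):
    """Simulate n_per_scenario datasets under each of the two scenarios."""
    return [(simulate(s, rng), s) for s in (0, 1) for _ in range(n_per_scenario)]


def classify(stats, table, k=15):
    """Majority vote among the k simulated datasets closest to `stats`."""
    nearest = sorted(
        table,
        key=lambda row: (row[0][0] - stats[0]) ** 2 + (row[0][1] - stats[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def prior_error_rate(table, n_pods, rng):
    """Fraction of pseudo-observed datasets assigned to the wrong scenario."""
    errors = 0
    for _ in range(n_pods):
        truth = rng.choice((0, 1))  # uniform prior over the two scenarios
        pod = simulate(truth, rng)
        errors += classify(pod, table) != truth
    return errors / n_pods


rng = random.Random(1)
for size in (20, 500):  # small versus larger reference table
    table = build_reference_table(size, rng)
    print(size, prior_error_rate(table, n_pods=300, rng=rng))
```

In this toy setting the error rate cannot fall below the overlap between the two simulated distributions, however large the reference table; a larger table mainly reduces the instability of the classifier, which is the aspect discussed above.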

The lower computational cost of ABC-RF was also evident. For a given observed dataset, selecting the best-supported scenario and computing its posterior probability was ca. 50 times faster than when processing an ABC-LDA treatment with a traditional reference table of large size. Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time, whereas it lasted several hours to several days with ABC-LDA. The consequence of low computational costs is not simply a saving in computer time. Rather, the efficiency of ABC-RF analyses allowed us to run replicate analyses on various sample sets, even for the most complex D. suzukii invasion scenarios. This replication is critical in evaluating the robustness of our statistical inferences.

We found that, at least for some ABC-RF analyses, the posterior probability of the best scenario and the global statistical power to choose among alternative invasion scenarios (as measured by the prior error rate) were relatively low. For instance, in three of the 11 analyses (analyses 3b, 5a, and 5b), the best scenarios had relatively low probabilities (i.e., ranging from 0.500 to 0.630) and relatively high prior error rates (i.e., ranging from 0.300 to 0.400). Although replicate analyses carried out on different sample sets and prior sets pointed to the same best scenario for these three analyses, such low probability values and high prior error rates suggest only a moderate level of confidence in the choice of model in these cases. Similarly, we found that the scenario choice leading to the conclusion of the presence versus absence of admixed genes from the eastern US in various European sample sites relied on probability values that were sometimes as low as 0.500. Moreover, in a minority of cases, the best scenario differed depending on the sample site considered as representative of the eastern US or the native area. The observed pattern of an uneven spatial distribution of the presence of genes of eastern US origin in Europe, following roughly a north to south gradient, should hence be interpreted cautiously. Analyses of additional European sample sites are needed to clarify this issue. The above inferential uncertainties most likely also stem from the relatively small number of markers genotyped relative to the number, complexity, and similarity of some of the competing models. Such results indicate that there is room for improving the robustness of our inferences, at least for a subset of our analyses.

Bottlenecks and Genetic Admixture

We found strong evidence that bottlenecks and admixture both occurred during the course of the invasion of D. suzukii. A first sign of bottlenecks is a reduction in neutral genetic diversity. We found that all invasive D. suzukii populations were characterized by significantly lower genetic variation than native ones, with a loss of 6.4% (US-Col) to 23.3% (US-Haw) of heterozygosity and of 27.3% (US-Wat) to 54.2% (US-Haw) of allelic diversity (supplementary fig. S5, Supplementary Material online). Adrion et al. (2014) found a similar significant loss of diversity in their dataset, which focused on gene sequences. Dlugosch and Parker (2008) and Uller and Leimu (2011) reviewed studies of neutral genetic diversity in a large number of species of animals, plants, and fungi, and compared nuclear molecular diversity within introduced and source populations. Overall, they found that a loss of variation was the most frequent feature in invasive populations. However, reductions in genetic variation were on average modest (e.g., average loss of 18.7% and 15.5% of heterozygosity and allelic diversity, respectively; Dlugosch and Parker 2008). The reductions observed in our study in invasive D. suzukii populations were thus in the same range for heterozygosity but higher for allelic diversity (cf. average loss of 11.1% and 34.5% of heterozygosity and allelic diversity, respectively). A larger loss of allelic diversity than of heterozygosity after a bottleneck is nevertheless expected from theory (Nei et al. 1975).
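The theoretical expectation that allelic diversity drops faster than heterozygosity can be illustrated with a small simulation. The sketch below is not the analysis of Nei et al. (1975): it only models a single founding event as a random draw of gene copies from a source locus, and the allele frequencies, the number of founder gene copies, and all function names are hypothetical.

```python
import random


def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def founder_sample(freqs, n_copies, rng):
    """Allele frequencies among n_copies gene copies drawn from the source."""
    alleles = rng.choices(range(len(freqs)), weights=freqs, k=n_copies)
    return [alleles.count(a) / n_copies for a in range(len(freqs))]


def mean_losses(freqs, n_copies, n_reps, rng):
    """Average proportional loss of heterozygosity and of allele number."""
    h0 = heterozygosity(freqs)
    a0 = sum(1 for p in freqs if p > 0)
    h_loss = a_loss = 0.0
    for _ in range(n_reps):
        founders = founder_sample(freqs, n_copies, rng)
        h_loss += 1.0 - heterozygosity(founders) / h0
        a_loss += 1.0 - sum(1 for p in founders if p > 0) / a0
    return h_loss / n_reps, a_loss / n_reps


# Hypothetical source locus: three common alleles and 17 rare ones.
SOURCE = [0.4, 0.3, 0.1] + [0.2 / 17] * 17
h_loss, a_loss = mean_losses(SOURCE, n_copies=20, n_reps=2000, rng=random.Random(42))
print(f"mean heterozygosity loss: {h_loss:.2f}, mean allele loss: {a_loss:.2f}")
```

Rare alleles contribute little to heterozygosity but are the first to be missed when few founders are sampled, so the proportional loss of alleles exceeds that of heterozygosity in this setting.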

The type of introduction pathway explains, at least partly, the severity of the reduction in genetic diversity. Bottlenecks tended to be less severe for populations founded by individuals from the same continent than for populations founded by individuals originating from a different continent. A simple explanation for such a pattern is that, due to greater proximity, populations originating from the same continent are likely to be initially founded by larger numbers of individuals and also to experience recurrent gene flow. Populations founded by individuals from both the same and different continents experienced weak or moderate reductions in diversity, suggesting that multiple introduction pathways restore diversity.

In agreement with previous studies of invasions on similarly large geographical scales (e.g., Lombaert et al. 2014 and references therein), we found that admixture events were frequent in the worldwide invasion history of D. suzukii. We identified at least five admixture events for which sources were native and/or invasive populations (i.e., A1–A5 events in fig. 1). The genetic contributions of the sources were unbalanced, except for the populations from Brazil and La Reunion, but genetic information regarding admixture was considerably less accurate in the latter cases, which involved weakly differentiated source populations.

We have provided here the information necessary to evaluate whether bottlenecks and admixture have influenced outcomes in this invasion. Quantitative genetics studies in the lab focusing on the key source, bottlenecked, and admixed populations could now examine the fitness and adaptive potential of those groups (Lavergne and Molofsky 2007; Facon et al. 2008; Turgeon et al. 2011).

Conclusions and Perspectives

Understanding invasion pathways provides key information for understanding the role of evolutionary processes in biological invasions. For example, our research reveals the sources of invasive populations, which enables relevant comparisons using quantitative genetics studies in the lab. Additionally, we found that the invasion of Europe by D. suzukii was distinct from that of North America, with limited and asymmetrical gene flow between these two main invaded areas. This situation provides the opportunity to evaluate evolutionary trajectories in replicate in two separate temperate climate regions, similarly to research by Gilchrist et al. (2001, 2004) on the pace of clinal evolution in the invasive fruit fly Drosophila subobscura. There are also distinct invasion pathways for areas with warmer climates (i.e., Hawaii, La Reunion, and Brazil) that will make interesting comparisons.

Retracing invasion routes and making inferences about demographic processes is only possible if there is adequate polymorphism within populations and significant genetic differentiation among them. As is evident here, microsatellite data can provide enough variation to reveal important pathways. We concur with Adrion et al. (2014), however, that genome-wide data (i.e., next-generation sequencing approaches) will be tremendously powerful in further discriminating among complex invasion scenarios (see also Cristescu 2015; Estoup et al. 2016). Because of the reduced computational resources demanded by ABC-RF, this method will be particularly useful for analysis of massive single nucleotide polymorphism datasets (Pudlo et al. 2016). In addition to the selectively neutral demographic inference leading to the reconstruction of routes of invasion, population (and quantitative) next-generation sequencing approaches are quite promising for studying the evolution of phenotypic traits in natural populations (Wray 2013; Gautier 2015). Specifically, they can be used to better understand the genetic architecture of traits underlying invasion success (Bock et al. 2015). Drosophila suzukii is a good species for such research given (i) its short generation time and viability in laboratory conditions, which facilitate experimental approaches to study quantitative traits of interest (Asplen et al. 2015), and (ii) the availability of annotated genome assemblies for this species (Chiu et al. 2013; Ometto et al. 2013), along with the huge amount of genomic resources available in its close relative D. melanogaster (Groen and Whiteman 2016).

Materials and Methods

Sampling and Genotyping

Adult D. suzukii were sampled from the field at a total of 23 localities (hereafter termed sample sites) distributed throughout most of the native and invasive range of the species (supplementary table S1, Supplementary Material online, and fig. 1). Samples were collected between 2013 and 2015 using baited traps and sweep nets, and stored in ethanol. Native Asian samples consisted of a total of six sample sites, including four Chinese and two Japanese localities. Samples from the invasive range were collected in Hawaii (1 sample site), Continental US (7 sites), Europe (7 sites), Brazil (1 site), and in the French island of La Reunion (1 site). For each sample site, 15–44 adult flies were genotyped at 25 of the 28 microsatellite loci described in Fraimout et al. (2015), resulting in a total of 685 genotyped individuals. Three microsatellite loci from Fraimout et al. (2015) (i.e., loci DS31, DS42, and DS45) were not included due to the presence of null alleles at frequencies > 10% in some sample sites (results not shown). DNA extraction and PCR amplification, as well as allele scoring, were performed following the methodology detailed in Fraimout et al. (2015).

General Framework for Defining Focal Population Groups and Sets of Scenarios to Be Compared Using ABC

One of the many challenges of ABC analyses is to optimally define sets of sample sites to be processed chronologically as potential sources and targets in a way that does not hamper the computational effort. As stated in Estoup and Guillemaud (2010) and Lombaert et al. (2014), the number and complexity of competing scenarios are proportional to the number of genetic groups to be accounted for in ABC analyses. Following Lombaert et al. (2014), we used the results obtained from three genetic clustering methods, along with historical and geographical information, to define genetic groups that could include several sample sites (supplementary appendix S1, Supplementary Material online). We defined seven main genetic groups from our set of 23 sample sites: Asia, Hawaii, western US, eastern US, Europe, Brazil, and La Reunion. To make the computations less intensive with respect to computer resources, ABC treatments were carried out using one representative sample site for each genetic group per replicate analysis. In other words, when a genetic group included more than two sample sites, we considered the two most differentiated sample sites (i.e., with the highest FST values in supplementary table S3, Supplementary Material online) as representative of the group: CN-Shi and JP-Tok for the group Asia, all three sample sites for the genetically heterogeneous western US group, US-NC and US-Wis for the group eastern US, and GE-Dos and IT-Tre for the group Europe. Each of the sample sites retained (i.e., the representative sample sites) was included alternatively in the sample set considered for ABC treatments. This makes it possible to replicate analyses of scenario choice on different sample sets representing most of the genetic variation within and between genetic groups. All ABC analyses described in table 1 were thus replicated with all possible combinations of representative sample sites available for each genetic group. This allowed us to test the robustness of scenario choices with respect to the sample sets considered as representative of genetic groups.
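The replication over "all possible combinations of representative sample sites" can be pictured as a Cartesian product over genetic groups. The sketch below is only an illustration: it uses three of the groups and the representative sites named above, the dictionary layout and function name are hypothetical, and the actual sample sets used for each analysis in table 1 may have been assembled differently.

```python
from itertools import product

# Representative sample sites per genetic group (site codes taken from the text;
# restricting the example to these three groups is an assumption for illustration).
representatives = {
    "Asia": ["CN-Shi", "JP-Tok"],
    "eastern US": ["US-NC", "US-Wis"],
    "Europe": ["GE-Dos", "IT-Tre"],
}

def replicate_sample_sets(reps):
    """Return every sample set obtained by picking one representative site per group."""
    groups = list(reps)
    return [dict(zip(groups, combo)) for combo in product(*(reps[g] for g in groups))]

sample_sets = replicate_sample_sets(representatives)
for sample_set in sample_sets:
    print(sample_set)
```

With two representatives in each of three groups this yields 2 × 2 × 2 = 8 sample sets, each of which would receive its own scenario-choice analysis.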

Nested ABC Analyses Processed Sequentially

Following Lombaert et al. (2014), we used historical information (i.e., dates of first observation of each invasive population; supplementary table S1, Supplementary Material online) to define 11 sets of competing introduction scenarios that were analyzed sequentially. The first set of compared scenarios considers the second oldest invasive population as target (the oldest one necessarily originating from the native range) and determines its introduction history. Step by step, subsequent analyses use the results obtained from the previous analyses, until the most recent invasive populations are considered. Each of the 11 analyses included a set of compared introduction scenarios in which the target invasive population (i.e., the focal population) can either derive directly by a single split (i.e., introduction event) from one of the historically compatible source populations, or be the result of an admixture event between any possible pair of sources. For instance, in a model with one target and two possible sources, three scenarios could be formalized, with the target population being derived from a single source population, from the other possible source, or from an admixture between the two sources (see supplementary fig. S6 for an illustration and supplementary table S2, Supplementary Material online, for details).
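The scenario sets described above follow a simple counting rule: with k candidate sources there are k single-source scenarios plus one admixture scenario per pair of sources. A minimal sketch of that enumeration is given below. The function name and the tuple representation are hypothetical, and the sketch ignores the additional modeling details (bottlenecks, unsampled populations) that each real scenario carries.

```python
from itertools import combinations

def introduction_scenarios(target, sources):
    """List (target, sources) pairs: one per single source, one per pair of admixed sources."""
    single = [(target, (source,)) for source in sources]
    admixed = [(target, pair) for pair in combinations(sources, 2)]
    return single + admixed

# One target and two candidate sources: three scenarios, as in the example in the text.
for scenario in introduction_scenarios("target", ["source 1", "source 2"]):
    print(scenario)

# Three candidate sources give 3 single-source + 3 pairwise-admixture scenarios.
scenarios_three_sources = introduction_scenarios(
    "eastern US", ["Asia", "Hawaii", "western US"]
)
print(len(scenarios_three_sources))
```

The count grows as k + k(k − 1)/2, which is one reason why the analyses are nested and processed sequentially rather than all at once.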

The first set of scenario choice analyses, 1a–1d, aimed at making inferences about the introduction pathways for the three sample sites in western US (table 1). We first evaluated the Asian, Hawaiian, or admixed Asian + Hawaiian origin of each of the three western US sample sites. We then refined our modeling by formalizing seven competing scenarios describing the possible relationships among the three western US sample sites and their extra-continental sources (i.e., Asia and Hawaii; analysis 1d). Once the invasion history was resolved for the western US group, we sought to identify the population source(s) of the eastern US and European genetic groups independently. Dates of first records in these two groups widely overlap, and the possibility that the western US was a common source for both areas could not be ruled out. We therefore compared, in analysis 2a, a set of six scenarios to assess whether the eastern US group derived directly from Asia, Hawaii, or western US, or whether it resulted from an admixture event between one pair of such sources. A similar set of six scenarios was used to assess the most likely origin of European populations independently from the eastern US group (analysis 2b).

Given the results of analyses 2a and 2b, and the overlap in first record dates for eastern US and Europe, we then performed two analyses to investigate the possibility of gene flow (i.e., admixture) between the eastern US and European groups, considering eastern US as target population with Europe as potential source and vice versa (analyses 3a and 3b, respectively). Analysis 4 aimed at clarifying the admixture in northern Europe sample sites identified in analysis 3b. We tested whether this admixture occurred between eastern US and a secondary introduction from Asia, or between individuals from eastern US and southern Europe. Finally, we investigated the origins of the most recent invasive populations in Brazil and La Reunion separately, with all older groups and pairwise admixtures as potential sources (analyses 5a and 5b). Note that, in agreement with the genetic clustering results (fig. 1; supplementary appendix S1 and figs. S1 and S2, Supplementary Material online), we did not test for the possibility of Brazil being the source of La Reunion, or vice versa.

Historical, Demographic, and Mutational Parameters

For all scenarios formalized for ABC analyses, we modeled an introduction event as a divergence without subsequent gene flow from the source population (or two source populations in case of admixture) at a time corresponding to the date of first record in the invaded area, translated into a number of generations (assuming 12 generations per year; Lin et al. 2014). The divergence event was immediately followed by a bottleneck period characterized by a lower effective population size, and then by a direct return to a stable (large) effective population size. We modeled our uncertainty regarding the identity of the source populations in the slightly structured native Asian area by including native “ghost populations” (i.e., unsampled populations) in our scenarios. Such unsampled native populations were modeled as a single split from an ancestral Asian population (without any change in effective population size) at a time defined by a loose flat prior which includes zero at the lower bound (supplementary table S8, Supplementary Material online). See Lombaert et al. (2014) and references therein for justifications of using unsampled populations when modeling invasion scenarios. A detailed illustration of the above modeling design is provided in supplementary fig. S6, Supplementary Material online.
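The translation of a first-record date into a number of generations is a simple multiplication once a generation rate is assumed. The sketch below uses the 12 generations per year stated in the text. The function name and the example years are hypothetical and chosen only to show the arithmetic; the dates actually used are those of supplementary table S1.

```python
GENERATIONS_PER_YEAR = 12  # value assumed in the text (Lin et al. 2014)

def generations_since_first_record(first_record_year, sampling_year,
                                   generations_per_year=GENERATIONS_PER_YEAR):
    """Time of an introduction event, counted in generations back from sampling."""
    if first_record_year > sampling_year:
        raise ValueError("the first record cannot postdate the sampling year")
    return (sampling_year - first_record_year) * generations_per_year

# Illustrative values only: a population first recorded 6 years before sampling.
print(generations_since_first_record(2008, 2014))
```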

Prior distributions for historical, demographical, and mutational parameters were defined taking into account the historical and demographic parameter values available from empirical studies on D. suzukii (Hauser 2011; Lin et al. 2014; Asplen et al. 2015) and microsatellite mutational data on D. melanogaster (Schug et al. 1998). We considered a first set of prior distributions (hereafter termed prior set 1), which was composed of rectangular (i.e., bounded uniform) distributions (see supplementary table S8, Supplementary Material online). To evaluate the robustness of our ABC inferences to prior choice, we also considered a second set of more peaked prior distributions (hereafter termed prior set 2; supplementary table S8, Supplementary Material online).

Model Choice Using ABC-RF and ABC-LDA

For all ABC-RF and ABC-LDA analyses, we used the software DIYABC v2.1.0 (Cornuet et al. 2014) to simulate datasets constituting the reference tables. A reference table includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics.

Following Pudlo et al. (2016), ABC-RF treatments were processed on reference tables including 10,000 simulated datasets per scenario. Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014) for microsatellite markers, describing genetic variation per population (e.g., number of alleles), per pair (e.g., genetic distance), or per triplet (e.g., admixture rate) of populations, averaged over the 25 loci (see supplementary table S9, Supplementary Material online, for details about such statistics), plus the linear discriminant analysis (LDA) axes as additional summary statistics. The total number of summary statistics ranged from 39 to 424 depending on the analysis (table 2). When the number of scenarios in an analysis exceeded ten, we processed ABC-RF treatments on reference tables including 100,000 simulated datasets, to avoid computer memory issues associated with the sub-bootstrapping procedure processed for reference tables with more than 100,000 simulated datasets (see the section “Practical recommendations regarding the implementation of the algorithms” in Pudlo et al. 2016). We checked that this number was sufficient by evaluating the stability of prior error rates (i.e., the probability of choosing a wrong model when drawing model index and parameter values from the priors) and of posterior probability estimates on 80,000, 90,000, and 100,000 simulated datasets. The number of trees in the constructed random forests was fixed to n = 500; see Breiman (2001) and Pudlo et al. (2016) for justifications of considering a forest of 500 trees, and supplementary fig. S7, Supplementary Material online, for an illustration. For each ABC-RF analysis, we predicted the best scenario and estimated its posterior probability and the prior error rate over 10 replicate runs on the same reference table. We used the abcrf R package (v1.1.0; Pudlo et al. 2016) to perform all ABC-RF analyses. In supplementary appendix S2, Supplementary Material online, we provide several scripts written in the R programming language (R Development Core Team 2008) for computing random forest analyses in R with the abcrf package when starting from simulated datasets generated with DIYABC v2.1.0.

For the sake of methodological comparison, we also duplicated a subset of analyses using a more standard ABC method (Beaumont et al. 2002; Cornuet et al. 2008), improved using the regression algorithms from Estoup et al. (2012), which are based on linear discriminant analysis (LDA), to avoid the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step. This method is called ABC-LDA. Following previous analyses of this type (Lombaert et al. 2014), ABC-LDA treatments were processed on reference tables including 500,000 simulated datasets per scenario. As for ABC-RF, datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). Posterior probabilities associated with each scenario were estimated by polychotomous logistic regression (Cornuet et al. 2008), modified following Estoup et al. (2012), on the 1% of the simulated datasets closest to the observed dataset. We also estimated, for each analysis, a prior error rate over 500 simulated datasets drawn from the priors, using reference tables including 500,000 simulated datasets per scenario. To provide a fair comparison of classification error with respect to the computational effort, prior error rates were also estimated using reference tables including the same number of simulated datasets per scenario as for ABC-RF treatments (10,000 per scenario).
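To make the notions of reference table, closest simulated datasets, and prior error rate concrete, the following toy sketch performs scenario choice with a plain rejection step. It is deliberately simplified and is not the procedure used in this study: the simulator is invented, only two summary statistics are used, and the posterior probability is approximated by scenario frequencies among the closest simulations rather than by polychotomous logistic regression, LDA, or random forests. All names are hypothetical.

```python
import random
from collections import Counter

def simulate(scenario, rng):
    """Toy simulator (hypothetical): draw a parameter from a flat prior, return two statistics."""
    theta = rng.uniform(0.0, 1.0)
    shift = 0.0 if scenario == "single_source" else 1.5
    return (rng.gauss(theta + shift, 0.5), rng.gauss(theta - shift, 0.5))

def build_reference_table(scenarios, n_per_scenario, rng):
    """Reference table: (scenario label, summary statistics) for every simulated dataset."""
    return [(sc, simulate(sc, rng)) for sc in scenarios for _ in range(n_per_scenario)]

def rejection_posterior(table, observed, fraction=0.01):
    """Scenario frequencies among the given fraction of simulations closest to the observation."""
    def squared_distance(stats):
        return sum((a - b) ** 2 for a, b in zip(stats, observed))
    n_keep = max(1, int(len(table) * fraction))
    closest = sorted(table, key=lambda row: squared_distance(row[1]))[:n_keep]
    counts = Counter(sc for sc, _ in closest)
    labels = list(dict.fromkeys(sc for sc, _ in table))
    return {sc: counts.get(sc, 0) / n_keep for sc in labels}

def prior_error_rate(table, scenarios, n_test, rng, fraction=0.01):
    """Share of pseudo-observed datasets drawn from the priors that are assigned a wrong scenario."""
    errors = 0
    for _ in range(n_test):
        true_scenario = rng.choice(scenarios)
        posterior = rejection_posterior(table, simulate(true_scenario, rng), fraction)
        if max(posterior, key=posterior.get) != true_scenario:
            errors += 1
    return errors / n_test

rng = random.Random(1)
scenarios = ["single_source", "admixture"]
table = build_reference_table(scenarios, 2000, rng)
post = rejection_posterior(table, observed=(1.9, -1.0))
err = prior_error_rate(table, scenarios, n_test=100, rng=rng)
print(post, err)
```

The prior error rate computed this way depends on the size of the reference table, which is why the text reports it for tables of matching size when comparing methods.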

Refining the Origins of the Primary Introduction Events from the Native Area

We refined our worldwide invasion scenario by assessing the most probable origin of the primary introductions from the Asian native range. To this aim, we carried out additional ABC-RF treatments to choose which geographical area in the native range was the most likely origin of each of the three primary introductions from Asia identified in the final invasion scenario, i.e., in Hawaii, western US (sample site US-Wat), and Europe (fig. 1). Capitalizing on the observed pattern of genetic differentiation among all Asian sample sites, we considered four possible geographical origins: O1 = Japan (represented by the sample sites JP-Sap + JP-Tok, which were pooled as they did not show any significant differentiation), O2 = South-East China (represented by the sample site CN-Nin), O3 = North-West China (represented by the sample sites CN-Lan + CN-Lia, which were pooled as they did not show any significant differentiation), and O4 = South-West China (represented by the sample site CN-Shi). In each of these three model choice analyses (one analysis per primary introduction), we considered the previously inferred introduction histories and compared four scenarios corresponding to variations of those histories that differed only by the geographical origin of the primary Asian introduction (models O1, O2, O3, and O4). We used the ABC-RF method with 10,000 datasets per model simulated using prior set 1 or 2, and with three replications of the RF process. For inferences regarding Europe, we carried out ABC-RF treatments on various sets of sample sites from Europe and eastern US to further evaluate the robustness of our results.

Parameter Estimation

For ABC parameter estimation, we considered a set of 12 key sample sites representative of the inferred final invasion history (fig. 1). This set of sample sites included the three native origins of primary introductions from the native range (O1, O2, and O3), the Hawaiian sample (US-Haw), the three western US samples (US-Wat, US-Sok, and US-SD), the eastern US sample US-NC, two European samples (IT-Tre and GE-Dos), the Brazilian sample (BR-PA), and the sample from La Reunion (FR-Reu). To evaluate the robustness of our parameter estimations with respect to sample sites, we carried out a second analysis replacing the sample sites US-NC, IT-Tre, and GE-Dos by US-Wis, SP-Bar, and FR-Par, respectively. We also replicated parameter estimations using prior set 1 and prior set 2.

ABC-RF was developed to tackle the inferential issue of model choice. So far, no RF-based solution exists to estimate parameter distributions under a given model (Pudlo et al. 2016). Standard ABC algorithms for parameter estimation may suffer from the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step, and hence yield poor results when the number of statistics is much too large (reviewed in Blum et al. 2013). To avoid such potential problems, as well as for the sake of simplicity and computational efficiency (i.e., statistical techniques for choosing summary statistics have not been implemented in DIYABC; Blum et al. 2013), we used a standard ABC methodological framework (Beaumont et al. 2002; Cornuet et al. 2008) applied to a subset of “expert-chosen” statistics proposed in DIYABC (Brouat et al. 2014; Lombaert et al. 2014) to estimate posterior parameter distributions under the final invasion scenario. We hence considered 95 summary statistics in total: the mean number of alleles and the mean genetic diversity (per locus and population sample), all pairwise FST values, and some crude estimates of admixture rates based on the five population triplets for which admixture events have been inferred (i.e., A1–A5 events described in fig. 1, corresponding to five AML statistics; Choisy et al. 2004; see the DIYABC 2.1.0 user manual, p. 16, for details about such statistics). Using parameter values drawn from prior sets 1 or 2, we produced a reference table containing 10 million simulated datasets. Following Beaumont et al. (2002), we then used a local linear regression to estimate the parameter posterior distributions. We took the 10,000 (1‰) simulated datasets closest to our observed dataset for the regression, after applying a logit transformation to parameter values. We found that considering instead the 1% closest simulated datasets and/or a larger set of summary statistics had only weak effects on the estimated parameter distributions, at least for the subset of parameters we were interested in (i.e., admixture rates and bottleneck severity) (results not shown).
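The combination of a logit transformation with a local linear regression can be illustrated on a toy problem with a single parameter and a single summary statistic. The sketch below follows the general idea of Beaumont et al. (2002), with Epanechnikov weights and an adjustment of the accepted parameter values toward the observed statistic, but it is an assumption-laden simplification: the simulator, the bounds, the tolerance, and all function names are hypothetical, and the real analysis involves 95 statistics and many parameters.

```python
import math
import random

def logit(theta, lower, upper):
    """Map a parameter bounded in (lower, upper) onto the real line."""
    x = (theta - lower) / (upper - lower)
    x = min(max(x, 1e-12), 1.0 - 1e-12)  # guard against values exactly on the bounds
    return math.log(x / (1.0 - x))

def inv_logit(z, lower, upper):
    """Back-transform from the real line to (lower, upper)."""
    return lower + (upper - lower) / (1.0 + math.exp(-z))

def regression_adjust(thetas, stats, s_obs, lower, upper, n_keep):
    """Keep the n_keep closest simulations, then apply a weighted linear adjustment."""
    order = sorted(range(len(stats)), key=lambda i: abs(stats[i] - s_obs))[:n_keep]
    delta = abs(stats[order[-1]] - s_obs)          # tolerance: largest accepted distance
    x = [stats[i] - s_obs for i in order]
    z = [logit(thetas[i], lower, upper) for i in order]
    w = [1.0 - (xi / delta) ** 2 for xi in x]      # Epanechnikov kernel weights
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    slope = (sum(wi * (xi - x_bar) * (zi - z_bar) for wi, xi, zi in zip(w, x, z))
             / sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x)))
    # Shift each accepted value to what it would be at s_obs, then back-transform.
    adjusted = [inv_logit(zi - slope * xi, lower, upper) for zi, xi in zip(z, x)]
    return adjusted, w

rng = random.Random(42)
thetas = [rng.uniform(0.0, 1.0) for _ in range(5000)]   # e.g., an admixture rate in (0, 1)
stats = [t + rng.gauss(0.0, 0.1) for t in thetas]       # toy summary statistic
adjusted, weights = regression_adjust(thetas, stats, s_obs=0.3,
                                      lower=0.0, upper=1.0, n_keep=500)
posterior_mean = sum(w * a for w, a in zip(weights, adjusted)) / sum(weights)
print(round(posterior_mean, 3))
```

Working on the logit scale guarantees that adjusted values stay inside the prior bounds, which a regression on the raw scale would not.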

We focused our investigations on the parameters related to two types of demographic events that are considered to be potentially important drivers of invasion success: the severity of the bottleneck associated with the foundation of new populations, and genetic admixture (Estoup et al. 2016). Bottleneck severity was defined as the ratio between the duration of the bottleneck and the number of effective individuals during this period (Pascual et al. 2007). Following the final invasion scenario detailed in figure 1, we defined three categories of introduction situations: bottleneck severity for populations founded only by individuals originating from a different continent (e.g., sample sites US-Haw, US-Wat, IT-Tre, BR-PA, and FR-Reu), bottleneck severity for populations founded only by individuals originating from the same continent (e.g., sample sites US-SD and US-NC), and bottleneck severity for populations founded by a combination of the two types of sources (e.g., sample sites US-Sok and GE-Dos). Genetic admixture is evident when a population is inferred to have more than one distinct source (A1–A5 events in fig. 1).
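The definition of bottleneck severity given above is a plain ratio, so a short bottleneck with few effective individuals can be as severe as a long one with many. A minimal sketch follows; the function name and the numerical values are hypothetical and serve only to illustrate the definition.

```python
def bottleneck_severity(duration_generations, effective_size):
    """Bottleneck severity: bottleneck duration divided by effective size during the bottleneck."""
    if duration_generations < 0 or effective_size <= 0:
        raise ValueError("duration must be non-negative and effective size positive")
    return duration_generations / effective_size

# Illustrative values only (not estimates from this study).
print(bottleneck_severity(10, 20))    # short bottleneck, few effective founders
print(bottleneck_severity(10, 200))   # same duration, ten times more effective founders
print(bottleneck_severity(100, 200))  # longer bottleneck, as severe as the first case
```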

Model-Posterior Checking

We used the ABC model-posterior checking method implemented in DIYABC (Cornuet et al. 2010) to evaluate how well the final worldwide invasion scenario, with associated parameter posterior distributions, matches the observed dataset. This method was largely inspired by statistical frameworks detailed in Gelman et al. (2003) and Cook et al. (2006). The principle is as follows: if a scenario-posterior combination fits the observed data correctly, then data simulated under this scenario with parameters drawn from associated posterior distributions should be close to the observed data. The lack of fit of the model to the data can be measured by determining the frequency at which test quantities measured on the observed dataset are extreme with respect to the distributions of the same test quantities computed from the simulated datasets (i.e., the posterior predictive distributions). In practice, the test quantities are chosen among the large set of ABC summary statistics proposed in the program DIYABC. For each test quantity (q), a lack of fit of the observed data with respect to the posterior predictive distribution can be measured by the cumulative distribution function value, defined as Prob(q_simulated < q_observed). The tail-area probability can be easily computed for each test quantity as Prob(q_simulated < q_observed) and 1.0 − Prob(q_simulated < q_observed) for Prob(q_simulated < q_observed) ≤ 0.5 and > 0.5, respectively. Such tail-area probabilities, also named posterior predictive P-values (i.e., ppp-values; Meng 1994), represent the probabilities that the replicated data (simulated ABC summary statistics) could be more extreme than the observed data (observed ABC summary statistics). In practice, ppp-values can be interpreted as a guideline for tracking model-posterior misfit: a few ppp-values around 0.05 are not necessarily a problem, whereas the presence of ppp-values around 0.001 is rather suspect. Hence, too many observed summary statistics falling in the tails of distributions, especially if some of those ppp-values are small (< 0.001), cast serious doubts on the adequacy of the model-posterior combination to the observed dataset. Finally, because ppp-values are computed for a number of (often non-independent) test statistics, a method such as that of Benjamini and Hochberg (1995) can be used to control the false discovery rate (Cornuet et al. 2010).
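The tail-area rule and the false discovery rate control described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the example values are invented, and applying the Benjamini and Hochberg step-up procedure directly to the tail-area probabilities is a simplification that may differ from the DIYABC implementation.

```python
def ppp_value(simulated, observed):
    """Tail-area probability: p if p <= 0.5, else 1 - p, with p = Prob(q_simulated < q_observed)."""
    p = sum(1 for q in simulated if q < observed) / len(simulated)
    return p if p <= 0.5 else 1.0 - p

def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure: flag the tests rejected when controlling the FDR at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            largest_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[i] = True
    return rejected

# Invented posterior predictive values of one test quantity, and two observed values.
simulated_q = [1.0, 2.0, 3.0, 4.0]
print(ppp_value(simulated_q, 3.5))   # observed value in the upper part of the distribution
print(ppp_value(simulated_q, 0.0))   # observed value below every simulated value

# Invented ppp-values for four test quantities.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

In the last call only the smallest value passes its rank-specific threshold, so a single test quantity is flagged as showing misfit.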

We carried out the ABC model-posterior checking analysis on our observed microsatellite dataset as follows. From 10^7 datasets simulated under the final invasion scenario detailed in figure 1, and using the same subset of 95 statistics previously retained for parameter estimation, we obtained a posterior sample of 10^4 values from the posterior distributions of parameters, as described in the previous section, Parameter Estimation. We then simulated 10^4 datasets with parameter values drawn with replacement from this posterior sample. As underlined in many statistics textbooks (Gelman et al. 2003), it is inadvisable to perform model checking using information that has already been used for training (i.e., model fitting; see also Cornuet et al. 2010 for illustrations on simulated datasets). Optimally, model-posterior checking should be based on test quantities that do not correspond to the summary statistics which have been used for obtaining the parameter posterior distributions. This is naturally possible with DIYABC, as the package proposes a large choice of summary statistics. Our set of test statistics therefore included the 1,141 remaining single, pairwise, and three-population summary statistics available in DIYABC that were not used for previous ABC parameter estimation.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

A.F. was supported by state funding by the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program “Investissements d'avenir” (ANR-11-IDEX-0004-02). A.E. and R.A.H. were supported by the Languedoc Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE 2016), and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP “International Mobility” (CfP 2015-02). G.L. acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds, project 2012-13-195. X.C., M.K., I.M., and J.Z. were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). M.D. acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). M.P. was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.

References

Adrion JR, Kousathanas A, Pascual M, Burrack HJ, Haddad NM, Bergland AO, Machado H, Sackton TB, Schlenke TA, Watada M, et al. 2014. Drosophila suzukii: the genetic footprint of a recent, worldwide invasion. Mol Biol Evol. 31:3148–3163.

Asplen MK, Anfora G, Biondi A, Choi DS, Chu D, Daane KM, Gibert P, Gutierrez AP, Hoelmer KA, Hutchison WD, et al. 2015. Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci. 88:469–494.

Atallah J, Teixeira L, Salazar R, Zaragoza G, Kopp A. 2014. The making of a pest: the evolution of a fruit-penetrating ovipositor in Drosophila suzukii and related species. Proc R Soc B. 281:20132840.

Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025–2035.

Beaumont MA. 2010. Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst. 41:379–406.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 57:289–300.

Bertorelle G, Benazzo A, Mona S. 2010. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol Ecol. 19:2609–2625.

Blum M, Nunes M, Prangle D, Sisson S. 2013. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci. 28:189–208.

Bock DG, Caseys C, Cousens RD, Hahn MA, Heredia SM, Hubner S, Turner KG, Whitney KD, Rieseberg LH. 2015. What we still don't know about invasion genetics. Mol Ecol. 24:2277–2297.

Boissin E, Hurley B, Wingfield MJ, Vasaitis R, Stenlid J, Davis C, de Groot P, Ahumada R, Carnegie A, Goldarazena A, et al. 2012. Retracing the routes of introduction of invasive species: the case of the Sirex noctilio woodwasp. Mol Ecol. 21:5728–5744.

Breiman L. 2001. Random forests. Mach Learn. 45:5–32.

Benazzo A, Ghirotto S, Torres Vilaca ST, Hoban S. 2015. Using ABC and microsatellite data to detect multiple introductions of invasive species from a single source. Heredity 115:262–272.

Brouat C, Tollenaere C, Estoup A, Loiseau A, Sommer S, Soanandrasana R, Rahalison L, Rajerison M, Piry S, Goodman SM, Duplantier JM. 2014. Invasion genetics of a human commensal rodent: the black rat Rattus rattus in Madagascar. Mol Ecol. 23(16):4153–4167.

Caprio M, Tabashnik BE. 1992. Gene flow accelerates local adaptation among finite populations: simulating the evolution of insecticide resistance. J Econ Entomol. 85:611–620.

Calabria G, Maca J, Bächli G, Serra L, Pascual M. 2012. First records of the potential pest species Drosophila suzukii (Diptera: Drosophilidae) in Europe. J Appl Entomol. 136:139–147.

Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang G, Zalom FG. 2013. Genome of Drosophila suzukii, the spotted wing drosophila. G3 3:2257–2271.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Mol Ecol. 13:955–968.

Cook S, Gelman A, Rubin DB. 2006. Validation of software for Bayesian models using posterior quantiles. J Comput Graph Stat. 15:675–692.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24:2713–2719.

Cornuet JM, Ravigne V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11:401.

Cornuet JM, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin JM, Estoup A. 2014. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics 30:1187–1189.

Cristescu ME. 2015. Genetic reconstructions of invasion history. Mol Ecol. 24:2212–2225.

Depra M, Poppe JL, Schmitz HJ, De Toni DC, Valente VLS. 2014. The first records of the invasive pest Drosophila suzukii in the South American continent. J Pest Sci. 87:379–383.

Dlugosch KM, Parker IM. 2008. Founding events in species invasions: genetic variation, adaptive evolution, and the role of multiple introductions. Mol Ecol. 17:431–449.

Dlugosch KM, Anderson SR, Braasch J, Alice Cang F, Gillette H. 2015. The devil is in the details: genetic variation in introduced populations and its contributions to invasion. Mol Ecol. 24:2095–2111.

Edmonds CA, Lillie AS, Cavalli-Sforza LL. 2004. Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA. 101:975–979.

Ellstrand NC, Schierenbeck KA. 2000. Hybridization as a stimulus for the evolution of invasiveness in plants? Proc Natl Acad Sci USA. 97:7043–7050.

Estoup A, Guillemaud T. 2010. Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol. 19:4113–4130.

Estoup A, Lombaert E, Marin JM, Guillemaud T, Pudlo P, Robert CP, Cornuet JM. 2012. Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics. Mol Ecol Res. 12(5):846–855.

Estoup A, Ravigne V, Hufbauer RA, Vitalis R, Gautier M, Facon B. 2016. Is there a genetic paradox of biological invasion? Annu Rev Ecol Evol Syst. 47:51–72.


Facon B, Genton B, Shykoff J, Jarne P, Estoup A, David P. 2006. A general eco-evolutionary framework for understanding bioinvasions. Trends Ecol Evol. 21:130–135.

Facon B, Pointier JP, Jarne P, Sarda V, David P. 2008. High genetic variance in life-history strategies within invasive populations by way of multiple introductions. Curr Biol. 18:363–367.

Fraimout A, Loiseau A, Price DK, Xuereb A, Martin JF, Vitalis R, Fellous S, Debat V, Estoup A. 2015. New set of microsatellite markers for the spotted-wing Drosophila suzukii (Diptera: Drosophilidae): a promising molecular tool for inferring the invasion history of this major insect pest. Eur J Entomol. 112(4):855.

Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201:1555–1579.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis. London: Chapman & Hall/CRC.

Gilchrist GW, Huey RB, Serra L. 2001. Rapid evolution of wing size clines in Drosophila subobscura. Genetica 112:273–286.

Gilchrist GW, Huey RB, Balanya J, Pascual M, Serra L, Noor M. 2004. A time series of evolution in action: a latitudinal cline in wing size in South American Drosophila subobscura. Evolution 58:768–780.

Groen SC, Whiteman NK. 2016. Using Drosophila to study the evolution of herbivory and diet specialization. Curr Opin Insect Sci. 14:66–72.

Hauser M. 2011. A historic account of the invasion of Drosophila suzukii (Matsumura) (Diptera: Drosophilidae) in the continental United States, with remarks on their identification. Pest Manag Sci. 67:1352–1357.

Hufbauer RA, Facon B, Ravigne V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A. 2012. Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl. 5:89–101.

Kaneshiro KY. 1983. Drosophila (Sophophora) suzukii (Matsumura). Proc Hawaiian Entomol Soc. 24:179.

Keller SR, Taylor DR. 2010. Genomic admixture increases fitness during a biological invasion: admixture increases fitness during invasion. J Evol Biol. 23:1720–1731.

Kolbe JJ, Glor RE, Schettino LR, Lara AC, Larson A, Losos JB. 2004. Genetic variation increases during biological invasion by a Cuban lizard. Nature 431:177–181.

Lavergne S, Molofsky J. 2007. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci USA. 104:3883–3888.

Lee C. 2002. Evolutionary genetics of invasive species. Trends Ecol Evol. 17:386–391.

Lee JC, Bruck DJ, Curry H, Edwards D, Haviland DR, Van Steenwyk RA, Yorgey BM. 2011. The susceptibility of small fruits and cherries to the spotted-wing drosophila, Drosophila suzukii. Pest Manag Sci. 67:1358–1367.

Lenormand T. 2002. Gene flow and the limits of natural selection. Trends Ecol Evol. 17:183–189.

Lin QC, Zhai YF, Zhang AS, Men XY, Zhang XY, Zalom FG, Zhou CG, Yu Y. 2014. Comparative developmental times and laboratory life tables for Drosophila suzukii and Drosophila melanogaster (Diptera: Drosophilidae). Fla Entomol. 97:1434–1442.

Lombaert E, Guillemaud T, Lundgren J, Koch R, Facon B, Grez A, Loomans A, Malausa T, Nedved O, Rhule E, et al. 2014. Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol. 23:5979–5997.

Matsumura S. 1931. 6000 illustrated insects of Japan-empire (in Japanese). Toko shoin, Tokyo. 1497 pp.

Meng XL. 1994. Posterior predictive p-values. Ann Stat. 22:1142–1160.

Nei M, Maruyama T, Chakraborty R. 1975. The bottleneck effect and genetic variability in populations. Evolution 29:1–10.

Ometto L, Cestaro A, Ramasamy S, Grassi A, Revadi S, Siozios S, Moretto M, Fontana P, Varotto C, Pisani D, et al. 2013. Linking genomics and ecology to investigate the complex evolution of an invasive drosophila pest. Genome Biol Evol. 5:745–757.

Pascual M, Chapuis MP, Mestres F, Balanya J, Huey RB, Gilchrist GW, Serra L, Estoup A. 2007. Introduction history of Drosophila subobscura in the New World: a microsatellite based survey using ABC methods. Mol Ecol. 16:3069–3083.

Peischl S, Excoffier L. 2015. Expansion load: recessive mutations and the role of standing genetic variation. Mol Ecol. 24:2084–2094.

Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. 2016. Reliable ABC model choice via random forests. Bioinformatics 32:859–866.

R Development Core Team. 2008. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org

Robert CP, Cornuet JM, Marin JM, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA. 108:15112–15117.

Rius M, Darling JA. 2014. How important is intraspecific genetic admixture to the success of colonising populations? Trends Ecol Evol. 29:233–242.

Sakai AK, Allendorf FW, Holt JS, Lodge DM, Molofsky J, With KA, Baughman S, Cabin RJ, Cohen JE, Ellstrand NC, McCauley DE. 2001. The population biology of invasive species. Annu Rev Ecol Syst. 305–332.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. In: Mutation and Evolution. Springer. pp. 359–367.

Simberloff D. 2013. Biological invasions: much progress plus several controversies. Contrib Sci. 9:7–16.

Turgeon J, Tayeh A, Facon B, Lombaert E, De Clercq P, Berkvens N, Lundgren JG, Estoup A. 2011. Experimental evidence for the phenotypic impact of admixture between wild and biocontrol Asian ladybird (Harmonia axyridis) involved in the European invasion. J Evol Biol. 24:1044–1052.

Uller T, Leimu R. 2011. Founder events predict changes in genetic diversity during human-mediated range expansions. Glob Change Biol. 17:3478–3485.

Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Thery S, Froment A, Le Bomin S, Gessain A, Hombert JM, et al. 2009. Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol. 19:312–318.

Wray G. 2013. Genomics and the evolution of phenotypic traits. Annu Rev Ecol Evol Syst. 44:51–72.

Page 4: Deciphering the Routes of invasion of Drosophila suzukii

range was first recorded in the Hawaiian archipelago in the early 1980s (Kaneshiro 1983), but no further spread was observed until the late 2000s, when D. suzukii was almost synchronously recorded in the southwest of the USA and southern Europe (Hauser 2011; Calabria et al. 2012). By 2015, the species had spread throughout most of the North American and European continents and reached the southern hemisphere in Brazil (Depra et al. 2014). Drosophila suzukii represents a particularly appealing biological model for gaining further insights into the evolutionary factors associated with invasion success. It is closely related to the model insect Drosophila melanogaster, and its short generation time and viability in laboratory conditions facilitate experimental approaches to study evolutionary events.

ABC-based analyses of DNA sequence data obtained for six X-linked gene fragments suggest that North American and European populations represent separate invasion events (Adrion et al. 2014). The sources of these invasions and potential admixture among different regions remain unclear. The wide native and introduced distribution of D. suzukii implies a potentially large number of possible sources and hence a large number of possible introduction scenarios. By first grouping sample sites into genetic clusters, the number of populations treated independently can be reduced somewhat (Lombaert et al. 2014), but not fully. Additionally, the temporal proximity of invasions in the US and Europe makes it difficult to favor, a priori, certain introduction scenarios over others. Together, these features of D. suzukii's invasion mean that a large number of scenarios will need to be compared, dramatically increasing computational requirements for traditional ABC methods and hence making ABC-RF methods particularly appealing.

New Approaches

The formal description of a finite set of possible introduction scenarios as models lays the foundation for ABC analyses and is outlined in Estoup and Guillemaud (2010). Choosing among the formulated models is the central statistical problem to be overcome when reconstructing routes of invasion from molecular data. Both theoretical arguments and simulation experiments indicate that approximate posterior probabilities estimated from ABC analyses for the modeled introduction scenarios can be inaccurate, even though the models being compared can still be ranked appropriately using numerical approximation (Robert et al. 2011). To overcome this problem, Pudlo et al. (2016) developed a novel approach based on a machine learning tool named “random forests” (RF; Breiman 2001), which selects among the complex introduction models covered by ABC algorithms. This approach enables efficient discrimination among models and estimation of the posterior probability of the best model, while being computationally less intensive.

We invite readers to consult Pudlo et al. (2016) for detailed statistical descriptions and testing of the ABC random forest (ABC-RF) method. Briefly, random forest (RF) is an algorithm that learns from a database how to predict a variable, called the output, from a possibly large set of covariates. In our context, the database is the reference table, which includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics (i.e., the covariates). RF aggregates the predictions of a collection of classification or regression trees (depending on whether the output is categorical, here the scenario identity, or quantitative, here the posterior probability of the best scenario). Each tree is built using the information provided by a bootstrap sample of the database and manages to capture one part of the dependency between the output and the covariates. Taken separately, these trees are poor predictors of the output; an ensemble learning technique such as RF aggregates their predictions to raise predictive performance to a high level of accuracy in favorable contexts (Breiman 2001). RF is currently considered one of the major state-of-the-art algorithms for classification or regression.
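The ingredients just described (a simulated reference table, bootstrap samples, weak trees, and aggregation by vote) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the two scenarios, the three Gaussian "summary statistics" (two informative, one pure noise), and the use of single-split trees (decision stumps) in place of full classification trees. It is a drastically simplified stand-in for a random forest, not the ABC-RF implementation of Pudlo et al. (2016).

```python
import random
from collections import Counter


def simulate_reference_table(n_per_scenario, rng):
    """Toy reference table: rows of (scenario index, summary statistics)."""
    table = []
    for scenario, shift in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_scenario):
            # Two informative statistics and one pure-noise statistic (hypothetical).
            stats = (rng.gauss(shift, 1.0), rng.gauss(-shift, 1.0), rng.gauss(0.0, 1.0))
            table.append((scenario, stats))
    return table


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(sample, feature, rng, n_candidates=15):
    """Fit a depth-1 tree: a single threshold on a single statistic."""
    values = [stats[feature] for _, stats in sample]
    fallback = majority([label for label, _ in sample])
    best = (len(sample) + 1, min(values), fallback, fallback)
    for threshold in rng.sample(values, min(n_candidates, len(values))):
        left = [label for label, stats in sample if stats[feature] <= threshold]
        right = [label for label, stats in sample if stats[feature] > threshold]
        if not left or not right:
            continue
        left_label, right_label = majority(left), majority(right)
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def fit_forest(table, n_trees, rng):
    """Each tree sees a bootstrap sample and one randomly chosen statistic."""
    n_features = len(table[0][1])
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(table) for _ in table]  # sampling with replacement
        forest.append(fit_stump(bootstrap, rng.randrange(n_features), rng))
    return forest


def predict(forest, stats):
    """Aggregate the trees by majority vote; also return the vote counts."""
    votes = Counter(
        left if stats[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0], votes


rng = random.Random(0)
forest = fit_forest(simulate_reference_table(150, rng), n_trees=60, rng=rng)
test_rows = simulate_reference_table(150, rng)
accuracy = sum(predict(forest, stats)[0] == label for label, stats in test_rows) / len(test_rows)
print(f"held-out accuracy of the toy forest: {accuracy:.2f}")
```

Real random forests grow much deeper trees and consider several candidate covariates at each split, so this sketch only conveys the bootstrap-and-vote structure.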

In ABC random forest (ABC-RF), Pudlo et al. (2016) make two important advances regarding the use of summary statistics and the identification of the most probable model. For the summary statistics, given a pool of different metrics available, ABC-RF extracts the maximum of information from the entire set of proposed statistics. This avoids the arbitrary choice of a subset of statistics, which is often applied in ABC analyses, and also avoids what in the statistical sciences is called “the curse of dimensionality” [see Blum et al. (2013) for a comparative review of dimension reduction methods in ABC]. With respect to identifying the most probable model, ABC-RF uses a classification vote system rather than the posterior probabilities traditionally used in ABC analysis. The first outcome of an ABC-RF analysis applied to a given target dataset is hence a classification vote for each competing model, which represents the number of times a model is selected in a forest of n classification trees. The model with the highest number of classification votes corresponds to the model best suited to the target dataset among the set of competing models. As a byproduct, this step also provides a measure of the classification error called the prior error rate. The prior error rate is calculated as the probability of choosing a wrong model when drawing the model index and parameter values from the priors. The second outcome of ABC-RF is an estimation of the posterior probability of the best model that has been selected, obtained using a secondary random forest that regresses the model selection error of the first-step random forest over the available summary statistics.
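As a minimal sketch of the vote system and of the prior error rate described above, the snippet below tallies per-tree predictions and computes the fraction of misclassified pseudo-observed datasets. The scenario labels ("S1" to "S3"), the vote counts, and the small list of true and predicted models are all hypothetical values chosen for illustration; the second-step regression forest that yields the posterior probability is not sketched here.

```python
from collections import Counter


def classification_votes(tree_predictions):
    """Number of trees that selected each competing model."""
    return Counter(tree_predictions)


def best_model(votes):
    """Model with the highest number of classification votes."""
    return max(votes, key=votes.get)


def prior_error_rate(true_models, predicted_models):
    """Fraction of datasets simulated from the priors that are assigned to a wrong model."""
    wrong = sum(t != p for t, p in zip(true_models, predicted_models))
    return wrong / len(true_models)


# Hypothetical forest of n = 500 trees applied to one target dataset.
tree_predictions = ["S1"] * 310 + ["S2"] * 150 + ["S3"] * 40
votes = classification_votes(tree_predictions)
selected = best_model(votes)

# Hypothetical pseudo-observed datasets: true model index versus model chosen by the forest.
true_models = ["S1", "S2", "S3", "S1", "S2", "S3", "S1", "S2", "S3", "S1"]
predicted = ["S1", "S2", "S1", "S1", "S3", "S3", "S1", "S2", "S3", "S2"]
error = prior_error_rate(true_models, predicted)

print(selected, dict(votes), error)
```

Note that the vote fractions (here 310/500 for S1) are not posterior probabilities, which is why the method adds a second regression step.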

Pudlo et al. (2016) show that, as compared to previous ABC methods, ABC-RF offers at least four advantages: (i) it significantly reduces the model classification error as measured by the prior error rate; (ii) it is robust to the number and choice of summary statistics, as RF can handle many superfluous and/or strongly correlated statistics with no impact on the performance of the method [see Blum et al. (2013) for alternative methods of dimension reduction]; (iii) the computing effort is considerably reduced, as RF requires a much smaller reference table compared with alternative methods (i.e., a few thousand simulated datasets versus hundreds of thousands to millions of simulations per compared model); and (iv) it provides a more reliable estimation of the posterior probability of the selected model (i.e., the model that best fits the observed dataset).

To the best of our knowledge, the present study is the first to use the ABC-RF method for an ensemble of model choice analyses with different levels of complexity (i.e., various numbers of compared models, including various and sometimes large numbers of parameters, sampled populations, and hence summary statistics) on real multi-locus microsatellite datasets. Specifically, we analyze molecular data at 25 microsatellite loci from 23 D. suzukii sample sites located across most of its native and introduced range. We conducted our population genetics study in five steps. (1) We defined focal genetic groups and used this information to formalize 11 sets of competing introduction scenarios. We then chose among them in a sequential ABC-RF analysis. (2) We compared the performance of ABC-RF to a more standard ABC method for a subset of analyses [ABC-LDA; Estoup et al. (2012); see “Materials and Methods” section]. We chose ABC-LDA in our comparative study because it is considered to be one of the most efficient standard ABC methods to discriminate among models (Pudlo et al. 2016) and it is widely used in population genetics studies (Lombaert et al. 2014 and references therein). (3) We refined the inferred worldwide invasion scenario by assessing the most likely origins of the primary introductions from the native area. (4) We estimated the posterior distributions, under the final invasion scenario, of demographic parameters associated with bottleneck and genetic admixture events. (5) Finally, we performed model-posterior checking analyses to ensure that the final worldwide invasion scenario displayed a reasonable match to the observed dataset.

Results

Origins of Invasive Genetic Groups

Prior to conducting any type of ABC analysis, focal genetic groups must be defined by characterizing genetic diversity and structure within and between all genotyped sample sites using traditional statistics and clustering methods. This is described in detail in supplementary appendix S1, Supplementary Material online. We defined seven main genetic groups from our initial set of 23 sample sites (supplementary table S1, Supplementary Material online): Asia (sample sites CN-Lan, CN-Lia, CN-Nin, CN-Shi, JP-Tok, and JP-Sap), Hawaii (sample site US-Haw), western US (sample sites US-Wat, US-Sok, and US-SD), eastern US (sample sites US-Col, US-NC, US-Wis, and US-Gen), Europe (sample sites GE-Dos, FR-Par, FR-Bor, FR-Mon, SW-Del, SP-Bar, and IT-Tre), Brazil (BR-PA), and La Reunion (FR-Reu).

This genetic grouping, along with historical and geographical information (supplementary table S1, Supplementary Material online), allowed us to formalize 11 nested sets of competing invasion scenarios that we analyzed sequentially using ABC model choice methodologies. The date that D. suzukii was first recorded in a location was used to help formulate the scenarios. For example, D. suzukii was first recorded in the continental US from California (US-Wat) in 2008 and was not recorded in the central state of Colorado (US-Col) until 2012; US-Wat could therefore serve as a source for US-Col, but not vice versa. We present the main analyses in table 1. A more detailed description of each competing scenario considered for each analysis is provided in supplementary table S2, Supplementary Material online. The 11 sequential ABC analyses permit step-by-step reconstruction of the introduction history of D. suzukii worldwide. Results for each analysis, based on prior set 1 (which used bounded uniform distributions of parameters; see “Materials and Methods” section) and a single set of sample sites representative of the genetic groups defined above, are given in table 2. For all 11 analyses, similar ABC-RF results were obtained using prior set 2 (which used more peaked distributions; supplementary table S3, Supplementary Material online) and considering various sets of representative sample sites (supplementary table S4, Supplementary Material online).
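The way first-record dates constrain which populations can act as sources can be sketched as a simple filter. The years for US-Wat (2008), US-Col (2012), and Hawaii (1980) come from the text; the function name, the dictionary layout, and the choice of a strict "earlier than" comparison are assumptions made for illustration, and the authors' actual scenario formulation is given in their supplementary table S2.

```python
# First-record years quoted in the text; native-range (Asian) sites are
# treated as always eligible and are therefore not listed here.
first_record = {"US-Haw": 1980, "US-Wat": 2008, "US-Col": 2012}


def candidate_sources(target, first_record):
    """Invasive sites recorded strictly before the target site (assumed rule)."""
    target_year = first_record[target]
    return sorted(
        site
        for site, year in first_record.items()
        if site != target and year < target_year
    )


print(candidate_sources("US-Col", first_record))  # US-Haw and US-Wat qualify
print(candidate_sources("US-Wat", first_record))  # US-Col is excluded
```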

By choosing among different competing scenarios, each analysis identifies the most probable source(s) for a given introduced population. The results come in the form of differing levels of support for the competing introduction models. It is thus worth stressing here that the identification of the most probable source population X (from among a finite set of possible source populations) does not necessarily mean that the introduced population Y originated from exactly location X, but rather that the most probable origin of the founders of the invasive population sampled at site Y is a population genetically similar to the source population sampled at site X. However, for the sake of concision, we will hereafter use the simplified terminology that the most probable origin of Y is X.

After the early invasion of Hawaii (1980), the first recorded introduction was to the genetically heterogeneous western US, and thus that is where our main analyses start (analyses 1a–1d, table 2). We considered the three western US sample sites in separate analyses, and we found in each case that the best scenario included genetic admixture between Asia and Hawaii as sources (table 2, analyses 1a–1c; mean posterior probability of the best scenario, P = 0.999). We subsequently tested whether this common admixture pattern resulted from a single or multiple introduction events from Asia and/or Hawaii, including the possibility of local colonization events among western US sites (analysis 1d). The best invasion scenario for the western US included an initial admixture event with Asia and Hawaii as probable sources in northern California (site US-Wat; admixture event A1 in fig. 1), followed by (i) a local colonization southward into San Diego (US-SD) and northward into Oregon state (US-Sok), and (ii) a secondary introduction event from Hawaii to Oregon (US-Sok) (A2 event in fig. 1; P = 0.690). This result is supported by various genetic clustering results (fig. 1; supplementary figs. S1 and S2, Supplementary Material online), as well as raw F-statistics values, which indicate a high level of differentiation between US-Sok and other continental US sample sites and a substantially lower FST between US-Sok and Hawaii (supplementary table S5, Supplementary Material online). Once the invasion history was resolved for the western US group, we inferred the most probable source(s) of the invasive eastern US and European genetic groups using two independent analyses (analyses 2a and 2b; table 1 and supplementary table S2, Supplementary Material online). We found evidence for a single origin for the eastern US group, corresponding to an intra-continental spread from the San Diego area (P = 0.779, table 2).

The most probable origin for Europe was an independent introduction from the Asian native range (P = 0.716, table 2). The European and North American invasions occurred nearly simultaneously, and thus we also explored gene flow between the two regions. Europe was never selected as a source for the eastern US (analysis 3a, P = 0.744), no matter which sets of sample sites were considered. In contrast, we found some evidence of asymmetrical gene flow from the eastern US into at least some European locations. When considering the northern European sample site from Germany (GE-Dos) as a target, we found that the best scenario included an admixed origin with Asia and the eastern US (supplementary table S4, Supplementary Material online, analysis 3b; P = 0.488 and P = 0.501 for prior sets 1 and 2, respectively). In contrast, the best scenario did not include such an admixture event when considering the southern European sample site from Italy (IT-Tre) as a target (table 2, analysis 3b, P = 0.510). We therefore replicated analysis 3b on all possible combinations of European and eastern US sample sites to assess the robustness of this result and evaluate the possibility of a geographic pattern for admixture events. For each of the European sample sites, figure 2 summarizes which replicate analyses selected a scenario including an admixture between eastern US and Asia or a scenario of a single introduction from Asia (see supplementary fig. S3, Supplementary Material online, for results using an alternative representative sample site for the native range). Results tended to indicate an uneven spatial distribution of the presence of genes of eastern US origin in Europe, following roughly a north-south gradient. More specifically, eastern US alleles were present in northern Europe and absent, or at least not detected by our model choice method, in southern Europe. We then further tested whether the observed admixture in the north corresponded to the mixing of eastern US genes with southern European genes originating from the initial introduction event from Asia, versus the mixing of eastern US genes with Asian genes introduced through a separate secondary introduction event from Asia in northern Europe (analysis 4). The best scenario was clearly that of an admixture between eastern US genes and southern European genes, which themselves had a probable origin as part of the initial introduction event from Asia (P = 0.998; A3 event in fig. 1).
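The replication of analysis 3b over "all possible combinations of European and eastern US sample sites" can be enumerated as below. The site codes are those listed for the two genetic groups earlier in this section; pairing exactly one European target with one eastern US representative per replicate is an assumption about how the combinations were formed, and no analysis outcome is simulated here.

```python
from collections import defaultdict
from itertools import product

# Sample sites of the two genetic groups, as listed in the text.
european_sites = ["GE-Dos", "FR-Par", "FR-Bor", "FR-Mon", "SW-Del", "SP-Bar", "IT-Tre"]
eastern_us_sites = ["US-Col", "US-NC", "US-Wis", "US-Gen"]

# One replicate of analysis 3b per (European target, eastern US source) pair.
replicates = list(product(european_sites, eastern_us_sites))

# Group replicates by European target, the unit summarized per site in figure 2.
by_target = defaultdict(list)
for target, source in replicates:
    by_target[target].append(source)

print(len(replicates), "replicate analyses,", len(by_target["GE-Dos"]), "per European site")
```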

Finally, we found that the sampled population from Brazil originated from North America, with genetic admixture between D. suzukii individuals from the southwest and eastern regions of the US (analysis 5a, P = 0.631; A4 event in fig. 1). Regarding the most recent invasive population from La Reunion, we found that this island population originated from Europe, with admixture between individuals from northern and southern Europe (analysis 5b, P = 0.500; A5 event in fig. 1).

Comparison of Model Choice Analyses Using ABC-RF versus ABC-LDA

For all 11 analyses, ABC-LDA performed well when using large reference tables (i.e., 500,000 simulations per scenario; table 2 and supplementary table S3, Supplementary Material online). Indeed, the prior error rates for ABC-LDA with large reference tables were smaller than the prior error rates for ABC-RF with

Table 1. Formulation of the Model Choice Analyses that were Carried Out Successively to Reconstruct Invasion Routes of D. suzukii Using ABC.

| Model Choice Analysis | Number of Compared Scenarios | Tackled Question | Potential source genetic group | Focal populations |
|---|---|---|---|---|
| 1a | 3 | What are the origins of western US populations? | Asia, Hawaii | US-Wat |
| 1b | 3 | What are the origins of western US populations? | Asia, Hawaii | US-Sok |
| 1c | 3 | What are the origins of western US populations? | Asia, Hawaii | US-SD |
| 1d | 7 | What are the relations among western US populations and their extra-continental sources? | Asia, Hawaii, US-Wat, US-Sok, US-SD | US-Wat + US-Sok + US-SD |
| 2a | 6 | What are the origins of eastern US populations? | Asia, Hawaii, western US | eastern US |
| 2b | 6 | What are the origins of European populations? | Asia, Hawaii, western US | Europe |
| 3a | 10 | Is there asymmetrical gene flow from Europe to eastern US? | Asia, Hawaii, western US, Europe | eastern US |
| 3b | 10 | Is there asymmetrical gene flow from eastern US to Europe? | Asia, Hawaii, western US, eastern US | Europe |
| 4 | 2 | Does the admixture with eastern US genes in northern Europe result from a secondary Asian introduction? | Asia, Hawaii, western US, eastern US, southern Europe | northern Europe |
| 5a | 21 | What are the origins of the Brazilian population? | Asia, Hawaii, western US, eastern US, southern Europe, northern Europe | Brazil |
| 5b | 21 | What are the origins of La Reunion population? | Asia, Hawaii, western US, eastern US, southern Europe, northern Europe | La Reunion |

NOTE: Each numbered analysis is a comparison of a certain number of scenarios by ABC model choice. We summarize each analysis by stating the question that it addressed. A detailed verbal description of each compared scenario is given in supplementary table S2, Supplementary Material online. The 11 analyses are nested in the sense that each subsequent analysis uses the result obtained from the previous one. For example, Analysis 2a, “What are the origins of eastern US populations?”, capitalizes on the history inferred for western US populations from analysis 1d. “Potential source genetic group” indicates all the potential source populations considered in the analyses for which one wants to identify the origin of the focal population (i.e., the target).
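The scenario counts in table 1 grow quickly with the number of potential sources. As a hedged observation (not a statement from the text), the counts for analyses 1a–1c, 2a–2b, 3a–3b, and 5a–5b match what one would get by allowing one scenario per single source plus one scenario per pair of sources admixing; analyses 1d and 4 address different questions and do not follow this pattern. The helper name below is hypothetical, and the actual scenario definitions are those of supplementary table S2.

```python
from math import comb


def n_scenarios(n_sources):
    """Single-source scenarios plus pairwise-admixture scenarios (assumed rule)."""
    return n_sources + comb(n_sources, 2)


# Number of potential source groups per analysis, read from table 1.
sources_per_analysis = {"1a": 2, "2a": 3, "3a": 4, "5a": 6}
for analysis, k in sources_per_analysis.items():
    print(analysis, k, "sources ->", n_scenarios(k), "scenarios")
```

Under this rule the count grows quadratically with the number of sources, which illustrates why grouping sample sites into a few genetic groups keeps the model choice problem tractable.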


Table 2. Results of Model Choice Analyses Using ABC-RF and ABC-LDA.

| Analysis | Total Number of Scenarios | Number of SumStats | Prior error rate: ABC-RF (SD) | Prior error rate: ABC-LDA (large reference table) | Prior error rate: ABC-LDA (small reference table) | Posterior probability of the best model: ABC-RF (SD) | Posterior probability of the best model: ABC-LDA (large reference table) [CI] | Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model) |
|---|---|---|---|---|---|---|---|---|
| 1a | 3 | 39 | 0.100 (±0.002) | 0.075 | 0.085 | 1.000 (±0.001) | 1.000 [1.000, 1.000] | Asia + Hawaii |
| 1b | 3 | 39 | 0.094 (±0.001) | 0.070 | 0.091 | 0.989 (±0.005) | 0.999 [0.999, 0.999] | Asia + Hawaii |
| 1c | 3 | 39 | 0.096 (±0.001) | 0.079 | 0.087 | 0.998 (±0.002) | 0.999 [0.999, 0.999] | Asia + Hawaii |
| 1d | 7 | 130 | 0.327 (±0.001) | 0.274 | 0.368 | 0.690 (±0.018) | 0.741 [0.714, 0.767] | Asia + Hawaii |
| 2a | 6 | 130 | 0.232 (±0.001) | 0.203 | 0.241 | 0.779 (±0.019) | 0.788 [0.764, 0.813] | Western US |
| 2b | 6 | 130 | 0.232 (±0.001) | 0.179 | 0.393 | 0.716 (±0.013) | 0.604 [0.592, 0.616] | Asia |
| 3a | 10 | 204 | 0.328 (±0.001) | 0.265 | 0.364 | 0.744 (±0.020) | 0.844 [0.820, 0.868] | Western US |
| 3b | 10 | 204 | 0.408 (±0.001) | 0.358 | 0.423 | 0.510 (±0.014) | 0.409 [0.391, 0.426] | Asia |
| 4 | 2 | 301 | 0.121 (±0.001) | 0.111 | 0.212 | 1.000 (±0.009) | 0.999 [0.999, 0.999] | Southern Europe + eastern US |
| 5a | 21 | 424 | 0.294 (±0.001) | 0.230 | 0.423 | 0.631 (±0.034) | 0.812 [0.788, 0.837] | Western US + eastern US |
| 5b | 21 | 424 | 0.298 (±0.001) | 0.242 | 0.428 | 0.500 (±0.027) | 0.290 [0.256, 0.323] | Northern Europe + southern Europe |

NOTE: All model choice analyses were carried out using the prior set 1 (supplementary table S8, Supplementary Material online) and a single set of sample sites representative of the pre-defined genetic groups: sample site JP-Tok for the native Asian group, US-Haw for the Hawaiian group, US-Wat, US-Sok, and US-SD for the western US group, US-NC for the eastern US group, IT-Tre for the southern Europe group, GE-Dos for the northern Europe group, BR-PA for the Brazil group, and FR-Reu for the La Reunion group (fig. 1). Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). The total number of summary statistics (Number of SumStats) as well as the total number of compared scenarios (Total number of scenarios) are indicated for each analysis. Prior error rates and posterior probabilities of the best model chosen using ABC-RF were averaged over 10 replicate analyses. ABC-RF and ABC-LDA treatments yielded the same best model choice for all analyses, which is denoted in the column “Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model)”. Admixture between two source populations is represented by a “+” sign. ABC-LDA posterior probabilities of the best models were estimated using “large reference tables” with 500,000 simulated datasets per scenario. ABC-LDA prior error rates were computed using reference tables of two different sizes: “large reference table” (i.e., 500,000 simulated datasets per scenario) and “small reference table” (i.e., 10,000 simulated datasets per scenario, as for ABC-RF analyses). SD stands for standard deviation over 10 replicate analyses and CI for 95% confidence interval computed following Cornuet et al. (2008).


small reference tables (paired t-test: t10 = 2.7, P = 0.02). However, ABC-RF performed better than ABC-LDA when using small reference tables (only 10,000 simulations per scenario), having lower prior error rates (paired t-test: t10 = 11.6, P < 0.0001; table 2 and supplementary table S3, Supplementary Material online). The difference in prior error rate was particularly marked for the complex analyses, i.e., those in which many introduction scenarios were compared and many summary statistics were computed. For instance, for the analyses 5a and 5b (21 compared scenarios and 424 summary statistics computed), prior error rates were 0.217 and 0.221 for ABC-RF versus 0.417 and 0.425 for ABC-LDA.

Regarding computational effort, we found that, for a given observed dataset, an ABC-RF treatment to select the best scenario and compute its posterior probability required a computation time ca. 50 times shorter than an ABC-LDA treatment with a traditional reference table of large size (i.e., 500,000 simulated datasets per scenario). Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time for 10,000 pseudo-observed datasets, whereas with ABC-LDA it lasted several hours to several days (depending on the analysis processed) for 500 pseudo-observed datasets.
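To give a sense of scale, the simulation budgets implied by the two reference-table sizes can be worked out from the scenario counts in table 1. This is only arithmetic on numbers quoted in the text (10,000 versus 500,000 simulated datasets per scenario); the variable and function names are illustrative, and the sketch does not model the wall-clock timings reported above, which also depend on the downstream treatment.

```python
# Number of compared scenarios per analysis, from table 1.
scenarios = {
    "1a": 3, "1b": 3, "1c": 3, "1d": 7, "2a": 6, "2b": 6,
    "3a": 10, "3b": 10, "4": 2, "5a": 21, "5b": 21,
}
SMALL_PER_SCENARIO = 10_000    # reference table used for ABC-RF
LARGE_PER_SCENARIO = 500_000   # traditional reference table for ABC-LDA


def reference_table_size(n_scenarios, per_scenario):
    """Total number of simulated datasets in one reference table."""
    return n_scenarios * per_scenario


small_total = sum(reference_table_size(n, SMALL_PER_SCENARIO) for n in scenarios.values())
large_total = sum(reference_table_size(n, LARGE_PER_SCENARIO) for n in scenarios.values())
ratio = large_total / small_total

print(f"analysis 5a: {reference_table_size(21, SMALL_PER_SCENARIO):,} "
      f"vs {reference_table_size(21, LARGE_PER_SCENARIO):,} simulations")
print(f"all 11 analyses: {small_total:,} vs {large_total:,} (ratio {ratio:.0f})")
```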

The same invasion scenario had the highest probability forall 11 analyses carried out using ABC-RF and ABC-LDA withlarge number of simulated datasets (table 2 and supplementary table S3 Supplementary Material online) Posterior prob-abilities of the best scenario estimated using ABC-RF were notsystematically higher (or lower) than those using ABC-LDAwith large number of simulated datasets In particular there

FIG 1 Worldwide invasion scenario of D suzukii inferred from microsatellite data and date of first observation Map and schematic showingsample sites and the invasion routes taken by D suzukii as reconstructed by ABC-RF (Pudlo et al 2016) on a total of 685 individuals from 23geographic locations genotyped at 25 microsatellite loci (see results and methods for details) The native range is in dark grey and the invasiverange is in light-gray (cf delimitation from fig 1 in Asplen et al 2015) The year in which D suzukii was first observed at each sample site is indicatedin italics The 23 geographical locations that were sampled are represented by circles (native range) and squares diamonds and triangles(introduced range) Squares indicate populations that experienced weak bottlenecks (ie median value of bottleneck severity lt 012 seemain text and table 3) diamonds indicate moderate bottlenecks (012 lt bottleneck severity lt 022) and triangles indicate strong bottlenecks(ie bottleneck severitygt 03) The colors of the symbol for the sample sites and arrows between them correspond to the different genetic groupsobtained using the clustering method BAPS (supplementary appendix S1 Supplementary Material online) The arrows indicate the most probableinvasion pathways A1ndashA5 indicate five separate admixture events between different sources O1ndashO3 indicate the most probable sources withinthe native range for the primary introduction events A1frac14 Hawaiithorn southeast China A2frac14Watsonville (western US)thorn Hawaii A3frac14 southernEurope thorn eastern US A4 frac14 western US thorn eastern US A5 frac14 southern Europe thorn northern Europe O1 frac14 Japan O2 frac14 southeast China O3 frac14northeast China


was no clear trend of overestimation of posterior probability when using ABC-LDA with a large number of simulated datasets versus ABC-RF, a potential bias suggested by Pudlo et al. (2016). On the other hand, we found evidence of instability in the estimation of posterior probabilities, and hence in model selection, when using ABC-LDA with a small number of simulated datasets in the reference table (i.e., 10,000 simulations per scenario, as for ABC-RF). In this case, the scenario with the highest probability was different from that selected using either ABC-LDA with the large reference table or ABC-RF in four and two of the 11 analyses when using the prior sets 1 and 2, respectively (results not shown).

Refining the Origins of the Primary Introduction Events from the Native Area

Additional ABC-RF analyses were run to determine which geographical area in the native range was the most likely origin of each of the three primary introductions (i.e., in Hawaii, western US, and Europe). The most probable origin of the Hawaiian invasive population was Japan, with a mean posterior probability of 0.959 (origin O1 in fig. 1). For Watsonville (western US; sample site US-Wat), the most probable source of the primary introduction from Asia was southeast China (origin O2 in fig. 1; P = 0.860). Finally, for Europe, the most probable source of the primary introduction from Asia was northeast China (origin O3 in fig. 1; P = 0.855). Similar posterior probabilities were obtained using the prior set 1 (values presented above) and the prior set 2 (results not shown). For inferences regarding Europe, similar posterior probabilities were also obtained using various sets of sample sites from Europe and the eastern US (results not shown).

Estimation of Parameter Distributions for Admixture Rate and Bottleneck Severity

We used the most probable worldwide invasion scenario (fig. 1) to estimate the posterior distributions of parameters related to the bottleneck severity associated with the foundation of new populations and to the genetic admixture rates between differentiated sources (i.e., A1–A5 events in fig. 1).

FIG. 2. Admixed or non-admixed origin of D. suzukii in Europe inferred for each sample site. Potential source populations include (among others) the Asian native range (represented by the Japanese sample site JP-Tok) and the invasive genetic group from eastern US, represented by one of the four invasive sample sites collected in this area (fig. 1). Four replicate independent ABC-RF treatments corresponding to analysis 3b (table 2) were hence carried out for each targeted European sample site, each using one of the four eastern US sample sites. The treatments labeled 1, 2, 3, and 4 in the pies of the figure were carried out with the sample sites US-NC, US-Wis, US-Gen, and US-Col, respectively. A pie quarter in blue indicates that the best scenario corresponds to a single introduction event from Asia. A pie quarter in red indicates that the best scenario corresponds to an admixture event between Asia and eastern US. Dates in italics correspond to the dates of first record of the European sample sites.


The posterior distributions of parameters relating to bottlenecks and admixture differed substantially from the priors, indicating that the genetic data were informative for such parameters (tables 3 and 4; see supplementary tables S6 and S7, Supplementary Material online, for results using an alternative set of representative sample sites). Regarding bottleneck severity, we found that bottlenecks tended to be more severe for populations founded by individuals originating from a source located on a different continent (extra-continental origin) than for populations founded by individuals originating from a source located on the same continent (intra-continental origin; table 3 and fig. 1). The bottleneck severity for populations founded by individuals corresponding to a combination of the two types of sources (extra- and intra-continental origins) was either weak or moderate. The strongest bottleneck severity was found for the first invasive population in Hawaii, which showed the lowest level of genetic variation among all invasive populations. Bottleneck severity values were overall slightly stronger in European than in US populations, suggesting a smaller number of founding individuals and/or a slower demographic recovery in European populations.

Regarding admixture (table 4), we found that (i) for the first recorded invasive population in western US (i.e., Watsonville; sample site US-Wat), the genetic contribution from China was substantially larger than that from Hawaii (median value of 0.759); (ii) for the western US population US-Sok, the secondary genetic contribution from Hawaii was small compared to that from Watsonville (median value of 0.237); and (iii) the sample sites from northern Europe (e.g., GE-Dos in Germany) contained a rather low proportion of genes originating from the eastern US (median value of

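Table 3. Bottleneck Severity in Invasive Populations of D. suzukii.

| Introduction Type | Sample Site | Prior 1 Mean | Prior 1 Median | Prior 1 Mode | Prior 1 q5 | Prior 1 q95 | Prior 2 Mean | Prior 2 Median | Prior 2 Mode | Prior 2 q5 | Prior 2 q95 |
| Extra-continental | US-Haw | 0.517 | 0.500 | 0.495 | 0.326 | 0.755 | 0.490 | 0.484 | 0.477 | 0.382 | 0.615 |
| Extra-continental | US-Wat | 0.181 | 0.138 | 0.114 | 0.041 | 0.448 | 0.143 | 0.127 | 0.118 | 0.059 | 0.275 |
| Extra-continental | IT-Tre | 0.268 | 0.179 | 0.171 | 0.059 | 0.837 | 0.236 | 0.189 | 0.172 | 0.101 | 0.553 |
| Extra-continental | BR-PA | 0.158 | 0.126 | 0.104 | 0.020 | 0.407 | 0.208 | 0.193 | 0.172 | 0.067 | 0.399 |
| Extra-continental | FR-Reu | 0.259 | 0.215 | 0.186 | 0.051 | 0.626 | 0.245 | 0.214 | 0.190 | 0.069 | 0.534 |
| Intra-continental | US-SD | 0.135 | 0.117 | 0.106 | 0.039 | 0.283 | 0.117 | 0.104 | 0.091 | 0.053 | 0.224 |
| Intra-continental | US-NC | 0.089 | 0.067 | 0.061 | 0.015 | 0.230 | 0.111 | 0.098 | 0.090 | 0.046 | 0.218 |
| Extra + intra-continental | US-Sok | 0.116 | 0.099 | 0.088 | 0.027 | 0.250 | 0.119 | 0.106 | 0.099 | 0.052 | 0.229 |
| Extra + intra-continental | GE-Dos | 0.189 | 0.177 | 0.155 | 0.061 | 0.356 | 0.205 | 0.189 | 0.171 | 0.088 | 0.372 |
| Prior values | | 0.628 | 0.199 | NA | 0.021 | 1.904 | 0.517 | 0.500 | NA | 0.326 | 0.754 |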

NOTE: Extra-continental introduction corresponds to a long-distance introduction from a source located on a different continent from that of the focal population; intra-continental introduction corresponds to an introduction event from a source located on the same continent as the focal population; and extra + intra-continental introduction corresponds to a combination of the two types of sources. Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5 and q95), are indicated for each bottleneck severity parameter. We roughly classified the estimated bottleneck severity values into three classes (represented here by the three shades of gray): weak (i.e., median value of bottleneck severity < 0.12, in light gray), moderate (0.12 < bottleneck severity < 0.22, in gray), and strong (i.e., bottleneck severity > 0.3, in dark gray). The set of sample sites used for the ABC estimations presented here includes (i) for the native area, Japan (JP-Tok + JP-Sap), southeast China (CN-Nin), and northeast China (CN-Lan + CN-Lia); and (ii) for the invaded range, US-Wat, US-Sok, and US-SD for western US, US-NC for eastern US, IT-Tre for southern Europe, GE-Dos for northern Europe, BR-PA for South America (Brazil), and FR-Reu for La Reunion island. Code names of the sample sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online, in which bottleneck severity classes are also given for each sample site. See supplementary table S6, Supplementary Material online, for results on a different set of representative sample sites.
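The three-class rule described in the note can be sketched in a few lines of Python. This is only an illustration: the function name and the handling of values that fall on a threshold or in the gap between 0.22 and 0.3 (returned as unclassified) are assumptions, since the note does not say how such values were treated. The medians are those listed for prior set 1 in table 3.

```python
from typing import Optional

# Thresholds quoted in the note to table 3.
WEAK_UPPER = 0.12
MODERATE_UPPER = 0.22
STRONG_LOWER = 0.30


def classify_bottleneck(median_severity: float) -> Optional[str]:
    """Return 'weak', 'moderate', 'strong', or None if no class applies."""
    if median_severity < WEAK_UPPER:
        return "weak"
    if WEAK_UPPER < median_severity < MODERATE_UPPER:
        return "moderate"
    if median_severity > STRONG_LOWER:
        return "strong"
    # Assumption: values on a threshold, or between 0.22 and 0.30,
    # are left unclassified because the note does not cover them.
    return None


# Medians under prior set 1, as listed in table 3.
medians = {"US-Haw": 0.500, "US-Wat": 0.138, "US-NC": 0.067, "GE-Dos": 0.177}
classes = {site: classify_bottleneck(m) for site, m in medians.items()}
```

With these inputs, the Hawaiian site falls in the strong class and the intra-continental site US-NC in the weak class, in line with the pattern described in the main text.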

Table 4. Posterior Distributions of Admixture Rates for the Five Admixture Events Inferred for the Final Worldwide Invasion Scenario Described in figure 1.

| Prior Set | Admixture Event | Admixture Rate (gene fraction from pop x) | Mean | Median | Mode | q5 | q95 |
| Prior 1 | A1 (US-Wat = US-Haw + CN-Nin) | rUS-Wat (China – CN-Nin) | 0.759 | 0.761 | 0.774 | 0.659 | 0.854 |
| Prior 1 | A2 (US-Sok = US-Haw + US-Wat) | rUS-Sok (USA – US-Wat) | 0.237 | 0.230 | 0.227 | 0.092 | 0.400 |
| Prior 1 | A3 (DE-Dos = IT-Tre + US-NC) | rDE-Dos (USA – US-NC) | 0.286 | 0.278 | 0.266 | 0.112 | 0.481 |
| Prior 1 | A4 (BR-PA = US-NC + US-SD) | rBR-PA (USA – US-SD) | 0.467 | 0.461 | 0.442 | 0.187 | 0.754 |
| Prior 1 | A5 (FR-Reu = IT-Tre + GE-Dos) | rFR-Reu (Europe – GE-Dos) | 0.388 | 0.380 | 0.426 | 0.132 | 0.670 |
| Prior 2 | A1 (US-Wat = US-Haw + CN-Nin) | rUS-Wat (China – CN-Nin) | 0.761 | 0.762 | 0.766 | 0.684 | 0.833 |
| Prior 2 | A2 (US-Sok = US-Haw + US-Wat) | rUS-Sok (USA – US-Wat) | 0.224 | 0.219 | 0.199 | 0.114 | 0.350 |
| Prior 2 | A3 (DE-Dos = IT-Tre + US-NC) | rDE-Dos (USA – US-NC) | 0.312 | 0.306 | 0.284 | 0.152 | 0.487 |
| Prior 2 | A4 (BR-PA = US-NC + US-SD) | rBR-PA (USA – US-SD) | 0.422 | 0.418 | 0.429 | 0.247 | 0.613 |
| Prior 2 | A5 (FR-Reu = IT-Tre + GE-Dos) | rFR-Reu (Europe – GE-Dos) | 0.370 | 0.368 | 0.356 | 0.178 | 0.569 |

NOTE: Admixture events are denoted as in figure 1. Each admixture rate parameter r is labeled with the name of the admixed site and corresponds to the fraction of genes originating from the source site in parentheses (1 − r genes originate from the other source site). Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5 and q95), are indicated for each admixture parameter. Estimations assuming the prior set 1 and the prior set 2 (supplementary table S8, Supplementary Material online) are provided. The set of representative sample sites used for the ABC estimations presented here is the same as in table 3. Code names of the population sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online. See supplementary table S7, Supplementary Material online, for results on a different set of representative sample sites.
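As a small worked illustration of the r versus 1 − r convention stated in the note, the sketch below splits an admixture rate into the contributions of the two sources. The function name is hypothetical, and the input is the prior set 1 median for event A1 from table 4.

```python
from typing import Tuple


def admixture_contributions(r: float) -> Tuple[float, float]:
    """Return (fraction from the named source, fraction from the other source)."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("an admixture rate must lie between 0 and 1")
    return r, 1.0 - r


# Event A1 (US-Wat = US-Haw + CN-Nin), prior set 1 median from table 4.
from_china, from_hawaii = admixture_contributions(0.761)
```

Here the complement, roughly 0.239, is the share attributed to the other source of event A1 (US-Haw).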


0.286). We found more balanced admixture rates for the populations from Brazil (BR-PA) and La Reunion (FR-Reu), but information on this parameter was less accurate in these cases, as indicated by the large 90% credibility intervals.

Model-Posterior Checking

Like any model-based method, ABC inference does not reveal the "true" evolutionary history but allows choosing the best among a necessarily limited set of scenarios that have been compared, and estimating posterior distributions of parameters under this scenario. How well the inferred scenario-posterior combination matches the observed dataset remains to be evaluated using an ABC model-posterior checking analysis. When applying such an analysis to the final invasion scenario detailed in figure 1, we found that, when considering the prior set 1, only 54 of the 1,141 summary statistics used as test quantities had low posterior predictive P-values (i.e., 0.002 < ppp-values < 5%). Moreover, none of those ppp-values remained significant when correcting for multiple comparisons (Benjamini and Hochberg 1995). Similar results were obtained when assuming the prior set 2 and when considering different sets of representative sample sites in our analyses (results not shown). These findings show that the final worldwide invasion scenario (fig. 1, with associated parameter posterior distributions) matches the observed dataset well. In agreement with this, the projections of the simulated datasets on the principal component axes from the final model-posterior combination were well grouped and centered on the target point corresponding to the observed dataset (supplementary fig. S4, Supplementary Material online). Due to the modest size of the dataset (i.e., 25 microsatellite loci), however, only situations of major inadequacy of the model-posterior combination to the observed dataset are likely to be identified.
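The multiple-comparison correction cited above (Benjamini and Hochberg 1995) can be sketched as follows. This is a generic implementation of the step-up procedure, not the code used in the study; the function name, the 0.05 level, and the toy P-values are illustrative assumptions.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each P-value, whether it stays significant at FDR level alpha."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_rank = rank
    # Every P-value ranked at or below that k is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            significant[idx] = True
    return significant


# Toy P-values (illustrative only, not taken from the study).
toy = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(toy)
```

In this toy case only the two smallest values survive the correction, although five of the ten are below 0.05 when taken individually. The same logic explains how test quantities with individually low ppp-values can all lose significance once the large number of comparisons is accounted for.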

Discussion

In the present paper, we decipher the routes taken by D. suzukii in its worldwide invasion, evaluate evidence of bottlenecks and genetic admixture associated with different introductions, and quantify the efficiency of ABC-RF relative to the more standard ABC-LDA method for choosing among introduction scenarios.

A Complex Worldwide Invasion History

Prior to this study and that of Adrion et al. (2014), our understanding of the worldwide introduction pathways of D. suzukii was based on historical and observational data, which were incomplete and potentially misleading. Our results indicate three distinct introductions from the native range (to Hawaii, the western side of North America, and western Europe), which accords well with the findings of Adrion et al. (2014) from a different set of markers (X-linked sequence data). Both the western North American introductions and at least some European populations show signs of admixture. Hawaii and China both appear to have contributed to the western North American introductions, which were in turn the most probable source for the introduction into eastern North America. Similarly, China and, to a lesser extent, eastern North America both appear to have contributed to the European introduction, although the eastern North American contribution mostly concerns northern European areas. The almost simultaneous invasion of North America and Europe from Asia could be due to increased trade between these areas, facilitating transport of this species between regions. Alternatively, or in addition, adaptation to human-altered habitats (in this case, changes in agricultural practices) within the native range of the species could have promoted invasion of similarly altered habitats worldwide (i.e., anthropogenically induced adaptation to invade; Hufbauer et al. 2012). However, the two invaded continents were most likely colonized by flies from distinct Chinese geographic areas (fig. 1), which would require that adaptation to agriculture occurred concomitantly at several locations within the native range. This is possible but not evolutionarily parsimonious, and further data would be required to test this explanation. Subsequent introductions from the three primary founder locations to other locations result in a complex history of invasion. Both primary and secondary introductions show signs of demographic bottlenecks, and several invasive populations have multiple sources and thus experienced genetic admixture between differentiated native or invasive populations.

The genetic relationships between worldwide D. suzukii populations define the genetic variation they harbor and may continue to shape further spread of alleles among populations. Gene flow among populations, through continuous dispersal or more discrete admixture events, permits the dissemination of allelic variants and new mutations, with dramatic consequences for evolutionary trajectories (Lenormand 2002). We found evidence for asymmetric genetic admixture from North American toward (northern) European populations. If this signal of admixture reflects recurrent and ongoing dispersal events rather than a single past secondary introduction event, then variants that arise in North America would be more likely to spread to Europe than the reverse. Discriminating between ongoing and past dispersal events is tricky, however, especially for recent invasions (Benazzo et al. 2015). If the occurrence of recurrent gene flow is confirmed, then D. suzukii populations from Europe may not lag behind American ones in evolutionary terms. From an applied perspective, if resistance to control practices emerges in North America, special efforts to stop the importation of D. suzukii to Europe would help prevent the evolution of resistance in Europe and thus reduce damage to crops (Caprio and Tabashnik 1992).

Advances in Methods for Discriminating among Complex Models

To the best of our knowledge, the present study is the first to use the recently developed ABC-RF method (Pudlo et al. 2016) to test competing models characterized by different levels of complexity. To compare ABC-RF to a more standard ABC method, we carried out a subset of model choice analyses using both ABC-RF and ABC-LDA (Estoup et al. 2012). We ran ABC-LDA analyses in two ways: with standard-sized large reference tables (500,000 simulated datasets per scenario) and with small reference tables (10,000 simulated datasets per scenario, the same size as used for ABC-RF). The performance of ABC-LDA (in terms of prior error rate) in choosing among introduction scenarios was slightly better than that of ABC-RF when using large reference tables. Furthermore, for ABC-LDA with large reference tables, confidence in model choice (in terms of posterior probability) was relatively comparable to that of ABC-RF with small reference tables. Thus, when analyses are relatively simple or computational resources are not limiting, ABC-LDA can provide inferences of invasion scenarios as robust as those of ABC-RF.

In contrast, we found that ABC-RF outperformed ABC-LDA when using similarly small reference tables for both types of analyses (10,000 simulated datasets per scenario). ABC-LDA was also unstable, choosing different scenarios in different replicate analyses when using small reference tables. These empirical findings correspond well with those of Pudlo et al. (2016), who used simple models for which the true posterior probabilities could be calculated. Pudlo et al. (2016) demonstrated that ABC-RF provided a more reliable posterior probability of the best (true) scenario than standard ABC methods when using the same (small) number of simulated datasets in the reference table (see supplementary fig. S3 in Pudlo et al. 2016). In our work, the difference in performance between the two methods was most pronounced for complex analyses (e.g., when comparing more than 10 scenarios using more than 100 summary statistics). Thus, when analyses are not simple or computational resources are finite, ABC-RF provides more robust inferences of invasion scenarios than ABC-LDA.

The lower computation cost of ABC-RF was also evident. For a given observed dataset, selecting the best-supported scenario and computing its posterior probability was ca. 50 times faster than when processing an ABC-LDA treatment with a traditional reference table of large size. Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time, whereas it took several hours to several days with ABC-LDA. The benefit of low computational costs is not simply a saving in computer time. Rather, the efficiency of ABC-RF analyses allowed us to run replicate analyses on various sample sets, even for the most complex D. suzukii invasion scenarios. This replication is critical in evaluating the robustness of our statistical inferences.

We found that, at least for some ABC-RF analyses, the posterior probability of the best scenario and the global statistical power to choose among alternative invasion scenarios (as measured by the prior error rate) were relatively low. For instance, in three of the 11 analyses (analyses 3b, 5a, and 5b), the best scenarios had relatively low probabilities (i.e., ranging from 0.500 to 0.630) and relatively high prior error rates (i.e., ranging from 0.300 to 0.400). Although replicate analyses carried out on different sample sets and prior sets pointed to the same best scenario for these three analyses, such low probability values and high prior error rates suggest only a moderate level of confidence in the choice of model in these cases. Similarly, we found that the scenario choice leading to the conclusion of the presence versus absence of admixed genes from the eastern US in various European sample sites relied on probability values that were sometimes as low as 0.500. Moreover, in a minority of cases, the best scenario differed depending on the sample site considered as representative of the eastern US or the native area. The observed pattern of an uneven spatial distribution of genes of eastern US origin in Europe, roughly following a north-to-south gradient, should hence be interpreted cautiously. Analyses of additional European sample sites are needed to clarify this issue. The above inferential uncertainties most likely also stem from the relatively small number of markers genotyped relative to the number, complexity, and similarity of some of the competing models. Such results indicate that there is room for improving the robustness of our inferences, at least for a subset of our analyses.

Bottlenecks and Genetic Admixture

We found strong evidence that bottlenecks and admixture both occurred during the course of the invasion of D. suzukii. A first sign of bottlenecks is a reduction in neutral genetic diversity. We found that all invasive D. suzukii populations were characterized by significantly lower genetic variation than native ones, with a loss of 6.4% (US-Col) to 23.3% (US-Haw) of heterozygosity and of 27.3% (US-Wat) to 54.2% (US-Haw) of allelic diversity (supplementary fig. S5, Supplementary Material online). Adrion et al. (2014) found a similar significant loss of diversity in their dataset focused on gene sequences. Dlugosch and Parker (2008) and Uller and Leimu (2011) reviewed studies of neutral genetic diversity in a large number of species of animals, plants, and fungi, and compared nuclear molecular diversity within introduced and source populations. Overall, they found that a loss of variation was the most frequent feature in invasive populations. However, reductions in genetic variation were on average modest (e.g., average loss of 18.7% and 15.5% of heterozygosity and allelic diversity, respectively; Dlugosch and Parker 2008). The reductions observed in our study in invasive D. suzukii populations were thus in the same range for heterozygosity but higher for allelic diversity (cf. average loss of 11.1% and 34.5% of heterozygosity and allelic diversity, respectively). A larger loss of allelic diversity than of heterozygosity after a bottleneck is nevertheless expected from theory (Nei et al. 1975).

The type of introduction pathway explains, at least partly, the severity of the reduction in genetic diversity. Bottlenecks tended to be less severe for populations founded by individuals from the same continent than for populations founded by individuals originating from a different continent. A simple explanation for this pattern is that, owing to greater proximity, populations originating from the same continent are likely to be initially founded by a larger number of individuals and also to experience recurrent gene flow. Populations founded by individuals from both the same and different continents experienced weak or moderate reductions in diversity, suggesting that multiple introduction pathways restore diversity.

In agreement with previous studies of invasions on similarly large geographical scales (e.g., Lombaert et al. 2014 and references therein), we found that admixture events were frequent in the worldwide invasion history of D. suzukii. We identified at least five admixture events for which sources were native and/or invasive populations (i.e., A1–A5 events in fig. 1). The genetic contributions of the sources were unbalanced, except for the populations from Brazil and La Reunion, but genetic information regarding admixture was considerably less accurate in the latter cases, which involved weakly differentiated source populations.

We have provided here the information necessary to evaluate whether bottlenecks and admixture have influenced outcomes in this invasion. Quantitative genetics studies in the lab focusing on the key source, bottlenecked, and admixed populations could now examine the fitness and adaptive potential of those groups (Lavergne and Molofsky 2007; Facon et al. 2008; Turgeon et al. 2011).

Conclusions and Perspectives

Understanding invasion pathways provides key information for understanding the role of evolutionary processes in biological invasions. For example, our research reveals the sources of invasive populations, enabling relevant comparisons using quantitative genetics studies in the lab. Additionally, we found that the invasion of Europe by D. suzukii was distinct from that of North America, with limited and asymmetrical gene flow between these two main invaded areas. This situation provides the opportunity to evaluate evolutionary trajectories in replicate in two separate temperate climate regions, similarly to research by Gilchrist et al. (2001, 2004) on the pace of clinal evolution in the invasive fruit fly Drosophila subobscura. There are also distinct invasion pathways for areas with warmer climates (i.e., Hawaii, La Reunion, and Brazil) that will make interesting comparisons.

Retracing invasion routes and making inferences about demographic processes is only possible if there is adequate polymorphism within populations and significant genetic differentiation among them. As is evident here, microsatellite data can provide enough variation to reveal important pathways. We concur with Adrion et al. (2014), however, that genome-wide data (i.e., next-generation sequencing approaches) will be tremendously powerful in further discriminating among complex invasion scenarios (see also Cristescu 2015; Estoup et al. 2016). Because of the reduced computational resources demanded by ABC-RF, this method will be particularly useful for the analysis of massive single nucleotide polymorphism datasets (Pudlo et al. 2016). In addition to the selectively neutral demographic inference leading to the reconstruction of routes of invasion, population (and quantitative) next-generation sequencing approaches are quite promising for studying the evolution of phenotypic traits in natural populations (Wray 2013; Gautier 2015). Specifically, they can be used to better understand the genetic architecture of traits underlying invasion success (Bock et al. 2015). Drosophila suzukii is a good species for such research given (i) its short generation time and viability in laboratory conditions, which facilitate experimental approaches to study quantitative traits of interest (Asplen et al. 2015); and (ii) the availability of annotated genome assemblies for this species (Chiu et al. 2013; Ometto et al. 2013), along with the huge amount of genomic resources available for its close relative D. melanogaster (Groen and Whiteman 2016).

Materials and Methods

Sampling and Genotyping

Adult D. suzukii were sampled from the field at a total of 23 localities (hereafter termed sample sites) distributed throughout most of the native and invasive range of the species (supplementary table S1, Supplementary Material online, and fig. 1). Samples were collected between 2013 and 2015 using baited traps and sweep nets and were stored in ethanol. Native Asian samples consisted of a total of six sample sites, including four Chinese and two Japanese localities. Samples from the invasive range were collected in Hawaii (1 sample site), continental US (7 sites), Europe (7 sites), Brazil (1 site), and the French island of La Reunion (1 site). For each sample site, 15–44 adult flies were genotyped at 25 of the 28 microsatellite loci described in Fraimout et al. (2015), resulting in a total of 685 genotyped individuals. Three microsatellite loci from Fraimout et al. (2015) (i.e., loci DS31, DS42, and DS45) were not included due to the presence of null alleles at frequencies > 10% in some sample sites (results not shown). DNA extraction and PCR amplification, as well as allele scoring, were performed following the methodology detailed in Fraimout et al. (2015).

General Framework for Defining Focal Population Groups and Sets of Scenarios to Be Compared Using ABC

One of the many challenges of ABC analyses is to optimally define sets of sample sites to be processed chronologically as potential sources and targets in a way that does not hamper the computational effort. As stated in Estoup and Guillemaud (2010) and Lombaert et al. (2014), the number and complexity of competing scenarios is proportional to the number of genetic groups to be accounted for in ABC analyses. Following Lombaert et al. (2014), we used the results obtained from three genetic clustering methods, along with historical and geographical information, to define genetic groups that could include several sample sites (supplementary appendix S1, Supplementary Material online). We defined seven main genetic groups from our set of 23 sample sites: Asia, Hawaii, western US, eastern US, Europe, Brazil, and La Reunion. To make the computations less intensive with respect to computer resources, ABC treatments were carried out using one representative sample site for each genetic group per replicate analysis. In other words, when a genetic group included more than two sample sites, we considered the two most differentiated sample sites (i.e., with the highest FST values in supplementary table S3, Supplementary Material online) as representative of the group: CN-Shi and JP-Tok for the group Asia, all three sample sites for the genetically heterogeneous western US group, US-NC and US-Wis for the group eastern US, and GE-Dos and IT-Tre for the group Europe. Each of the sample sites retained (i.e., the representative sample sites) was included alternatively in the sample set considered for ABC treatments. This makes it possible to replicate analyses of scenario choice on different sample sets representing most of the genetic variation within and between genetic groups. All ABC analyses described in table 1 were thus replicated with all possible combinations of representative sample sites available for each genetic group. This allowed us to test the robustness of scenario choices with respect to the sample sets considered as representative of genetic groups.
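The replication design described above, with one representative sample site per genetic group and all combinations explored, can be sketched with a Cartesian product. This is only an illustration: the dictionary layout and function name are assumptions, and the single sites listed for Hawaii, Brazil, and La Reunion are taken from the site codes used in table 3. An actual analysis involves only the groups relevant to it, so the number of replicates per analysis will generally be smaller than the total shown here.

```python
from itertools import product
from typing import Dict, List

# Representative sample sites per genetic group, as named in the text.
representatives: Dict[str, List[str]] = {
    "Asia": ["CN-Shi", "JP-Tok"],
    "Hawaii": ["US-Haw"],
    "western US": ["US-Wat", "US-Sok", "US-SD"],
    "eastern US": ["US-NC", "US-Wis"],
    "Europe": ["GE-Dos", "IT-Tre"],
    "Brazil": ["BR-PA"],
    "La Reunion": ["FR-Reu"],
}


def sample_sets(groups: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """List every sample set that picks exactly one site per genetic group."""
    names = list(groups)
    return [dict(zip(names, combo)) for combo in product(*(groups[n] for n in names))]


all_sets = sample_sets(representatives)
```

Across all seven groups this yields 2 x 1 x 3 x 2 x 2 x 1 x 1 = 24 distinct sample sets.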

Nested ABC Analyses Processed Sequentially

Following Lombaert et al. (2014), we used historical information (i.e., dates of first observation of each invasive population; supplementary table S1, Supplementary Material online) to define 11 sets of competing introduction scenarios that were analyzed sequentially. The first set of compared scenarios considers the second-oldest invasive population as target (the oldest one necessarily originating from the native range) and determines its introduction history. Step by step, subsequent analyses use the results obtained from the previous analyses until the most recent invasive populations are considered. Each of the 11 analyses included a set of compared introduction scenarios in which the target invasive population (i.e., the focal population) can either derive directly by a single split (i.e., introduction event) from one of the historically compatible source populations or be the result of an admixture event between any pair of possible sources. For instance, in a model with one target and two possible sources, three scenarios could be formalized, with the target population being derived from one source population, from the other possible source, or from an admixture between the two sources (see supplementary fig. S6 for an illustration and supplementary table S2, Supplementary Material online, for details).
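The way a scenario set is built from a list of candidate sources can be sketched as follows: one scenario per single source, plus one per unordered pair of sources for admixture. The function name and the tuple representation are illustrative assumptions, not the format used by the ABC software.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def enumerate_scenarios(sources: Sequence[str]) -> List[Tuple[str, ...]]:
    """One scenario per single source, plus one per unordered pair of sources."""
    single_origin = [(s,) for s in sources]
    admixed_origin = list(combinations(sources, 2))
    return single_origin + admixed_origin


# Two candidate sources give the three scenarios of the example in the text.
two_sources = enumerate_scenarios(["source A", "source B"])

# Three candidate sources (as in analysis 2a) give six scenarios.
three_sources = enumerate_scenarios(["Asia", "Hawaii", "western US"])
```

For n candidate sources this gives n + n(n − 1)/2 scenarios, which shows how quickly the scenario set grows and why the analyses were nested and processed sequentially.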

The first set of scenario choice analyses (1a–1d) aimed at making inferences about the introduction pathways for the three sample sites in western US (table 1). We first evaluated the Asian, Hawaiian, or admixed Asian + Hawaiian origin of each of the three western US sample sites. We then refined our modeling by formalizing seven competing scenarios describing the possible relationships among the three western US sample sites and their extra-continental sources (i.e., Asia and Hawaii; analysis 1d). Once the invasion history was resolved for the western US group, we sought to identify the population source(s) of the eastern US and European genetic groups independently. Dates of first records in these two groups widely overlap, and the possibility that the western US was a common source for both areas could not be ruled out. We therefore compared, in analysis 2a, a set of six scenarios to assess whether the eastern US group derived directly from Asia, Hawaii, or western US, or whether it resulted from an admixture event between one pair of such sources. A similar set of six scenarios was used to assess the most likely origin of European populations independently of the eastern US group (analysis 2b).

Given the results of analyses 2a and 2b, and the overlap in first record dates for the eastern US and Europe, we then performed two analyses to investigate the possibility of gene flow (i.e., admixture) between the eastern US and European groups, considering the eastern US as the target population with Europe as a potential source, and vice versa (analyses 3a and 3b, respectively). Analysis 4 aimed at clarifying the admixture in northern European sample sites identified in analysis 3b. We tested whether this admixture occurred between the eastern US and a secondary introduction from Asia, or between individuals from the eastern US and southern Europe. Finally, we investigated the origins of the most recent invasive populations, in Brazil and La Réunion, separately, with all older groups and pairwise admixtures as potential sources (analyses 5a and 5b). Note that, in agreement with the genetic clustering results (fig. 1; supplementary appendix S1 and figs. S1 and S2, Supplementary Material online), we did not test for the possibility of Brazil being the source of La Réunion, or vice versa.

Historical, Demographic, and Mutational Parameters

For all scenarios formalized for ABC analyses, we modeled an introduction event as a divergence, without subsequent gene flow, from the source population (or two source populations in case of admixture) at a time corresponding to the date of first record in the invaded area, translated into a number of generations (assuming 12 generations per year; Lin et al. 2014). The divergence event was immediately followed by a bottleneck period characterized by a lower effective population size, and then by a direct return to a stable (large) effective population size. We modeled our uncertainty regarding the identity of the source populations in the slightly structured native Asian area by including native "ghost populations" (i.e., unsampled populations) in our scenarios. Such unsampled native populations were modeled as a single split from an ancestral Asian population (without any change in effective population size) at a time defined by a loose flat prior that includes zero at the lower bound (supplementary table S8, Supplementary Material online). See Lombaert et al. (2014) and references therein for justifications of using unsampled populations when modeling invasion scenarios. A detailed illustration of the above modeling design is provided in supplementary fig. S6, Supplementary Material online.
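The translation of a first-record date into a divergence time in generations can be illustrated as follows. This is a minimal sketch, not the authors' procedure: the 12 generations per year come from the text, whereas the sampling year used in the example and the function name `years_to_generations` are hypothetical.

```python
GENERATIONS_PER_YEAR = 12  # value assumed in the text (Lin et al. 2014)

def years_to_generations(first_record_year, sampling_year,
                         generations_per_year=GENERATIONS_PER_YEAR):
    """Convert the time since first record into a number of generations.

    Hypothetical helper: the divergence time is counted backward from the
    sampling year, which is an assumption made for this illustration.
    """
    if sampling_year < first_record_year:
        raise ValueError("sampling year precedes the year of first record")
    return (sampling_year - first_record_year) * generations_per_year

# First record in California in 2008 (from the text); 2013 is a made-up sampling year.
print(years_to_generations(2008, 2013))
```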

Prior distributions for historical, demographic, and mutational parameters were defined taking into account the historical and demographic parameter values available from empirical studies on D. suzukii (Hauser 2011; Lin et al. 2014; Asplen et al. 2015) and microsatellite mutational data on D. melanogaster (Schug et al. 1998). We considered a first set of prior distributions (hereafter termed prior set 1), which was composed of rectangular (i.e., bounded uniform) distributions (see supplementary table S8, Supplementary Material online). To evaluate the robustness of our ABC inferences to prior choice, we also considered a second set of more peaked prior distributions (hereafter termed prior set 2; supplementary table S8, Supplementary Material online).

Model Choice Using ABC-RF and ABC-LDA

For all ABC-RF and ABC-LDA analyses, we used the software DIYABC v.2.1.0 (Cornuet et al. 2014) to simulate datasets constituting the reference tables. A reference table includes a given number of datasets that have been simulated for

Fraimout et al doi101093molbevmsx050 MBE

992Downloaded from httpsacademicoupcommbearticle-abstract3449802953204by Universite De La Reunion useron 03 July 2018

different scenarios, using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics.

Following Pudlo et al. (2016), ABC-RF treatments were processed on reference tables including 10,000 simulated datasets per scenario. Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014) for microsatellite markers, describing genetic variation per population (e.g., number of alleles), per pair (e.g., genetic distance), or per triplet (e.g., admixture rate) of populations, averaged over the 25 loci (see supplementary table S9, Supplementary Material online, for details about such statistics), plus the linear discriminant analysis (LDA) axes as additional summary statistics. The total number of summary statistics ranged from 39 to 424 depending on the analysis (table 2). When the number of scenarios in an analysis exceeded ten, we processed ABC-RF treatments on reference tables including 100,000 simulated datasets in total, to avoid computer memory issues associated with the sub-bootstrapping procedure processed for reference tables with more than 100,000 simulated datasets (see the section "Practical recommendations regarding the implementation of the algorithms" in Pudlo et al. 2016). We checked that this number was sufficient by evaluating the stability of prior error rates (i.e., the probability of choosing a wrong model when drawing the model index and parameter values from the priors) and of posterior probability estimates on 80,000, 90,000, and 100,000 simulated datasets. The number of trees in the constructed random forests was fixed to n = 500; see Pudlo et al. (2016) and Breiman (2001) for justifications of considering a forest of 500 trees, and supplementary fig. S7, Supplementary Material online, for an illustration. For each ABC-RF analysis, we predicted the best scenario and estimated its posterior probability and the prior error rate over 10 replicate runs on the same reference table. We used the abcrf R package (v.1.1.0; Pudlo et al. 2016) to perform all ABC-RF analyses. In supplementary appendix S2, Supplementary Material online, we provide several scripts written in the R programming language (R Development Core Team 2008) for computing random forest analyses in R with the abcrf package when starting from simulated datasets generated with DIYABC v.2.1.0.

For the sake of methodological comparison, we also duplicated a subset of analyses using a more standard ABC method (Beaumont et al. 2002; Cornuet et al. 2008), improved using the regression algorithms from Estoup et al. (2012), which are based on linear discriminant analysis (LDA), to avoid the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step. This method is called ABC-LDA. Following previous analyses of this type (Lombaert et al. 2014), ABC-LDA treatments were processed on reference tables including 500,000 simulated datasets per scenario. As for ABC-RF, datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). Posterior probabilities associated with each scenario were estimated by polychotomous logistic regression (Cornuet et al. 2008), modified following Estoup et al. (2012), on the 1% of the simulated datasets closest to the observed dataset. We also estimated, for each analysis, a prior error rate over 500 simulated datasets drawn from the priors, using reference tables including 500,000 simulated datasets per scenario. To provide a fair comparison of classification error with respect to computational effort, prior error rates were also estimated using reference tables including the same number of simulated datasets per scenario as for ABC-RF treatments (10,000 per scenario).
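The notion of a prior error rate (the frequency of wrong model choices when the model index and parameter values are drawn from the priors) can be illustrated with a toy example. The sketch below is not ABC-RF or ABC-LDA: the simulator, the two models, and the nearest-neighbour vote used as a classifier are all hypothetical stand-ins chosen so that the example runs with the Python standard library only.

```python
import random

def simulate(model, rng):
    """Toy simulator: draw a parameter from a uniform prior, return two summary statistics."""
    theta = rng.uniform(0.0, 1.0)       # prior draw for a hypothetical parameter
    shift = 0.0 if model == 0 else 2.0  # the two toy models differ by a shift
    return [rng.gauss(shift + theta, 1.0), rng.gauss(shift - theta, 1.0)]

def build_reference_table(n_per_model, rng):
    """Reference table: (model index, summary statistics) for each simulated dataset."""
    return [(m, simulate(m, rng)) for m in (0, 1) for _ in range(n_per_model)]

def classify(stats, table, k=15):
    """Majority vote among the k closest simulated datasets (stand-in for a random forest)."""
    nearest = sorted(
        table,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[1], stats)),
    )[:k]
    votes_for_model_1 = sum(m for m, _ in nearest)
    return 1 if 2 * votes_for_model_1 > k else 0

def prior_error_rate(table, n_test, rng):
    """Fraction of pseudo-observed datasets, drawn from the priors, that are misclassified."""
    errors = 0
    for _ in range(n_test):
        true_model = rng.randrange(2)   # model index drawn from a uniform prior
        if classify(simulate(true_model, rng), table) != true_model:
            errors += 1
    return errors / n_test

rng = random.Random(1)
table = build_reference_table(500, rng)
rate = prior_error_rate(table, 200, rng)
print(f"estimated prior error rate: {rate:.3f}")
```

Repeating the estimate with reference tables of different sizes gives a rough view of how stable the error rate is, which is the kind of check described above for 80,000, 90,000, and 100,000 simulated datasets.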

Refining the Origins of the Primary Introduction Events from the Native Area

We refined our worldwide invasion scenario by assessing the most probable origin of the primary introductions from the Asian native range. To this aim, we carried out additional ABC-RF treatments to choose which geographical area in the native range was the most likely origin of each of the three primary introductions from Asia identified in the final invasion scenario, i.e., in Hawaii, the western US (sample site US-Wat), and Europe (fig. 1). Capitalizing on the observed pattern of genetic differentiation among all Asian sample sites, we considered four possible geographical origins: O1 = Japan (represented by the sample sites JP-Sap + JP-Tok, which were pooled as they did not show any significant differentiation), O2 = South-East China (represented by the sample site CN-Nin), O3 = North-West China (represented by the sample sites CN-Lan + CN-Lia, which were pooled as they did not show any significant differentiation), and O4 = South-West China (represented by the sample site CN-Shi). In each of these three model choice analyses (i.e., one analysis per primary introduction), we considered the previously inferred introduction histories and compared four scenarios corresponding to variants of those histories that differed only in the geographical origin of the primary Asian introduction (models O1, O2, O3, and O4). We used the ABC-RF method with 10,000 datasets per model, simulated using prior set 1 or 2, and with three replications of the RF process. For inferences regarding Europe, we carried out ABC-RF treatments on various sets of sample sites from Europe and the eastern US to further evaluate the robustness of our results.

Parameter Estimation

For ABC parameter estimation, we considered a set of 12 key sample sites representative of the inferred final invasion history (fig. 1). This set of sample sites included the three native origins of primary introductions from the native range (O1, O2, and O3), the Hawaiian sample (US-Haw), the three western US samples (US-Wat, US-Sok, and US-SD), the eastern US sample US-NC, two European samples (IT-Tre and GE-Dos), the Brazilian sample (BR-PA), and the sample from La Réunion (FR-Reu). To evaluate the robustness of our parameter estimations with respect to sample sites, we carried out a second analysis replacing the sample sites US-NC, IT-Tre, and GE-Dos with US-Wis, SP-Bar, and FR-Par, respectively. We also replicated parameter estimations using prior set 1 and prior set 2.

ABC-RF was developed to tackle the inferential issue of model choice. So far, no RF-based solution exists to estimate parameter distributions under a given model (Pudlo et al. 2016). Standard ABC algorithms for parameter estimation may suffer from the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step, and hence yield poor results when the number of statistics is too large (reviewed in Blum et al. 2013). To avoid such potential problems, as well as for the sake of simplicity and computational efficiency (statistical techniques for choosing summary statistics have not been implemented in DIYABC; Blum et al. 2013), we used a standard ABC methodological framework (Beaumont et al. 2002; Cornuet et al. 2008) applied to a subset of "expert-chosen" statistics proposed in DIYABC (Brouat et al. 2014; Lombaert et al. 2014) to estimate posterior parameter distributions under the final invasion scenario. We hence considered 95 summary statistics in total: the mean number of alleles and the mean genetic diversity (per locus and population sample), all pairwise FST values, and some crude estimates of admixture rates based on the five population triplets for which admixture events have been inferred (i.e., A1–A5 events described in fig. 1, corresponding to five AML statistics; Choisy et al. 2004; see the DIYABC 2.1.0 user manual, p. 16, for details about such statistics). Using parameter values drawn from prior set 1 or 2, we produced a reference table containing 10 million simulated datasets. Following Beaumont et al. (2002), we then used a local linear regression to estimate the parameter posterior distributions. We took the 10,000 (0.1%) simulated datasets closest to our observed dataset for the regression, after applying a logit transformation to parameter values. We found that considering instead the 1% closest simulated datasets and/or a larger set of summary statistics had only weak effects on the estimated parameter distributions, at least for the subset of parameters we were interested in (i.e., admixture rates and bottleneck severity; results not shown).
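The total of 95 summary statistics follows directly from the 12 sample sites and the five admixture events mentioned above. The short check below reproduces that arithmetic, together with the fraction of the reference table retained for the regression; the variable names are ours.

```python
from math import comb

n_sites = 12            # key sample sites used for parameter estimation
n_admixture_events = 5  # A1-A5, one AML statistic each

per_population = 2 * n_sites     # mean number of alleles + mean genetic diversity
pairwise_fst = comb(n_sites, 2)  # one FST value per pair of sample sites
total_statistics = per_population + pairwise_fst + n_admixture_events

# Fraction of the 10 million simulated datasets kept for the local linear regression.
retained_fraction = 10_000 / 10_000_000

print(total_statistics)
print(retained_fraction)
```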

We focused our investigations on the parameters related to two types of demographic events that are considered to be potentially important drivers of invasion success: the severity of the bottleneck associated with the foundation of new populations, and genetic admixture (Estoup et al. 2016). Bottleneck severity was defined as the ratio between the duration of the bottleneck and the effective number of individuals during this period (Pascual et al. 2007). Following the final invasion scenario detailed in figure 1, we defined three categories of introduction situations: bottleneck severity for populations founded only by individuals originating from a different continent (e.g., sample sites US-Haw, US-Wat, IT-Tre, BR-PA, and FR-Reu), bottleneck severity for populations founded only by individuals originating from the same continent (e.g., sample sites US-SD and US-NC), and bottleneck severity for populations founded by a combination of the two types of sources (e.g., sample sites US-Sok and GE-Dos). Genetic admixture is evident when a population is inferred to have more than one distinct source (A1–A5 events in fig. 1).
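As a small illustration of the definition above, bottleneck severity can be computed as a simple ratio. The function name and the numerical values are hypothetical and serve only to show that a longer bottleneck or a smaller effective size both increase severity.

```python
def bottleneck_severity(duration_generations, effective_size):
    """Ratio of bottleneck duration (in generations) to effective size during the bottleneck."""
    if effective_size <= 0:
        raise ValueError("effective size must be positive")
    return duration_generations / effective_size

# Made-up values: the same duration with a smaller effective size is a more severe bottleneck.
mild = bottleneck_severity(10, 500)
severe = bottleneck_severity(10, 50)
print(mild, severe)
```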

Model-Posterior Checking

We used the ABC model-posterior checking method implemented in DIYABC (Cornuet et al. 2010) to evaluate how well the final worldwide invasion scenario, with its associated parameter posterior distributions, matches the observed dataset. This method was largely inspired by the statistical frameworks detailed in Gelman et al. (2003) and Cook et al. (2006). The principle is as follows: if a scenario-posterior combination fits the observed data correctly, then data simulated under this scenario with parameters drawn from the associated posterior distributions should be close to the observed data. The lack of fit of the model to the data can be measured by determining the frequency at which test quantities measured on the observed dataset are extreme with respect to the distributions of the same test quantities computed from the simulated datasets (i.e., the posterior predictive distributions). In practice, the test quantities are chosen among the large set of ABC summary statistics proposed in the program DIYABC. For each test quantity ($q$), a lack of fit of the observed data with respect to the posterior predictive distribution can be measured by the cumulative distribution function value, defined as $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$. The tail-area probability can be easily computed for each test quantity as $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$ and $1.0 - \mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$ for $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}}) \le 0.5$ and $> 0.5$, respectively. Such tail-area probabilities, also named posterior predictive P-values (i.e., ppp-values; Meng 1994), represent the probabilities that the replicated data (simulated ABC summary statistics) could be more extreme than the observed data (observed ABC summary statistics). In practice, ppp-values can be interpreted as a guideline for tracking model-posterior misfit: a few ppp-values around 0.05 are not necessarily a problem, whereas the presence of ppp-values around 0.001 is rather suspect. Hence, too many observed summary statistics falling in the tails of the distributions, especially if some of the ppp-values are small (<0.001), cast serious doubts on the adequacy of the model-posterior combination to the observed dataset. Finally, because ppp-values are computed for a number of (often non-independent) test statistics, a method such as that of Benjamini and Hochberg (1995) can be used to control the false discovery rate (Cornuet et al. 2010).
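The tail-area probability and the false discovery rate control described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the DIYABC implementation: the function names are hypothetical, the simulated values are made up, and the Benjamini-Hochberg step-up procedure is applied directly to the ppp-values.

```python
def tail_area_probability(simulated, observed):
    """ppp-value: Prob(q_sim < q_obs) if that is <= 0.5, otherwise 1 - Prob(q_sim < q_obs)."""
    cdf = sum(1 for q in simulated if q < observed) / len(simulated)
    return cdf if cdf <= 0.5 else 1.0 - cdf

def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of the tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff = rank  # largest rank whose p-value passes its threshold
    return sorted(order[:cutoff])

# Made-up posterior predictive sample for one test quantity.
simulated = list(range(100))
low = tail_area_probability(simulated, 2.5)    # observed value in the lower tail
high = tail_area_probability(simulated, 98.5)  # observed value in the upper tail
print(low, high)

# Made-up ppp-values for four test quantities.
flagged = benjamini_hochberg([0.001, 0.04, 0.03, 0.2])
print(flagged)
```

Test quantities flagged by the procedure are those for which the observed value remains extreme after accounting for the number of statistics examined.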

We carried out the ABC model-posterior checking analysis on our observed microsatellite dataset as follows. From $10^7$ datasets simulated under the final invasion scenario detailed in figure 1, and using the same subset of 95 statistics previously retained for parameter estimation, we obtained a posterior sample of $10^4$ values from the posterior distributions of parameters, as described in the previous section "Parameter Estimation". We then simulated $10^4$ datasets with parameter values drawn with replacement from this posterior sample. As underlined in many statistics textbooks (Gelman et al. 2003), it is advised against performing model checking using information that has already been used for training (i.e., model fitting; see also Cornuet et al. 2010 for illustrations on simulated datasets). Optimally, model-posterior checking should be based on test quantities that do not correspond to the summary statistics that were used to obtain the parameter posterior distributions. This is naturally possible with DIYABC, as the package proposes a large choice of summary statistics. Our set of test statistics therefore included the 1,141 remaining single, pairwise, and three-population summary statistics available in DIYABC that were not used for previous ABC parameter estimation.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

A.F. was supported by state funding from the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program "Investissements d'avenir" (ANR-11-IDEX-0004-02). A.E. and R.A.H. were supported by the Languedoc-Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE2016), and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP "International Mobility" (CfP2015-02). G.L. acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds project 2012-13-195. X.C., M.K., I.M., and J.Z. were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). M.D. acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). M.P. was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.

References

Adrion JR, Kousathanas A, Pascual M, Burrack HJ, Haddad NM, Bergland AO, Machado H, Sackton TB, Schlenke TA, Watada M, et al. 2014. Drosophila suzukii: the genetic footprint of a recent, worldwide invasion. Mol Biol Evol 31:3148–3163.

Asplen MK, Anfora G, Biondi A, Choi DS, Chu D, Daane KM, Gibert P, Gutierrez AP, Hoelmer KA, Hutchison WD, et al. 2015. Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci 88:469–494.

Atallah J, Teixeira L, Salazar R, Zaragoza G, Kopp A. 2014. The making of a pest: the evolution of a fruit-penetrating ovipositor in Drosophila suzukii and related species. Proc R Soc B 281:20132840–20132840.

Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025–2035.

Beaumont MA. 2010. Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst 41:379–406.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B 57:289–300.

Bertorelle G, Benazzo A, Mona S. 2010. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol Ecol 19:2609–2625.

Blum M, Nunes M, Prangle D, Sisson S. 2013. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci 28:189–208.

Bock DG, Caseys C, Cousens RD, Hahn MA, Heredia SM, Hubner S, Turner KG, Whitney KD, Rieseberg LH. 2015. What we still don't know about invasion genetics. Mol Ecol 24:2277–2297.

Boissin E, Hurley B, Wingfield MJ, Vasaitis R, Stenlid J, Davis C, de Groot P, Ahumada R, Carnegie A, Goldarazena A, et al. 2012. Retracing the routes of introduction of invasive species: the case of the Sirex noctilio woodwasp. Mol Ecol 21:5728–5744.

Breiman L. 2001. Random forests. Mach Learn 45:5–32.

Benazzo A, Ghirotto S, Torres Vilaca ST, Hoban S. 2015. Using ABC and microsatellite data to detect multiple introductions of invasive species from a single source. Heredity 115:262–272.

Brouat C, Tollenaere C, Estoup A, Loiseau A, Sommer S, Soanandrasana R, Rahalison L, Rajerison M, Piry S, Goodman SM, Duplantier JM. 2014. Invasion genetics of a human commensal rodent: the black rat Rattus rattus in Madagascar. Mol Ecol 23(16):4153–4167.

Caprio M, Tabashnik BE. 1992. Gene flow accelerates local adaptation among finite populations: simulating the evolution of insecticide resistance. J Econ Entomol 85:611–620.

Calabria G, Maca J, Bächli G, Serra L, Pascual M. 2012. First records of the potential pest species Drosophila suzukii (Diptera: Drosophilidae) in Europe. J Appl Entomol 136:139–147.

Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang G, Zalom FG. 2013. Genome of Drosophila suzukii, the spotted wing drosophila. G3 3:2257–2271.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Mol Ecol 13:955–968.

Cook S, Gelman A, Rubin DB. 2006. Validation of software for Bayesian models using posterior quantiles. J Comput Graph Stat 15:675–692.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24:2713–2719.

Cornuet JM, Ravigne V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11:401.

Cornuet JM, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin JM, Estoup A. 2014. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics 30:1187–1189.

Cristescu ME. 2015. Genetic reconstructions of invasion history. Mol Ecol 24:2212–2225.

Depra M, Poppe JL, Schmitz HJ, De Toni DC, Valente VLS. 2014. The first records of the invasive pest Drosophila suzukii in the South American continent. J Pest Sci 87:379–383.

Dlugosch KM, Parker IM. 2008. Founding events in species invasions: genetic variation, adaptive evolution, and the role of multiple introductions. Mol Ecol 17:431–449.

Dlugosch KM, Anderson SR, Braasch J, Alice Cang F, Gillette H. 2015. The devil is in the details: genetic variation in introduced populations and its contributions to invasion. Mol Ecol 24:2095–2111.

Edmonds CA, Lillie AS, Cavalli-Sforza LL. 2004. Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA 101:975–979.

Ellstrand NC, Schierenbeck KA. 2000. Hybridization as a stimulus for the evolution of invasiveness in plants? Proc Natl Acad Sci USA 97:7043–7050.

Estoup A, Guillemaud T. 2010. Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol 19:4113–4130.

Estoup A, Lombaert E, Marin JM, Guillemaud T, Pudlo P, Robert CP, Cornuet JM. 2012. Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics. Mol Ecol Res 12(5):846–855.

Estoup A, Ravigne V, Hufbauer RA, Vitalis R, Gautier M, Facon B. 2016. Is there a genetic paradox of biological invasion? Annu Rev Ecol Evol Syst 47:51–72.


Facon B, Genton B, Shykoff J, Jarne P, Estoup A, David P. 2006. A general eco-evolutionary framework for understanding bioinvasions. Trends Ecol Evol 21:130–135.

Facon B, Pointier JP, Jarne P, Sarda V, David P. 2008. High genetic variance in life-history strategies within invasive populations by way of multiple introductions. Curr Biol 18:363–367.

Fraimout A, Loiseau A, Price DK, Xuereb A, Martin JF, Vitalis R, Fellous S, Debat V, Estoup A. 2015. New set of microsatellite markers for the spotted-wing Drosophila suzukii (Diptera: Drosophilidae): a promising molecular tool for inferring the invasion history of this major insect pest. Eur J Entomol 112(4):855.

Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201:1555–1579.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis. London: Chapman & Hall/CRC.

Gilchrist GW, Huey RB, Serra L. 2001. Rapid evolution of wing size clines in Drosophila subobscura. Genetica 112:273–286.

Gilchrist GW, Huey RB, Balanya J, Pascual M, Serra L, Noor M. 2004. A time series of evolution in action: a latitudinal cline in wing size in South American Drosophila subobscura. Evolution 58:768–780.

Groen SC, Whiteman NK. 2016. Using Drosophila to study the evolution of herbivory and diet specialization. Curr Opin Insect Sci 14:66–72.

Hauser M. 2011. A historic account of the invasion of Drosophila suzukii (Matsumura) (Diptera: Drosophilidae) in the continental United States, with remarks on their identification. Pest Manag Sci 67:1352–1357.

Hufbauer RA, Facon B, Ravigne V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A. 2012. Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl 5:89–101.

Kaneshiro KY. 1983. Drosophila (Sophophora) suzukii (Matsumura). Proc Hawaiian Entomol Soc 24:179.

Keller SR, Taylor DR. 2010. Genomic admixture increases fitness during a biological invasion: admixture increases fitness during invasion. J Evol Biol 23:1720–1731.

Kolbe JJ, Glor RE, Schettino LR, Lara AC, Larson A, Losos JB. 2004. Genetic variation increases during biological invasion by a Cuban lizard. Nature 431:177–181.

Lavergne S, Molofsky J. 2007. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci USA 104:3883–3888.

Lee C. 2002. Evolutionary genetics of invasive species. Trends Ecol Evol 17:386–391.

Lee JC, Bruck DJ, Curry H, Edwards D, Haviland DR, Van Steenwyk RA, Yorgey BM. 2011. The susceptibility of small fruits and cherries to the spotted-wing drosophila, Drosophila suzukii. Pest Manag Sci 67:1358–1367.

Lenormand T. 2002. Gene flow and the limits of natural selection. Trends Ecol Evol 17:183–189.

Lin QC, Zhai YF, Zhang AS, Men XY, Zhang XY, Zalom FG, Zhou CG, Yu Y. 2014. Comparative developmental times and laboratory life tables for Drosophlia suzukii and Drosophila melanogaster (Diptera: Drosophilidae). Fla Entomol 97:1434–1442.

Lombaert E, Guillemaud T, Lundgren J, Koch R, Facon B, Grez A, Loomans A, Malausa T, Nedved O, Rhule E, et al. 2014. Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol 23:5979–5997.

Matsumura S. 1931. 6000 illustrated insects of Japan-empire (in Japanese). Toko shoin, Tokyo. 1497 pp.

Meng XL. 1994. Posterior predictive p-values. Ann Stat 22:1142–1160.

Nei M, Maruyama T, Chakraborty R. 1975. The bottleneck effect and genetic variability in populations. Evolution 29:1–10.

Ometto L, Cestaro A, Ramasamy S, Grassi A, Revadi S, Siozios S, Moretto M, Fontana P, Varotto C, Pisani D, et al. 2013. Linking genomics and ecology to investigate the complex evolution of an invasive drosophila pest. Genome Biol Evol 5:745–757.

Pascual M, Chapuis MP, Mestres F, Balanya J, Huey RB, Gilchrist GW, Serra L, Estoup A. 2007. Introduction history of Drosophila subobscura in the New World: a microsatellite based survey using ABC methods. Mol Ecol 16:3069–3083.

Peischl S, Excoffier L. 2015. Expansion load: recessive mutations and the role of standing genetic variation. Mol Ecol 24:2084–2094.

Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. 2016. Reliable ABC model choice via random forests. Bioinformatics 32:859–866.

R Development Core Team. 2008. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org

Robert CP, Cornuet JM, Marin JM, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA 108:15112–15117.

Rius M, Darling JA. 2014. How important is intraspecific genetic admixture to the success of colonising populations? Trends Ecol Evol 29:233–242.

Sakai AK, Allendorf FW, Holt JS, Lodge DM, Molofsky J, With KA, Baughman S, Cabin RJ, Cohen JE, Ellstrand NC, McCauley DE. 2001. The population biology of invasive species. Annu Rev Ecol Syst 305–332.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. In: Mutation and Evolution. Springer, pp. 359–367.

Simberloff D. 2013. Biological invasions: much progress plus several controversies. Contrib Sci 9:7–16.

Turgeon J, Tayeh A, Facon B, Lombaert E, De Clercq, Berkvens N, Lungren JG, Estoup A. 2011. Experimental evidence for the phenotypic impact of admixture between wild and biocontrol Asian ladybird (Harmonia axyridis) involved in the European invasion. J Evol Biol 24:1044–1052.

Uller T, Leimu R. 2011. Founder events predict changes in genetic diversity during human-mediated range expansions. Glob Change Biol 17:3478–3485.

Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Thery S, Froment A, Le Bomin S, Gessain A, Hombert JM, et al. 2009. Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol 19:312–318.

Wray G. 2013. Genomics and the evolution of phenotypic traits. Annu Rev Ecol Evol Syst 44:51–72.

Page 5: Deciphering the Routes of invasion of Drosophila suzukii

compared model), and (iv) it provides a more reliable estimate of the posterior probability of the selected model (i.e., the model that best fits the observed dataset).

To the best of our knowledge, the present study is the first to use the ABC-RF method for an ensemble of model choice analyses with different levels of complexity (i.e., various numbers of compared models, including various and sometimes large numbers of parameters, sampled populations, and hence summary statistics) on real multi-locus microsatellite datasets. Specifically, we analyzed molecular data at 25 microsatellite loci from 23 D. suzukii sample sites located across most of its native and introduced range. We conducted our population genetics study in five steps. (1) We defined focal genetic groups and used this information to formalize 11 sets of competing introduction scenarios. We then chose among them in a sequential ABC-RF analysis. (2) We compared the performance of ABC-RF to a more standard ABC method for a subset of analyses (ABC-LDA; Estoup et al. 2012; see "Materials and Methods" section). We chose ABC-LDA for our comparative study because it is considered to be one of the most efficient standard ABC methods to discriminate among models (Pudlo et al. 2016) and it is widely used in population genetics studies (Lombaert et al. 2014 and references therein). (3) We refined the inferred worldwide invasion scenario by assessing the most likely origins of the primary introductions from the native area. (4) We estimated the posterior distributions, under the final invasion scenario, of demographic parameters associated with bottleneck and genetic admixture events. (5) Finally, we performed model-posterior checking analyses to ensure that the final worldwide invasion scenario displayed a reasonable match to the observed dataset.

Results

Origins of Invasive Genetic Groups

Prior to conducting any type of ABC analysis, focal genetic groups must be defined by characterizing genetic diversity and structure within and between all genotyped sample sites using traditional statistics and clustering methods. This is described in detail in supplementary appendix S1, Supplementary Material online. We defined seven main genetic groups from our initial set of 23 sample sites (supplementary table S1, Supplementary Material online): Asia (sample sites CN-Lan, CN-Lia, CN-Nin, CN-Shi, JP-Tok, and JP-Sap), Hawaii (sample site US-Haw), western US (sample sites US-Wat, US-Sok, and US-SD), eastern US (sample sites US-Col, US-NC, US-Wis, and US-Gen), Europe (sample sites GE-Dos, FR-Par, FR-Bor, FR-Mon, SW-Del, SP-Bar, and IT-Tre), Brazil (BR-PA), and La Réunion (FR-Reu).

This genetic grouping, along with historical and geographical information (supplementary table S1, Supplementary Material online), allowed us to formalize 11 nested sets of competing invasion scenarios that we analyzed sequentially using ABC model choice methodologies. The date on which D. suzukii was first recorded in a location was used to help formulate the scenarios. For example, D. suzukii was first recorded in the continental US in California (US-Wat) in 2008 and was not recorded in the central state of Colorado (US-Col) until 2012; US-Wat could therefore serve as a source for US-Col, but not vice versa. We present the main analyses in table 1. A more detailed description of each competing scenario considered for each analysis is provided in supplementary table S2, Supplementary Material online. The 11 sequential ABC analyses permit step-by-step reconstruction of the introduction history of D. suzukii worldwide. Results for each analysis, based on prior set 1 (which used bounded uniform distributions of parameters; see "Materials and Methods" section) and a single set of sample sites representative of the genetic groups defined above, are given in table 2. For all 11 analyses, similar ABC-RF results were obtained using prior set 2 (which used more peaked distributions; supplementary table S3, Supplementary Material online) and considering various sets of representative sample sites (supplementary table S4, Supplementary Material online).
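The date-of-first-record constraint described above can be sketched in a few lines of Python. This is only an illustration: the three years come from the text (Hawaii 1980, US-Wat 2008, US-Col 2012), while the function name `candidate_sources` and the rule "strictly earlier first record" are assumptions made here, not the procedure actually used to build the scenarios.

```python
# Year of first record for three sample sites, as stated in the text.
FIRST_RECORD = {"US-Haw": 1980, "US-Wat": 2008, "US-Col": 2012}


def candidate_sources(target, first_record=FIRST_RECORD):
    """Return sites that could have served as a source for `target`.

    Assumed rule (hypothetical): a site qualifies only if it was
    recorded strictly earlier than the target site.
    """
    target_year = first_record[target]
    return sorted(
        site
        for site, year in first_record.items()
        if site != target and year < target_year
    )


if __name__ == "__main__":
    for site in FIRST_RECORD:
        print(site, "<-", candidate_sources(site))
```

Under this rule US-Wat is a candidate source for US-Col, whereas US-Col is never a candidate source for US-Wat, which matches the example given in the text.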

By choosing among different competing scenarios, each analysis identifies the most probable source(s) for a given introduced population. The results come in the form of differing levels of support for the competing introduction models. It is thus worth stressing here that the identification of the most probable source population X (from among a finite set of possible source populations) does not necessarily mean that the introduced population Y originated from exactly location X, but rather that the most probable origin of the founders of the invasive population sampled at site Y is a population genetically similar to the source population sampled at site X. However, for the sake of concision, we will hereafter use the simplified terminology that the most probable origin of Y is X.

After the early invasion of Hawaii (1980), the first recorded introduction was to the genetically heterogeneous western US, and thus that is where our main analyses start (analyses 1a–1d, table 2). We considered the three western US sample sites in separate analyses, and we found in each case that the best scenario included genetic admixture between Asia and Hawaii as sources (table 2, analyses 1a–1c, mean posterior probability of the best scenario P = 0.999). We subsequently tested whether this common admixture pattern resulted from a single or multiple introduction events from Asia and/or Hawaii, including the possibility of local colonization events among western US sites (analysis 1d). The best invasion scenario for the western US included an initial admixture event with Asia and Hawaii as probable sources in northern California (site US-Wat; admixture event A1 in fig. 1), followed by (i) a local colonization southward into San Diego (US-SD) and northward into Oregon (US-Sok) and (ii) a secondary introduction event from Hawaii to Oregon (US-Sok) (A2 event in fig. 1; P = 0.690). This result is supported by various genetic clustering results (fig. 1; supplementary figs. S1 and S2, Supplementary Material online) as well as raw F-statistics values, which indicate a high level of differentiation between US-Sok and other continental US sample sites and a substantially lower FST between US-Sok and Hawaii (supplementary table S5, Supplementary Material online). Once the invasion history was resolved for the western US group, we inferred the most probable source(s) of the invasive eastern US and European genetic groups using two independent analyses (analyses 2a and 2b; table 1 and supplementary table S2, Supplementary Material online). We found evidence for a single origin for the eastern US group, corresponding to an intra-continental spread from the San Diego area (P = 0.779, table 2).

The most probable origin for Europe was an independent introduction from the Asian native range (P = 0.716, table 2). The European and North American invasions occurred nearly simultaneously, and thus we also explored gene flow between the two regions. Europe was never selected as a source for the eastern US (analysis 3a, P = 0.744), no matter which sets of sample sites were considered. In contrast, we found some evidence of asymmetrical gene flow from the eastern US into at least some European locations. When considering the northern European sample site from Germany (GE-Dos) as a target, we found that the best scenario included an admixed origin with Asia and the eastern US (supplementary table S4, Supplementary Material online; analysis 3b, P = 0.488 and P = 0.501 for prior sets 1 and 2, respectively). In contrast, the best scenario did not include such an admixture event when considering the southern European sample site from Italy (IT-Tre) as a target (table 2, analysis 3b, P = 0.510). We therefore replicated analysis 3b on all possible combinations of European and eastern US sample sites to assess the robustness of this result and evaluate the possibility of a geographic pattern for admixture events. For each of the European sample sites, figure 2 summarizes which replicate analyses selected a scenario including an admixture between eastern US and Asia or a scenario of a single introduction from Asia (see supplementary fig. S3, Supplementary Material online, for results using an alternative representative sample site for the native range). Results tended to indicate an uneven spatial distribution of the presence of genes of eastern US origin in Europe, following roughly a north-south gradient. More specifically, eastern US alleles were present in northern Europe and absent, or at least not detected by our model choice method, in southern Europe. We then further tested whether the observed admixture in the north corresponded to the mixing of eastern US genes with southern European genes originating from the initial introduction event from Asia, versus the mixing of eastern US genes with Asian genes introduced through a separate secondary introduction event from Asia into northern Europe (analysis 4). The best scenario was clearly that of an admixture between eastern US genes and southern European genes, which themselves had a probable origin as part of the initial introduction event from Asia (P = 0.998; A3 event in fig. 1).

Finally, we found that the sampled population from Brazil originated from North America, with genetic admixture between D. suzukii individuals from the southwest and eastern regions of the US (analysis 5a, P = 0.631; A4 event in fig. 1). Regarding the most recent invasive population, from La Reunion, we found that this island population originated from Europe, with admixture between individuals from northern and southern Europe (analysis 5b, P = 0.500; A5 event in fig. 1).

Comparison of Model Choice Analyses Using ABC-RF versus ABC-LDA

For all 11 analyses, ABC-LDA performed well when using large reference tables (i.e., 500,000 simulations per scenario; table 2 and supplementary table S3, Supplementary Material online). Indeed, the prior error rates for ABC-LDA with large reference tables were smaller than the prior error rates for ABC-RF with

Table 1. Formulation of the Model Choice Analyses that were Carried Out Successively to Reconstruct Invasion Routes of D. suzukii Using ABC

| Model choice analysis | Number of compared scenarios | Tackled question | Potential source genetic group | Focal populations |
|---|---|---|---|---|
| 1a | 3 | What are the origins of western US populations? | Asia, Hawaii | US-Wat |
| 1b | 3 | What are the origins of western US populations? | Asia, Hawaii | US-Sok |
| 1c | 3 | What are the origins of western US populations? | Asia, Hawaii | US-SD |
| 1d | 7 | What are the relations among western US populations and their extra-continental sources? | Asia, Hawaii, US-Wat, US-Sok, US-SD | US-Wat + US-Sok + US-SD |
| 2a | 6 | What are the origins of eastern US populations? | Asia, Hawaii, western US | eastern US |
| 2b | 6 | What are the origins of European populations? | Asia, Hawaii, western US | Europe |
| 3a | 10 | Is there asymmetrical gene flow from Europe to eastern US? | Asia, Hawaii, western US, Europe | eastern US |
| 3b | 10 | Is there asymmetrical gene flow from eastern US to Europe? | Asia, Hawaii, western US, eastern US | Europe |
| 4 | 2 | Does the admixture with eastern US genes in northern Europe result from a secondary Asian introduction? | Asia, Hawaii, western US, eastern US, southern Europe | northern Europe |
| 5a | 21 | What are the origins of the Brazilian population? | Asia, Hawaii, western US, eastern US, southern Europe, northern Europe | Brazil |
| 5b | 21 | What are the origins of La Reunion population? | Asia, Hawaii, western US, eastern US, southern Europe, northern Europe | La Reunion |

NOTE: Each numbered analysis is a comparison of a certain number of scenarios by ABC model choice. We summarize each analysis by stating the question that it addressed. A detailed verbal description of each compared scenario is given in supplementary table S2, Supplementary Material online. The 11 analyses are nested in the sense that each subsequent analysis uses the result obtained from the previous one. For example, analysis 2a ("What are the origins of eastern US populations?") capitalizes on the history inferred for western US populations from analysis 1d. "Potential source genetic group" indicates all the potential source populations considered in the analyses for which one wants to identify the origin of the focal population (i.e., the target).


Table 2. Results of Model Choice Analyses Using ABC-RF and ABC-LDA

| Analysis | Total number of scenarios | Number of SumStats | Prior error rate: ABC-RF (±SD) | Prior error rate: ABC-LDA (large reference table) | Prior error rate: ABC-LDA (small reference table) | Posterior probability of the best model: ABC-RF (±SD) | Posterior probability of the best model: ABC-LDA (large reference table) [CI] | Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model) |
|---|---|---|---|---|---|---|---|---|
| 1a | 3 | 39 | 0.100 (±0.002) | 0.075 | 0.085 | 1.000 (±0.001) | 1.000 [1.000, 1.000] | Asia + Hawaii |
| 1b | 3 | 39 | 0.094 (±0.001) | 0.070 | 0.091 | 0.989 (±0.005) | 0.999 [0.999, 0.999] | Asia + Hawaii |
| 1c | 3 | 39 | 0.096 (±0.001) | 0.079 | 0.087 | 0.998 (±0.002) | 0.999 [0.999, 0.999] | Asia + Hawaii |
| 1d | 7 | 130 | 0.327 (±0.001) | 0.274 | 0.368 | 0.690 (±0.018) | 0.741 [0.714, 0.767] | Asia + Hawaii |
| 2a | 6 | 130 | 0.232 (±0.001) | 0.203 | 0.241 | 0.779 (±0.019) | 0.788 [0.764, 0.813] | Western US |
| 2b | 6 | 130 | 0.232 (±0.001) | 0.179 | 0.393 | 0.716 (±0.013) | 0.604 [0.592, 0.616] | Asia |
| 3a | 10 | 204 | 0.328 (±0.001) | 0.265 | 0.364 | 0.744 (±0.020) | 0.844 [0.820, 0.868] | Western US |
| 3b | 10 | 204 | 0.408 (±0.001) | 0.358 | 0.423 | 0.510 (±0.014) | 0.409 [0.391, 0.426] | Asia |
| 4 | 2 | 301 | 0.121 (±0.001) | 0.111 | 0.212 | 1.000 (±0.009) | 0.999 [0.999, 0.999] | Southern Europe + eastern US |
| 5a | 21 | 424 | 0.294 (±0.001) | 0.230 | 0.423 | 0.631 (±0.034) | 0.812 [0.788, 0.837] | Western US + eastern US |
| 5b | 21 | 424 | 0.298 (±0.001) | 0.242 | 0.428 | 0.500 (±0.027) | 0.290 [0.256, 0.323] | Northern Europe + southern Europe |

NOTE: All model choice analyses were carried out using prior set 1 (supplementary table S8, Supplementary Material online) and a single set of sample sites representative of the pre-defined genetic groups: sample site JP-Tok for the native Asian group, US-Haw for the Hawaiian group, US-Wat, US-Sok, and US-SD for the western US group, US-NC for the eastern US group, IT-Tre for the southern Europe group, GE-Dos for the northern Europe group, BR-PA for the Brazil group, and FR-Reu for the La Reunion group (fig. 1). Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). The total number of summary statistics (Number of SumStats) as well as the total number of compared scenarios (Total number of scenarios) are indicated for each analysis. Prior error rates and posterior probabilities of the best model chosen using ABC-RF were averaged over 10 replicate analyses. ABC-RF and ABC-LDA treatments yielded the same best model choice for all analyses; this choice is given in the column "Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model)". Admixture between two source populations is represented by a "+" sign. ABC-LDA posterior probabilities of the best models were estimated using "large reference tables" with 500,000 simulated datasets per scenario. ABC-LDA prior error rates were computed using reference tables of two different sizes: "large reference table" (i.e., 500,000 simulated datasets per scenario) and "small reference table" (i.e., 10,000 simulated datasets per scenario, as for ABC-RF analyses). SD stands for standard deviation over 10 replicate analyses and CI for 95% confidence interval computed following Cornuet et al. (2008).


small reference tables (paired t-test, t10 = 2.7, P = 0.02). However, ABC-RF performed better than ABC-LDA when using small reference tables (only 10,000 simulations per scenario), having lower prior error rates (paired t-test, t10 = 11.6, P < 0.0001; table 2 and supplementary table S3, Supplementary Material online). The difference in prior error rate was particularly marked for the complex analyses, i.e., those in which many introduction scenarios were compared and many summary statistics were computed. For instance, for analyses 5a and 5b (21 compared scenarios and 424 summary statistics computed), prior error rates were 0.217 and 0.221 for ABC-RF versus 0.417 and 0.425 for ABC-LDA.
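For readers who want to see the mechanics of such a paired comparison, the following Python sketch computes a paired t statistic across the 11 analyses. The prior error rates are those read from table 2 (prior set 1 only), and `paired_t` is a hypothetical helper written for this illustration, not the authors' code. Because the exact inputs behind the reported tests are not fully specified here, the output should not be expected to reproduce the reported t values.

```python
import math

# Prior error rates as read from table 2 (prior set 1), in the order
# 1a, 1b, 1c, 1d, 2a, 2b, 3a, 3b, 4, 5a, 5b.
ABC_RF = [0.100, 0.094, 0.096, 0.327, 0.232, 0.232,
          0.328, 0.408, 0.121, 0.294, 0.298]
LDA_LARGE = [0.075, 0.070, 0.079, 0.274, 0.203, 0.179,
             0.265, 0.358, 0.111, 0.230, 0.242]
LDA_SMALL = [0.085, 0.091, 0.087, 0.368, 0.241, 0.393,
             0.364, 0.423, 0.212, 0.423, 0.428]


def paired_t(x, y):
    """Return (t, df) for a paired t-test of H0: mean(x - y) = 0."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples of size >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Unbiased sample variance of the paired differences.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("all paired differences are identical")
    return mean / math.sqrt(var / n), n - 1


if __name__ == "__main__":
    t, df = paired_t(ABC_RF, LDA_LARGE)
    print(f"ABC-RF vs ABC-LDA (large table): t = {t:.2f}, df = {df}")
    t, df = paired_t(LDA_SMALL, ABC_RF)
    print(f"ABC-LDA (small table) vs ABC-RF: t = {t:.2f}, df = {df}")
```

With 11 paired analyses the test has 10 degrees of freedom, which is consistent with the t10 notation used in the text.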

Regarding computational effort, we found that, for a given observed dataset, an ABC-RF treatment to select the best scenario and compute its posterior probability required ca. 50 times less computation time than an ABC-LDA treatment with a traditional reference table of large size (i.e., 500,000 simulated datasets per scenario). Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time for 10,000 pseudo-observed datasets, whereas with ABC-LDA it took several hours to several days (depending on the analysis processed) for 500 pseudo-observed datasets.
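The order of magnitude of this saving can be illustrated with simple arithmetic on reference table sizes. The sketch below, in Python, uses the scenario counts from table 1 and the two per-scenario simulation budgets quoted in the text; the function name is hypothetical, and equating the number of simulations with computation time is a simplifying assumption made here.

```python
# Number of compared scenarios per analysis, from table 1.
SCENARIOS = {"1a": 3, "1b": 3, "1c": 3, "1d": 7, "2a": 6, "2b": 6,
             "3a": 10, "3b": 10, "4": 2, "5a": 21, "5b": 21}

SMALL = 10_000    # simulated datasets per scenario (ABC-RF)
LARGE = 500_000   # simulated datasets per scenario (standard ABC-LDA)


def reference_table_size(n_scenarios, sims_per_scenario):
    """Total number of simulated datasets in a reference table."""
    return n_scenarios * sims_per_scenario


if __name__ == "__main__":
    for name, k in SCENARIOS.items():
        small = reference_table_size(k, SMALL)
        large = reference_table_size(k, LARGE)
        print(f"{name}: {small:>9,} vs {large:>11,} simulations "
              f"(ratio {large // small})")
```

For analysis 5a, for example, this gives 210,000 simulated datasets for the small table against 10,500,000 for the large one, a 50-fold difference in simulation effort.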

The same invasion scenario had the highest probability for all 11 analyses carried out using ABC-RF and ABC-LDA with a large number of simulated datasets (table 2 and supplementary table S3, Supplementary Material online). Posterior probabilities of the best scenario estimated using ABC-RF were not systematically higher (or lower) than those obtained using ABC-LDA with a large number of simulated datasets. In particular, there

FIG. 1. Worldwide invasion scenario of D. suzukii inferred from microsatellite data and date of first observation. Map and schematic showing sample sites and the invasion routes taken by D. suzukii, as reconstructed by ABC-RF (Pudlo et al. 2016) on a total of 685 individuals from 23 geographic locations genotyped at 25 microsatellite loci (see Results and Methods for details). The native range is in dark gray and the invasive range is in light gray (cf. delimitation from fig. 1 in Asplen et al. 2015). The year in which D. suzukii was first observed at each sample site is indicated in italics. The 23 geographical locations that were sampled are represented by circles (native range) and squares, diamonds, and triangles (introduced range). Squares indicate populations that experienced weak bottlenecks (i.e., median value of bottleneck severity < 0.12; see main text and table 3), diamonds indicate moderate bottlenecks (0.12 < bottleneck severity < 0.22), and triangles indicate strong bottlenecks (i.e., bottleneck severity > 0.3). The colors of the symbols for the sample sites and of the arrows between them correspond to the different genetic groups obtained using the clustering method BAPS (supplementary appendix S1, Supplementary Material online). The arrows indicate the most probable invasion pathways. A1–A5 indicate five separate admixture events between different sources. O1–O3 indicate the most probable sources within the native range for the primary introduction events. A1 = Hawaii + southeast China; A2 = Watsonville (western US) + Hawaii; A3 = southern Europe + eastern US; A4 = western US + eastern US; A5 = southern Europe + northern Europe; O1 = Japan; O2 = southeast China; O3 = northeast China.


was no clear trend of overestimation of posterior probability when using ABC-LDA with a large number of simulated datasets versus ABC-RF, a potential bias suggested by Pudlo et al. (2016). On the other hand, we found evidence of instability in the estimation of posterior probabilities, and hence in model selection, when using ABC-LDA with a small number of simulated datasets in the reference table (i.e., 10,000 simulations per scenario, as for ABC-RF). In this case, the scenario with the highest probability differed from that selected using either ABC-LDA with the large reference table or ABC-RF in four and two of the 11 analyses when using prior sets 1 and 2, respectively (results not shown).

Refining the Origins of the Primary Introduction Events from the Native Area

Additional ABC-RF analyses were run to determine which geographical area in the native range was the most likely origin of each of the three primary introductions (i.e., in Hawaii, western US, and Europe). The most probable origin of the Hawaiian invasive population was Japan, with a mean posterior probability of 0.959 (origin O1 in fig. 1). For Watsonville (western US, sample site US-Wat), the most probable source of the primary introduction from Asia was southeast China (origin O2 in fig. 1; P = 0.860). Finally, for Europe, the most probable source of the primary introduction from Asia was northeast China (origin O3 in fig. 1; P = 0.855). Similar posterior probabilities were obtained using prior set 1 (values presented above) and prior set 2 (results not shown). For inferences regarding Europe, similar posterior probabilities were also obtained using various sets of sample sites from Europe and the eastern US (results not shown).

Estimation of Parameter Distributions for Admixture Rate and Bottleneck Severity

We used the most probable worldwide invasion scenario (fig. 1) to estimate the posterior distributions of parameters related to the bottleneck severity associated with the foundation of new populations and to genetic admixture rates between differentiated sources (i.e., A1–A5 events in fig. 1).

FIG. 2. Admixed or non-admixed origin of D. suzukii in Europe inferred for each sample site. Potential source populations include (among others) the Asian native range (represented by the Japanese sample site JP-Tok) and the invasive genetic group from eastern US, represented by one of the four invasive sample sites collected in this area (fig. 1). Four replicate independent ABC-RF treatments corresponding to analysis 3b (table 2) were hence carried out for each targeted European sample site, each using one of the four eastern US sample sites. The treatments labeled 1, 2, 3, and 4 in the pies of the figure were carried out with the sample sites US-NC, US-Wis, US-Gen, and US-Col, respectively. A pie quarter in blue indicates that the best scenario corresponds to a single introduction event from Asia. A pie quarter in red indicates that the best scenario corresponds to an admixture event between Asia and eastern US. Dates in italics correspond to the dates of first record of the European sample sites.


The posterior distributions of parameters relating to bottlenecks and admixture differed substantially from the priors, indicating that the genetic data were informative for these parameters (tables 3 and 4; see supplementary tables S6 and S7, Supplementary Material online, for results using an alternative set of representative sample sites). Regarding bottleneck severity, we found that bottlenecks tended to be more severe for populations founded by individuals originating from a source located on a different continent (extra-continental origin) than for populations founded by individuals originating from a source located on the same continent (intra-continental origin; table 3 and fig. 1). The bottleneck severity for populations founded by individuals corresponding to a combination of the two types of sources (extra- and intra-continental origins) was either weak or moderate. The strongest bottleneck severity was found for the first invasive population, in Hawaii, which showed the lowest level of genetic variation among all invasive populations. Bottleneck severity values were overall slightly higher in European than in US populations, suggesting a smaller number of founding individuals and/or a slower demographic recovery in European populations.

Regarding admixture (table 4), we found that (i) for the first recorded invasive population in western US (i.e., Watsonville, sample site US-Wat), the genetic contribution from China was substantially larger than that from Hawaii (median value of 0.759); (ii) for the western US population US-Sok, the secondary genetic contribution from Hawaii was small compared to that from Watsonville (median value of 0.237); and (iii) the sample sites from northern Europe (e.g., GE-Dos in Germany) contained a rather low proportion of genes originating from the eastern US (median value of

Table 3. Bottleneck Severity in Invasive Populations of D. suzukii

| Introduction type | Sample site | Mean (prior 1) | Median (prior 1) | Mode (prior 1) | q5 (prior 1) | q95 (prior 1) | Mean (prior 2) | Median (prior 2) | Mode (prior 2) | q5 (prior 2) | q95 (prior 2) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Extra-continental | US-Haw | 0.517 | 0.500 | 0.495 | 0.326 | 0.755 | 0.490 | 0.484 | 0.477 | 0.382 | 0.615 |
| Extra-continental | US-Wat | 0.181 | 0.138 | 0.114 | 0.041 | 0.448 | 0.143 | 0.127 | 0.118 | 0.059 | 0.275 |
| Extra-continental | IT-Tre | 0.268 | 0.179 | 0.171 | 0.059 | 0.837 | 0.236 | 0.189 | 0.172 | 0.101 | 0.553 |
| Extra-continental | BR-PA | 0.158 | 0.126 | 0.104 | 0.020 | 0.407 | 0.208 | 0.193 | 0.172 | 0.067 | 0.399 |
| Extra-continental | FR-Reu | 0.259 | 0.215 | 0.186 | 0.051 | 0.626 | 0.245 | 0.214 | 0.190 | 0.069 | 0.534 |
| Intra-continental | US-SD | 0.135 | 0.117 | 0.106 | 0.039 | 0.283 | 0.117 | 0.104 | 0.091 | 0.053 | 0.224 |
| Intra-continental | US-NC | 0.089 | 0.067 | 0.061 | 0.015 | 0.230 | 0.111 | 0.098 | 0.090 | 0.046 | 0.218 |
| Extra + intra-continental | US-Sok | 0.116 | 0.099 | 0.088 | 0.027 | 0.250 | 0.119 | 0.106 | 0.099 | 0.052 | 0.229 |
| Extra + intra-continental | GE-Dos | 0.189 | 0.177 | 0.155 | 0.061 | 0.356 | 0.205 | 0.189 | 0.171 | 0.088 | 0.372 |
| Prior values | | 0.628 | 0.199 | NA | 0.021 | 1.904 | 0.517 | 0.500 | NA | 0.326 | 0.754 |

NOTE: Extra-continental introductions correspond to a long-distance introduction from a source located outside the continent of the focal population; intra-continental introduction corresponds to an introduction event from a source located on the same continent as the focal population; and extra + intra-continental introduction corresponds to a combination of the two types of sources. Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5 and q95), are indicated for each bottleneck severity parameter. We roughly classified the estimated bottleneck severity values into three classes (represented here by the three shades of gray): weak (i.e., median value of bottleneck severity < 0.12, in light gray), moderate (0.12 < bottleneck severity < 0.22, in gray), and strong (i.e., bottleneck severity > 0.3, in dark gray). The set of sample sites used for the ABC estimations presented here includes (i) for the native area, Japan (JP-Tok + JP-Sap), South-East China (CN-Nin), and North-East China (CN-Lan + CN-Lia), and (ii) for the invaded range, US-Wat, US-Sok, and US-SD for western US, US-NC for eastern US, IT-Tre for southern Europe, GE-Dos for northern Europe, BR-PA for South America (Brazil), and FR-Reu for La Reunion island. Code names of the sample sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online, in which bottleneck severity classes are also given for each sample site. See supplementary table S6, Supplementary Material online, for results on a different set of representative sample sites.
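The three-class rule stated in the note can be written out as a short Python sketch. The thresholds and the median values (prior set 1) are taken from table 3; the function name `classify_bottleneck` and the "unclassified" label for values the stated rule does not cover (for example, between 0.22 and 0.3) are assumptions introduced for this illustration.

```python
# Median bottleneck severity under prior set 1, as read from table 3.
MEDIAN_SEVERITY = {
    "US-Haw": 0.500, "US-Wat": 0.138, "IT-Tre": 0.179,
    "BR-PA": 0.126, "FR-Reu": 0.215, "US-SD": 0.117,
    "US-NC": 0.067, "US-Sok": 0.099, "GE-Dos": 0.177,
}


def classify_bottleneck(median_severity):
    """Map a median bottleneck severity to the classes used in table 3.

    The source gives strict inequalities and leaves a gap between 0.22
    and 0.3; values outside the stated ranges are labeled
    'unclassified' here (an assumption, not part of the source).
    """
    if median_severity < 0.12:
        return "weak"
    if 0.12 < median_severity < 0.22:
        return "moderate"
    if median_severity > 0.3:
        return "strong"
    return "unclassified"


if __name__ == "__main__":
    for site, value in MEDIAN_SEVERITY.items():
        print(f"{site}: {value:.3f} -> {classify_bottleneck(value)}")
```

Applied to the medians above, only US-Haw falls in the strong class, while US-SD, US-NC, and US-Sok fall in the weak class and the remaining sites in the moderate class.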

Table 4. Posterior Distributions of Admixture Rates for the Five Admixture Events Inferred for the Final Worldwide Invasion Scenario Described in figure 1

| Prior set | Admixture event | Admixture rate (gene fraction from pop x) | Mean | Median | Mode | q5 | q95 |
|---|---|---|---|---|---|---|---|
| Prior 1 | A1 (US-Wat = US-Haw + CN-Nin) | rUS-Wat (China – CN-Nin) | 0.759 | 0.761 | 0.774 | 0.659 | 0.854 |
| Prior 1 | A2 (US-Sok = US-Haw + US-Wat) | rUS-Sok (USA – US-Wat) | 0.237 | 0.230 | 0.227 | 0.092 | 0.400 |
| Prior 1 | A3 (DE-Dos = IT-Tre + US-NC) | rDE-Dos (USA – US-NC) | 0.286 | 0.278 | 0.266 | 0.112 | 0.481 |
| Prior 1 | A4 (BR-PA = US-NC + US-SD) | rBR-PA (USA – US-SD) | 0.467 | 0.461 | 0.442 | 0.187 | 0.754 |
| Prior 1 | A5 (FR-Reu = IT-Tre + GE-Dos) | rFR-Reu (Europe – GE-Dos) | 0.388 | 0.380 | 0.426 | 0.132 | 0.670 |
| Prior 2 | A1 (US-Wat = US-Haw + CN-Nin) | rUS-Wat (China – CN-Nin) | 0.761 | 0.762 | 0.766 | 0.684 | 0.833 |
| Prior 2 | A2 (US-Sok = US-Haw + US-Wat) | rUS-Sok (USA – US-Wat) | 0.224 | 0.219 | 0.199 | 0.114 | 0.350 |
| Prior 2 | A3 (DE-Dos = IT-Tre + US-NC) | rDE-Dos (USA – US-NC) | 0.312 | 0.306 | 0.284 | 0.152 | 0.487 |
| Prior 2 | A4 (BR-PA = US-NC + US-SD) | rBR-PA (USA – US-SD) | 0.422 | 0.418 | 0.429 | 0.247 | 0.613 |
| Prior 2 | A5 (FR-Reu = IT-Tre + GE-Dos) | rFR-Reu (Europe – GE-Dos) | 0.370 | 0.368 | 0.356 | 0.178 | 0.569 |

NOTE: Admixture events are denoted as in figure 1. Each admixture rate parameter r points to the name of the admixed site and corresponds to the fraction of genes originating from the source site in parentheses (1 − r genes originate from the other source site). Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5 and q95), are indicated for each admixture parameter. Estimations assuming prior set 1 and prior set 2 (supplementary table S8, Supplementary Material online) are provided. The set of representative sample sites used for the ABC estimations presented here is the same as in table 3. Code names of the population sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online. See supplementary table S7, Supplementary Material online, for results on a different set of representative sample sites.
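To make the r and 1 − r convention concrete, the Python sketch below splits an admixture rate into the two source contributions and, under a simple instantaneous-mixing assumption, computes the expected allele frequency in the admixed population right after the event. The median r values come from table 4 (prior set 1, events A1 and A3); the function names and the example allele frequencies (0.2 and 0.6) are hypothetical and serve only as an illustration.

```python
def source_contributions(r):
    """Return (fraction from the named source, fraction from the other)."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("admixture rate must lie in [0, 1]")
    return r, 1.0 - r


def admixed_allele_frequency(r, p_named, p_other):
    """Expected allele frequency immediately after admixture.

    Assumes instantaneous mixing of the two sources in proportions
    r and 1 - r, before any drift (a simplification for illustration).
    """
    named, other = source_contributions(r)
    return named * p_named + other * p_other


if __name__ == "__main__":
    # Median admixture rates from table 4 (prior set 1).
    medians = {"A1 (CN-Nin into US-Wat)": 0.761,
               "A3 (US-NC into DE-Dos)": 0.278}
    for event, r in medians.items():
        named, other = source_contributions(r)
        print(f"{event}: {named:.3f} from named source, "
              f"{other:.3f} from the other source")
    # Hypothetical allele frequencies in the two sources.
    print(admixed_allele_frequency(0.761, p_named=0.2, p_other=0.6))
```

For A1, a median r of 0.761 for the Chinese source leaves 0.239 for Hawaii, in line with the statement that the contribution from China was the larger of the two.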


0.286). We found more balanced admixture rates for the populations from Brazil (BR-PA) and La Reunion (FR-Reu), but information on this parameter was less accurate in these cases, as indicated by large 90% credibility intervals.

Model-Posterior Checking

Like any model-based method, ABC inference does not reveal the "true" evolutionary history, but allows choosing the best among a necessarily limited set of scenarios that have been compared and estimating posterior distributions of parameters under this scenario. How well the inferred scenario-posterior combination matches the observed dataset remains to be evaluated using an ABC model-posterior checking analysis. When applying such an analysis to the final invasion scenario detailed in figure 1, we found that, when considering prior set 1, only 54 of the 1,141 summary statistics used as test quantities had low posterior predictive P-values (i.e., 0.002 < ppp-values < 5%). Moreover, none of those ppp-values remained significant when correcting for multiple comparisons (Benjamini and Hochberg 1995). Similar results were obtained when assuming prior set 2 and when considering different sets of representative sample sites in our analyses (results not shown). These findings show that the final worldwide invasion scenario (fig. 1, with associated parameter posterior distributions) matches the observed dataset well. In agreement with this, the projections of the simulated datasets on the principal component axes from the final model-posterior combination were well grouped and centered on the target point corresponding to the observed dataset (supplementary fig. S4, Supplementary Material online). Due to the modest size of the dataset (i.e., 25 microsatellite loci), however, only situations of major inadequacy of the model-posterior combination to the observed dataset are likely to be identified.
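The multiple-comparison correction cited above (Benjamini and Hochberg 1995) follows a step-up rule that can be sketched in Python as below. This is a generic illustration of the procedure, not the authors' code: the function name and the example P-values are hypothetical, and the individual ppp-values of the study are not reproduced here.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test is rejected.

    Step-up rule: sort the m P-values, find the largest rank k such
    that p_(k) <= alpha * k / m, and reject the k smallest P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical P-values, for illustration only.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    print(benjamini_hochberg(example))
    # With 1,141 test quantities, the smallest P-value must not exceed
    # this value to be rejected at rank 1.
    print(0.05 * 1 / 1141)
```

With 1,141 test quantities the rank-1 threshold is about 4.4e-5, far below 0.002, which illustrates why individually low ppp-values can all fail to remain significant once the correction is applied.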

Discussion

In the present paper, we decipher the routes taken by D. suzukii in its worldwide invasion, evaluate evidence of bottlenecks and genetic admixture associated with different introductions, and quantify the efficiency of ABC-RF relative to the more standard ABC-LDA method for choosing among introduction scenarios.

A Complex Worldwide Invasion History

Prior to this study and that of Adrion et al. (2014), our understanding of the worldwide introduction pathways of D. suzukii was based on historical and observational data, which were incomplete and potentially misleading. Our results indicate three distinct introductions from the native range (to Hawaii, the western side of North America, and western Europe), which accords well with the findings of Adrion et al. (2014) from a different set of markers (X-linked sequence data). Both the western North American introductions and at least some European populations show signs of admixture. Hawaii and China both appear to have contributed to the western North American introductions, which were in turn the most probable source for the introduction into eastern North America. Similarly, China and, to a lesser extent, eastern North America both appear to have contributed to the European introduction, although the eastern North American contribution is found mostly in northern European areas. The almost simultaneous invasion of North America and Europe from Asia could be due to increased trade between these areas, facilitating transport of this species between regions. Alternatively, or in addition, adaptation to human-altered habitats (in this case, changes in agricultural practices) within the native range of the species could have promoted invasion of similarly altered habitats worldwide (i.e., anthropogenically induced adaptation to invade; Hufbauer et al. 2012). However, the two invaded continents were most likely colonized by flies from distinct Chinese geographic areas (fig. 1), requiring that adaptation to agriculture occurred concomitantly at several locations within the native range. This is possible but not evolutionarily parsimonious, and further data would be required to test this explanation. Subsequent introductions from the three primary founder locations to other locations result in a complex history of invasion. Both primary and secondary introductions show signs of demographic bottlenecks, and several invasive populations have multiple sources and thus experienced genetic admixture between differentiated native or invasive populations.

The genetic relationships between worldwide D. suzukii populations define the genetic variation they harbor and may continue to shape further spread of alleles among populations. Gene flow among populations, through continuous dispersal or more punctual admixture events, permits the dissemination of allelic variants and new mutations, with dramatic consequences for evolutionary trajectories (Lenormand 2002). We found evidence for asymmetric genetic admixture from North American towards (northern) European populations. If this signal of admixture reflects recurrent and ongoing dispersal events rather than a single past secondary introduction event, then variants that arise in North America would be more likely to spread to Europe than the reverse. Discriminating between ongoing and past dispersal events is tricky, however, especially for recent invasions (Benazzo et al. 2015). If the occurrence of recurrent gene flow is confirmed, then D. suzukii populations from Europe may not lag behind American ones in evolutionary terms. From an applied perspective, if resistance to control practices emerges in North America, special efforts to stop the importation of D. suzukii to Europe would help prevent the evolution of resistance in Europe and thus reduce damage to crops (Caprio and Tabashnik 1992).

Advances in Methods for Discriminating among Complex Models

To the best of our knowledge, the present study is the first to use the recently developed ABC-RF method (Pudlo et al. 2016) to test competing models characterized by different levels of complexity. To compare ABC-RF to a more standard ABC method, we carried out a subset of model choice analyses using both ABC-RF and ABC-LDA (Estoup et al. 2012). We ran ABC-LDA analyses in two ways: with standard-sized, large reference tables (500,000 simulated datasets per scenario) and with small reference tables (10,000 simulated datasets per scenario, the same size as used for ABC-RF). The performance of ABC-LDA (in terms of prior error rate) in choosing among introduction scenarios was slightly better than for ABC-RF when using large reference tables. Furthermore, for ABC-LDA with large reference tables, confidence in model choice (in terms of posterior probability) was relatively comparable to ABC-RF with small reference tables. Thus, when analyses are relatively simple or computational resources are not limiting, ABC-LDA can provide as robust inferences of invasion scenarios as ABC-RF.

In contrast, we found that ABC-RF outperformed ABC-LDA when using similarly small reference tables for both types of analyses (10,000 simulated datasets per scenario). ABC-LDA was also unstable, choosing different scenarios in different replicate analyses when using small reference tables. These empirical findings correspond well with those of Pudlo et al. (2016), who used simple models for which the true posterior probabilities could be calculated. Pudlo et al. (2016) demonstrated that ABC-RF provided a more reliable posterior probability of the best (true) scenario than standard ABC methods when using the same (small) number of simulated datasets in the reference table (see supplementary fig. S3 in Pudlo et al. 2016). In our work, the difference in performance between the two methods was most pronounced for complex analyses (e.g., when comparing more than 10 scenarios using more than 100 summary statistics). Thus, when analyses are not simple or computational resources are finite, ABC-RF provides more robust inferences of invasion scenarios than ABC-LDA.

The lower computation cost of ABC-RF was also evident. For a given observed dataset, selecting the best supported scenario and computing its posterior probability was ca. 50 times faster than when processing an ABC-LDA treatment with a traditional reference table of large size. Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time, whereas it lasted several hours to several days with ABC-LDA. The consequence of low computational costs is not simply in computer time. Rather, the efficiency of ABC-RF analyses allowed us to run replicate analyses on various sample sets, even for the most complex D. suzukii invasion scenarios. This replication is critical in evaluating the robustness of our statistical inferences.

We found that, at least for some ABC-RF analyses, the posterior probability of the best scenario and the global statistical power to choose among alternative invasion scenarios (as measured by the prior error rate) were relatively low. For instance, in three of the 11 analyses (analyses 3b, 5a, and 5b), the best scenarios had relatively low probabilities (i.e., ranging from 0.500 to 0.630) and relatively high prior error rates (i.e., ranging from 0.300 to 0.400). Although replicate analyses carried out on different sample sets and prior sets pointed to the same best scenario for these three analyses, such low probability values and high prior error rates suggest a moderate level of confidence in the choice of model in these cases. Similarly, we found that the scenario choice leading to the conclusion of the presence versus absence of admixed genes from the eastern US in various European sample sites relied on probability values that were sometimes as low as 0.500. Moreover, in a minority of cases, the best scenario was different depending on the sample site considered as representative of the eastern US or the native area. The observed pattern of an uneven spatial distribution of the presence of genes of eastern US origin in Europe, following roughly a north-to-south gradient, should hence be taken cautiously. The analyses of additional European sample sites are needed to clarify this issue. The above inferential uncertainties also most likely find their sources in the relatively small number of markers genotyped relative to the number, complexity, and similarity of some of the competing models. Such results indicate that there is room for improving the robustness of our inferences, at least for a subset of our analyses.

Bottlenecks and Genetic Admixture

We found strong evidence that bottlenecks and admixture both occurred during the course of the invasion of D. suzukii. A first sign of bottlenecks is a reduction in neutral genetic diversity. We found that all invasive D. suzukii populations were characterized by significantly lower genetic variation than native ones, with a loss of 6.4% (US-Col) to 23.3% (US-Haw) of heterozygosity and of 27.3% (US-Wat) to 54.2% (US-Haw) of allelic diversity (supplementary fig. S5, Supplementary Material online). Adrion et al. (2014) found a similar significant loss of diversity in their dataset focused on gene sequences. Dlugosch and Parker (2008) and Uller and Leimu (2011) reviewed studies of neutral genetic diversity in a large number of species of animals, plants, and fungi, and compared nuclear molecular diversity within introduced and source populations. Overall, they found that a loss of variation was the most frequent feature in invasive populations. However, reductions in genetic variation were on average modest (e.g., average loss of 18.7% and 15.5% of heterozygosity and allelic diversity, respectively; Dlugosch and Parker 2008). The reductions observed in our study in invasive D. suzukii populations were thus in the same range for heterozygosity but higher for allelic diversity (average loss of 11.1% and 34.5% of heterozygosity and allelic diversity, respectively). A larger loss of allelic diversity than of heterozygosity after a bottleneck is nevertheless expected by theory (Nei et al. 1975).
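The expectation that allelic diversity is lost faster than heterozygosity can be illustrated with a toy calculation. The Python sketch below is an illustration only: the allele frequencies are invented, the helper names (`expected_heterozygosity`, `drop_rare_alleles`, `relative_loss`) are hypothetical, and a bottleneck is mimicked very crudely by deleting all alleles below a frequency threshold and renormalizing. It is not the procedure used to obtain the values reported above.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def drop_rare_alleles(freqs, threshold):
    """Crude stand-in for a bottleneck: remove alleles rarer than `threshold`
    and renormalise the remaining frequencies so that they sum to 1."""
    kept = [p for p in freqs if p >= threshold]
    total = sum(kept)
    return [p / total for p in kept]


def relative_loss(before, after):
    """Proportion of the initial value that has been lost."""
    return (before - after) / before


# Invented source population: three common alleles and five rare ones.
source = [0.40, 0.30, 0.20, 0.02, 0.02, 0.02, 0.02, 0.02]
founded = drop_rare_alleles(source, threshold=0.05)

loss_alleles = relative_loss(len(source), len(founded))
loss_he = relative_loss(expected_heterozygosity(source),
                        expected_heterozygosity(founded))

print(f"allelic diversity lost: {loss_alleles:.1%}")
print(f"heterozygosity lost:    {loss_he:.1%}")
```

In this toy case most alleles disappear while heterozygosity falls only modestly, because rare alleles contribute little to the sum of squared frequencies.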

The type of introduction pathway explains, at least partly, the severity of the reduction in genetic diversity. Bottlenecks tended to be less severe for populations founded by individuals from the same continent than for populations founded by individuals originating from a different continent. A simple explanation for such a pattern is that, due to greater proximity, populations originating from the same continent are likely to be initially founded by a larger number of individuals and also to experience recurrent gene flow. Populations founded by individuals from both the same and different continents experienced weak or moderate reductions in diversity, suggesting that multiple introduction pathways restore diversity.

In agreement with previous studies on invasions on similarly large geographical scales (e.g., Lombaert et al. 2014 and references therein), we found that admixture events were frequent in the worldwide invasion history of D. suzukii. We identified at least five admixture events for which sources were native and/or invasive populations (i.e., A1–A5 events in fig. 1). The genetic contributions of the sources were unbalanced, except for the populations from Brazil and La Reunion, but genetic information regarding admixture was considerably less accurate in the latter cases, which involved weakly differentiated source populations.

We have provided here the information necessary to evaluate whether bottlenecks and admixture have influenced outcomes in this invasion. Quantitative genetics studies in the lab focusing on the key source, bottlenecked, and admixed populations could now examine the fitness and adaptive potential of those groups (Lavergne and Molofsky 2007; Facon et al. 2008; Turgeon et al. 2011).

Conclusions and Perspectives

Understanding invasion pathways provides key information for understanding the role of evolutionary processes in biological invasions. For example, our research reveals sources of invasive populations for further relevant comparisons using quantitative genetics studies in the lab. Additionally, we found that the invasion of Europe by D. suzukii was distinct from that of North America, with limited and asymmetrical gene flow between these two main invaded areas. This situation provides the opportunity to evaluate evolutionary trajectories in replicate in two separate temperate climate regions, similarly to research by Gilchrist et al. (2001, 2004) on the pace of clinal evolution in the invasive fruit fly Drosophila subobscura. There are also distinct invasion pathways for areas with warmer climates (i.e., Hawaii, La Reunion, and Brazil) that will make interesting comparisons.

Retracing invasion routes and making inferences about demographic processes is only possible if there is adequate polymorphism within populations and significant genetic differentiation among them. As is evident here, microsatellite data can provide enough variation to reveal important pathways. We concur with Adrion et al. (2014), however, that genome-wide data (i.e., next-generation sequencing approaches) will be tremendously powerful in further discriminating among complex invasion scenarios (see also Cristescu 2015; Estoup et al. 2016). Because of the reduced computational resources demanded by ABC-RF, this method will be particularly useful for analysis of massive single nucleotide polymorphism datasets (Pudlo et al. 2016). In addition to the selectively neutral demographic inference leading to the reconstruction of routes of invasion, population (and quantitative) next-generation sequencing approaches are quite promising for studying the evolution of phenotypic traits in natural populations (Wray 2013; Gautier 2015). Specifically, they can be used to better understand the genetic architecture of traits underlying invasion success (Bock et al. 2015). Drosophila suzukii is a good species for such research given (i) its short generation time and viability in laboratory conditions, which facilitate experimental approaches to study quantitative traits of interest (Asplen et al. 2015), and (ii) the availability of annotated genome assemblies for this species (Chiu et al. 2013; Ometto et al. 2013), along with the huge amount of genomic resources available in its close relative D. melanogaster (Groen and Whiteman 2016).

Materials and Methods

Sampling and Genotyping

Adult D. suzukii were sampled from the field at a total of 23 localities (hereafter termed sample sites) distributed throughout most of the native and invasive range of the species (supplementary table S1, Supplementary Material online, and fig. 1). Samples were collected between 2013 and 2015 using baited traps and sweep nets, and stored in ethanol. Native Asian samples consisted of a total of six sample sites, including four Chinese and two Japanese localities. Samples from the invasive range were collected in Hawaii (1 sample site), continental US (7 sites), Europe (7 sites), Brazil (1 site), and the French island of La Reunion (1 site). For each sample site, 15–44 adult flies were genotyped at 25 of the 28 microsatellite loci described in Fraimout et al. (2015), resulting in a total of 685 genotyped individuals. Three microsatellite loci from Fraimout et al. (2015) (i.e., loci DS31, DS42, and DS45) were not included due to the presence of null alleles at frequencies > 10% in some sample sites (results not shown). DNA extraction and PCR amplification, as well as allele scoring, were performed following the methodology detailed in Fraimout et al. (2015).

General Framework for Defining Focal Population Groups and Sets of Scenarios to Be Compared Using ABC

One of the many challenges of ABC analyses is to optimally define sets of sample sites to be processed chronologically as potential sources and targets in a way that does not hamper the computational effort. As stated in Estoup and Guillemaud (2010) and Lombaert et al. (2014), the number and complexity of competing scenarios is proportional to the number of genetic groups to be accounted for in ABC analyses. Following Lombaert et al. (2014), we used the results obtained from three genetic clustering methods, along with historical and geographical information, to define genetic groups that could include several sample sites (supplementary appendix S1, Supplementary Material online). We defined seven main genetic groups from our set of 23 sample sites: Asia, Hawaii, western US, eastern US, Europe, Brazil, and La Reunion. To make the computations less intensive with respect to computer resources, ABC treatments were carried out using one representative sample site for each genetic group per replicate analysis. In other words, when a genetic group included more than two sample sites, we considered the two most differentiated sample sites (i.e., with the highest $F_{ST}$ values in supplementary table S3, Supplementary Material online) as representative of the group: CN-Shi and JP-Tok for the group Asia, all three sample sites for the genetically heterogeneous western US group, US-NC and US-Wis for the group eastern US, and GE-Dos and IT-Tre for the group Europe. Each of the sample sites retained (i.e., the representative sample sites) was included alternatively in the sample set considered for ABC treatments. This makes it possible to replicate analyses of scenario choice on different sample sets representing most of the genetic variation within and between genetic groups. All ABC analyses described in table 1 were thus replicated with all possible combinations of representative sample sites available for each genetic group. This allowed us to test the robustness of scenario choices with respect to the sample sets considered as representative of genetic groups.
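The replication over "all possible combinations of representative sample sites" amounts to a Cartesian product over genetic groups. The following Python sketch shows the idea for a hypothetical analysis involving the Asia, eastern US, and Europe groups; the site codes are those named in the text, but which groups enter a given analysis, and the function name `sample_sets`, are assumptions made for illustration only.

```python
from itertools import product

# Representative sample sites per genetic group, as named in the text.
# The choice of these three groups for one analysis is hypothetical.
representatives = {
    "Asia": ["CN-Shi", "JP-Tok"],
    "eastern US": ["US-NC", "US-Wis"],
    "Europe": ["GE-Dos", "IT-Tre"],
}


def sample_sets(groups):
    """Return every sample set obtained by picking one representative
    sample site per genetic group."""
    names = list(groups)
    return [dict(zip(names, combo))
            for combo in product(*(groups[name] for name in names))]


sets = sample_sets(representatives)
for sample_set in sets:
    print(sample_set)
```

With two representatives for each of three groups, this yields 2 × 2 × 2 = 8 sample sets, each of which would be run as a separate replicate analysis.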

Nested ABC Analyses Processed Sequentially

Following Lombaert et al. (2014), we used historical information (i.e., dates of first observation of each invasive population; supplementary table S1, Supplementary Material online) to define 11 sets of competing introduction scenarios that were analyzed sequentially. The first set of compared scenarios considers the second oldest invasive population as target (the oldest one necessarily originating from the native range) and determines its introduction history. Step by step, subsequent analyses use the results obtained from the previous analyses, until the most recent invasive populations are considered. Each of the 11 analyses included a set of compared introduction scenarios in which the target invasive population (i.e., the focal population) can either derive directly by a single split (i.e., introduction event) from one of the historically compatible source populations, or be the result of an admixture event between any one of the possible pairs of sources. For instance, in a model with one target and two possible sources, three scenarios could be formalized, with the target population being derived from a single source population, from the other possible source, or from an admixture between the two sources (see supplementary fig. S6 for an illustration and supplementary table S2, Supplementary Material online, for details).
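The way a scenario set grows with the number of candidate sources can be sketched in a few lines of Python. This is an illustration under the simplest reading of the text (one scenario per single source, plus one per unordered pair of sources); the function name `introduction_scenarios` is hypothetical, and the sketch ignores the additional scenarios used in some analyses (e.g., the seven scenarios of analysis 1d).

```python
from itertools import combinations


def introduction_scenarios(sources):
    """List candidate origins for one target population: each single
    source, then each unordered pair of sources (admixture)."""
    single = [(source,) for source in sources]
    admixed = list(combinations(sources, 2))
    return single + admixed


scenarios = introduction_scenarios(["Asia", "Hawaii", "western US"])
for scenario in scenarios:
    print(" + ".join(scenario))
```

For k candidate sources this gives k + k(k − 1)/2 scenarios: three scenarios for two sources, and six for three sources.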

The first set of scenario choice analyses, 1a–1d, aimed at making inferences about the introduction pathways for the three sample sites in western US (table 1). We first evaluated the Asian, Hawaiian, or admixed Asian + Hawaiian origin of each of the three western US sample sites. We then refined our modeling by formalizing seven competing scenarios describing the possible relationships among the three western US sample sites and their extra-continental sources (i.e., Asia and Hawaii; analysis 1d). Once the invasion history was resolved for the western US group, we sought to identify the population source(s) of the eastern US and European genetic groups independently. Dates of first records in these two groups widely overlap, and the possibility that the western US was a common source for both areas could not be ruled out. We therefore compared, in analysis 2a, a set of six scenarios to assess whether the eastern US group directly derived from Asia, Hawaii, or western US, or if it resulted from an admixture event between one pair of such sources. A similar set of six scenarios was used to assess the most likely origin of European populations independently from the eastern US group (analysis 2b).

Due to the results of analyses 2a and 2b, and due to the overlap in first record dates for eastern US and Europe, we then performed two analyses to investigate the possibility of gene flow (i.e., admixture) between the eastern US and European groups, considering eastern US as target population with Europe as potential source and vice versa (analyses 3a and 3b, respectively). Analysis 4 aimed at clarifying the admixture in northern European sample sites identified in analysis 3b. We tested whether this admixture occurred between eastern US and a secondary introduction from Asia, or between individuals from eastern US and southern Europe. Finally, we investigated the origins of the most recent invasive populations in Brazil and La Reunion separately, with all older groups and pairwise admixtures as potential sources (analyses 5a and 5b). Note that, in agreement with the genetic clustering results (fig. 1; supplementary appendix S1 and figs. S1 and S2, Supplementary Material online), we did not test for the possibility of Brazil being the source of La Reunion, or vice versa.

Historical, Demographic, and Mutational Parameters

For all scenarios formalized for ABC analyses, we modeled an introduction event as a divergence without subsequent gene flow from the source population (or two source populations in case of admixture) at a time corresponding to the date of first record in the invaded area, translated into a number of generations (assuming 12 generations per year; Lin et al. 2014). The divergence event was immediately followed by a bottleneck period characterized by a lower effective population size, and then by a direct return to a stable (large) effective population size. We modeled our uncertainty regarding the identity of the source populations in the slightly structured native Asian area by including native "ghost populations" (i.e., unsampled populations) in our scenarios. Such unsampled native populations were modeled as a single split from an ancestral Asian population (without any change in effective population size) at a time defined by a loose flat prior that includes zero at the lower bound (supplementary table S8, Supplementary Material online). See Lombaert et al. (2014) and references therein for justifications of using unsampled populations when modeling invasion scenarios. A detailed illustration of the above modeling design is provided in supplementary fig. S6, Supplementary Material online.

Prior distributions for historical, demographical, and mutational parameters were defined taking into account the historical and demographic parameter values available from empirical studies on D. suzukii (Hauser 2011; Lin et al. 2014; Asplen et al. 2015) and microsatellite mutational data on D. melanogaster (Schug et al. 1998). We considered a first set of prior distributions (hereafter termed prior set 1), which was composed of rectangular (i.e., bounded uniform) distributions (see supplementary table S8, Supplementary Material online). To evaluate the robustness of our ABC inferences to prior choice, we also considered a second set of more peaked prior distributions (hereafter termed prior set 2; supplementary table S8, Supplementary Material online).

Model Choice Using ABC-RF and ABC-LDA

For all ABC-RF and ABC-LDA analyses, we used the software DIYABC v2.1.0 (Cornuet et al. 2014) to simulate datasets constituting the reference tables. A reference table includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics.

Following Pudlo et al. (2016), ABC-RF treatments were processed on reference tables including 10,000 simulated datasets per scenario. Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014) for microsatellite markers, describing genetic variation per population (e.g., number of alleles), per pair (e.g., genetic distance), or per triplet (e.g., admixture rate) of populations, averaged over the 25 loci (see supplementary table S9, Supplementary Material online, for details about such statistics), plus the linear discriminant analysis (LDA) axes as additional summary statistics. The total number of summary statistics ranged from 39 to 424 depending on the analysis (table 2). When the number of scenarios in an analysis exceeded ten, we processed ABC-RF treatments on reference tables including 100,000 simulated datasets, to avoid computer memory issues associated with the sub-bootstrapping procedure processed for reference tables with more than 100,000 simulated datasets (see the section Practical recommendations regarding the implementation of the algorithms in Pudlo et al. 2016). We checked that this number was sufficient by evaluating the stability of prior error rates (i.e., the probability of choosing a wrong model when drawing model index and parameter values from the priors) and posterior probability estimates on 80,000, 90,000, and 100,000 simulated datasets. The number of trees in the constructed random forests was fixed to n = 500 (see Pudlo et al. 2016 and Breiman 2001 for justifications of considering a forest of 500 trees, and supplementary fig. S7, Supplementary Material online, for an illustration). For each ABC-RF analysis, we predicted the best scenario and estimated its posterior probability and prior error rate over 10 replicate runs of the same reference table. We used the abcrf R package (v1.1.0; Pudlo et al. 2016) to perform all ABC-RF analyses. In supplementary appendix S2, Supplementary Material online, we provide several scripts written in the R programming language (R Development Core Team 2008) for computing random forest analyses in R with the abcrf package when starting from simulated datasets generated with DIYABC v2.1.0.

For the sake of methodological comparison, we also duplicated a subset of analyses using a more standard ABC method (Beaumont et al. 2002; Cornuet et al. 2008), improved using the regression algorithms from Estoup et al. (2012), which are based on linear discriminant analysis (LDA), to avoid the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step. This method is called ABC-LDA. Following previous analyses of this type (Lombaert et al. 2014), ABC-LDA treatments were processed on reference tables including 500,000 simulated datasets per scenario. As for ABC-RF, datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). Posterior probabilities associated with each scenario were estimated by polychotomous logistic regression (Cornuet et al. 2008), modified following Estoup et al. (2012), on the 1% of the simulated datasets closest to the observed dataset. We also estimated, for each analysis, a prior error rate over 500 simulated datasets drawn from the priors, using reference tables including 500,000 simulated datasets per scenario. To provide a fair comparison of classification error with respect to the computational effort, prior error rates were also estimated using reference tables including the same number of simulated datasets per scenario as for ABC-RF treatments (10,000 per scenario).
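The notion of a prior error rate can be made concrete with a deliberately simple Python sketch. It is neither ABC-RF nor ABC-LDA: the two "scenarios" are toy Gaussian models, the single summary statistic is a sample mean, and the classifier is a majority vote among the nearest simulated datasets. All names and numerical settings are assumptions chosen for illustration. What the sketch does share with the analyses above is the definition of the error rate: pseudo-observed datasets are simulated with a model index drawn from the prior, and the rate is the proportion for which the wrong model is chosen.

```python
import random


def simulate_summary(model, rng, n=20):
    """Toy simulator: model 0 draws from N(0, 1), model 1 from N(3, 1).
    The single summary statistic is the sample mean."""
    mu = 0.0 if model == 0 else 3.0
    return sum(rng.gauss(mu, 1.0) for _ in range(n)) / n


def build_reference_table(n_per_model, rng):
    """Reference table: (model index, summary statistic) for each simulation."""
    return [(m, simulate_summary(m, rng)) for m in (0, 1)
            for _ in range(n_per_model)]


def choose_model(table, observed, k=25):
    """Majority vote among the k simulated datasets closest to `observed`."""
    nearest = sorted(table, key=lambda row: abs(row[1] - observed))[:k]
    votes_for_model_1 = sum(m for m, _ in nearest)
    return 1 if 2 * votes_for_model_1 > k else 0


def prior_error_rate(table, n_tests, rng):
    """Proportion of pseudo-observed datasets assigned to the wrong model."""
    errors = 0
    for _ in range(n_tests):
        true_model = rng.randrange(2)  # model index drawn from a uniform prior
        pseudo_observed = simulate_summary(true_model, rng)
        if choose_model(table, pseudo_observed) != true_model:
            errors += 1
    return errors / n_tests


rng = random.Random(1)
table = build_reference_table(500, rng)
rate = prior_error_rate(table, 200, rng)
print(f"prior error rate: {rate:.3f}")
```

Because the two toy models are far apart, the error rate here is close to zero; with many similar scenarios, as in analyses 3b, 5a, and 5b, much higher rates are to be expected.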

Refining the Origins of the Primary Introduction Events from the Native Area

We refined our worldwide invasion scenario by assessing the most probable origin of the primary introductions from the Asian native range. To this aim, we carried out additional ABC-RF treatments to choose which geographical area in the native range was the most likely origin of each of the three primary introductions from Asia identified in the final invasion scenario, i.e., in Hawaii, western US (sample site US-Wat), and Europe (fig. 1). Capitalizing on the observed pattern of genetic differentiation among all Asian sample sites, we considered four possible geographical origins: O1 = Japan (represented by the sample sites JP-Sap + JP-Tok, which were pooled as they did not show any significant differentiation), O2 = South-East China (represented by the sample site CN-Nin), O3 = North-West China (represented by the sample sites CN-Lan + CN-Lia, which were pooled as they did not show any significant differentiation), and O4 = South-West China (represented by the sample site CN-Shi). In each of these three model choice analyses (i.e., one analysis per primary introduction), we considered the previously inferred introduction histories and compared four scenarios corresponding to variations of those histories that differed only by the geographical origin of the primary Asian introduction (models O1, O2, O3, and O4). We used the ABC-RF method with 10,000 datasets per model, simulated using prior set 1 or 2, and with three replications of the RF process. For inferences regarding Europe, we carried out ABC-RF treatments on various sets of sample sites from Europe and eastern US to further evaluate the robustness of our results.

Parameter Estimation

For ABC parameter estimation, we considered a set of 12 key sample sites representative of the inferred final invasion history (fig. 1). This set of sample sites included the three native origins of primary introductions from the native range (O1, O2, and O3), the Hawaiian sample (US-Haw), the three western US samples (US-Wat, US-Sok, and US-SD), the eastern US sample US-NC, two European samples (IT-Tre and GE-Dos), the Brazilian sample (BR-PA), and the sample from La Reunion (FR-Reu). To evaluate the robustness of our parameter estimations with respect to sample sites, we carried out a second analysis replacing the sample sites US-NC, IT-Tre, and GE-Dos by US-Wis, SP-Bar, and FR-Par, respectively. We also replicated parameter estimations using prior set 1 and prior set 2.

ABC-RF was developed to tackle the inferential issue of model choice. So far, no RF-based solution exists to estimate parameter distributions under a given model (Pudlo et al. 2016). Standard ABC algorithms for parameter estimation may suffer from the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step, and hence yield poor results when the number of statistics is much too large (reviewed in Blum et al. 2013). To avoid such potential problems, as well as for the sake of simplicity and computational efficiency (i.e., statistical techniques for choosing summary statistics have not been implemented in DIYABC; Blum et al. 2013), we used a standard ABC methodological framework (Beaumont et al. 2002; Cornuet et al. 2008) applied to a subset of "expert-chosen" statistics proposed in DIYABC (Brouat et al. 2014; Lombaert et al. 2014) to estimate posterior parameter distributions under the final invasion scenario. We hence considered 95 summary statistics in total: the mean number of alleles and the mean genetic diversity (per locus and population sample), all pairwise $F_{ST}$ values, and some crude estimates of admixture rates based on the five population triplets for which admixture events have been inferred (i.e., A1–A5 events described in fig. 1, corresponding to five AML statistics; Choisy et al. 2004; see the DIYABC 2.1.0 user manual, p. 16, for details about such statistics). Using parameter values drawn from prior sets 1 or 2, we produced a reference table containing 10 million simulated datasets. Following Beaumont et al. (2002), we then used a local linear regression to estimate the parameter posterior distributions. We took the 10,000 (0.1%) simulated datasets closest to our observed dataset for the regression, after applying a logit transformation to parameter values. We found that considering instead the 1% closest simulated datasets and/or a larger set of summary statistics had only weak effects on the estimated parameter distributions, at least for the subset of parameters we were interested in (i.e., admixture rates and bottleneck severity) (results not shown).
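Two ingredients of the estimation step described above, retaining the fraction of simulations closest to the observed data and logit-transforming a bounded parameter, can be sketched as follows in Python. This is a simplified illustration, not the DIYABC implementation: the function names, the dictionary layout of the toy reference table, and the use of an unscaled Euclidean distance are assumptions, and the local linear regression adjustment itself is omitted.

```python
import math


def logit(value, lower, upper):
    """Map a parameter bounded in (lower, upper) onto the real line."""
    return math.log((value - lower) / (upper - value))


def inverse_logit(z, lower, upper):
    """Map a real number back into the interval (lower, upper)."""
    return lower + (upper - lower) / (1.0 + math.exp(-z))


def closest_fraction(table, observed, fraction):
    """Keep the `fraction` of simulated datasets whose summary statistics
    are closest (Euclidean distance) to the observed ones."""
    def distance(row):
        return math.sqrt(sum((s - o) ** 2
                             for s, o in zip(row["stats"], observed)))
    n_keep = max(1, round(len(table) * fraction))
    return sorted(table, key=distance)[:n_keep]


# Toy reference table: one summary statistic and one parameter in (0, 1).
toy_table = [{"stats": [float(i)], "param": (i + 1) / 12} for i in range(10)]
kept = closest_fraction(toy_table, observed=[0.0], fraction=0.2)

# Regression adjustment would be carried out on the transformed scale,
# and the adjusted values mapped back with inverse_logit.
transformed = [logit(row["param"], 0.0, 1.0) for row in kept]
back = [inverse_logit(z, 0.0, 1.0) for z in transformed]
print(kept, back)
```

Working on the logit scale guarantees that regression-adjusted values, once back-transformed, remain within the bounds of the prior. With a reference table of 10 million datasets, a fraction of 0.001 corresponds to the 10,000 retained simulations mentioned above.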

We focused our investigations on the parameters related to two types of demographic events that are considered to be potentially important drivers of invasion success: severity of the bottleneck associated with the foundation of new populations, and genetic admixture (Estoup et al. 2016). Bottleneck severity was defined as the ratio between the duration of the bottleneck and the number of effective individuals during this period (Pascual et al. 2007). Following the final invasion scenario detailed in figure 1, we defined three categories of introduction situations: bottleneck severity for populations founded only by individuals originating from a different continent (e.g., sample sites US-Haw, US-Wat, IT-Tre, BR-PA, and FR-Reu), bottleneck severity for populations founded only by individuals originating from the same continent (e.g., sample sites US-SD and US-NC), and bottleneck severity for populations founded by a combination of the two types of sources (e.g., sample sites US-Sok and GE-Dos). Genetic admixture is evident when a population is inferred to have more than one distinct source (A1–A5 events in fig. 1).
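Why the ratio of duration to effective size is a natural measure of severity can be seen from a standard population-genetic result that is not part of the present study: under drift alone, the fraction of heterozygosity retained after t generations at diploid effective size N is (1 − 1/(2N))^t, which is approximately exp(−t/(2N)). The Python sketch below uses invented values of duration and effective size, and hypothetical function names, to show that bottlenecks with the same severity are expected to erode heterozygosity to nearly the same extent.

```python
import math


def bottleneck_severity(duration, effective_size):
    """Severity as defined in the text: bottleneck duration (generations)
    divided by the effective number of individuals during that period."""
    return duration / effective_size


def heterozygosity_retained(duration, effective_size):
    """Textbook expectation under drift alone: (1 - 1/(2N))^t."""
    return (1.0 - 1.0 / (2.0 * effective_size)) ** duration


# Two invented bottlenecks with the same severity (0.2).
short_and_narrow = (10, 50)    # 10 generations at N = 50
long_and_wide = (20, 100)      # 20 generations at N = 100

for duration, size in (short_and_narrow, long_and_wide):
    severity = bottleneck_severity(duration, size)
    retained = heterozygosity_retained(duration, size)
    approx = math.exp(-severity / 2.0)
    print(duration, size, severity, round(retained, 4), round(approx, 4))
```

This also explains why duration and effective size are hard to estimate separately from genetic data, whereas their ratio is better identified.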

Model-Posterior Checking

We used the ABC model-posterior checking method implemented in DIYABC (Cornuet et al. 2010) to evaluate how well the final worldwide invasion scenario, with associated parameter posterior distributions, matches the observed dataset. This method was largely inspired by statistical frameworks detailed in Gelman et al. (2003) and Cook et al. (2006). The principle is as follows: if a scenario-posterior combination fits the observed data correctly, then data simulated under this scenario with parameters drawn from associated posterior distributions should be close to the observed data. The lack of fit of the model to the data can be measured by determining the frequency at which test quantities measured on the observed dataset are extreme with respect to the distributions of the same test quantities computed from the simulated datasets (i.e., the posterior predictive distributions). In practice, the test quantities are chosen among the large set of ABC summary statistics proposed in the program DIYABC. For each test quantity (q), a lack of fit of the observed data with respect to the posterior predictive distribution can be measured by the cumulative distribution function value, defined as $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$. The tail-area probability can be easily computed for each test quantity as $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$ and $1.0 - \mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$ for $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}}) \le 0.5$ and $> 0.5$, respectively. Such tail-area probabilities, also named posterior predictive P-values (i.e., ppp-values; Meng 1994), represent the probabilities that the replicated data (simulated ABC summary statistics) could be more extreme than the observed data (observed ABC summary statistics). In practice, ppp-values can be interpreted as a guideline for tracking model-posterior misfit: a few ppp-values around 0.05 are not necessarily a problem, whereas the presence of ppp-values around 0.001 is rather suspect. Hence, too many observed summary statistics falling in the tails of distributions, especially if some of those ppp-values are small (< 0.001), cast serious doubts on the adequacy of the model-posterior combination to the observed dataset. Finally, because ppp-values are computed for a number of (often non-independent) test statistics, a method such as that of Benjamini and Hochberg (1995) can be used to control the false discovery rate (Cornuet et al. 2010).
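The two computations described in this paragraph, the tail-area probability of a test quantity and the Benjamini and Hochberg step-up procedure, can be written compactly. The Python sketch below is an illustration under the definitions given in the text, not the DIYABC code; the function names and the example values are invented.

```python
def tail_area_probability(simulated, observed):
    """ppp-value as defined in the text: Prob(q_sim < q_obs) if that
    probability is <= 0.5, and 1 - Prob(q_sim < q_obs) otherwise."""
    p_below = sum(1 for q in simulated if q < observed) / len(simulated)
    return p_below if p_below <= 0.5 else 1.0 - p_below


def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of the tests declared significant by the Benjamini-Hochberg
    step-up procedure at false discovery rate `fdr`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= fdr * rank / m:
            cutoff_rank = rank  # keep the largest rank passing its threshold
    return sorted(order[:cutoff_rank])


# Invented posterior predictive draws for one test quantity.
simulated_values = [1.0, 2.0, 3.0, 4.0]
print(tail_area_probability(simulated_values, 1.5))
print(tail_area_probability(simulated_values, 3.5))

# Invented ppp-values for four test quantities.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

Test quantities flagged by the procedure are those for which the observed value remains extreme relative to the posterior predictive distribution after accounting for the number of tests.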

We carried out the ABC model-posterior checking analysis on our observed microsatellite dataset as follows. From 10^7 datasets simulated under the final invasion scenario detailed in figure 1, and using the same subset of 95 statistics previously retained for parameter estimation, we obtained a posterior sample of 10^4 values from the posterior distributions of parameters, as described in the previous section, Parameter estimation. We then simulated 10^4 datasets with parameter values drawn with replacement from this posterior sample. As underlined in many statistics textbooks (Gelman et al. 2003), it is advised against performing model checking using information that has already been used for training (i.e., model fitting; see also Cornuet et al. 2010 for illustrations on simulated datasets). Optimally, model-posterior checking should be based on test quantities that do not correspond to the summary statistics that have been used for obtaining the parameter posterior distributions. This is naturally possible with DIYABC, as the package proposes a large choice of summary statistics. Our set of test statistics therefore


included the 1141 remaining single, pairwise and three-population summary statistics available in DIYABC that were not used for previous ABC parameter estimation.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments
AF was supported by state funding by the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program “Investissements d'avenir” (ANR-11-IDEX-0004-02). AE and RAH were supported by the Languedoc Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE 2016) and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP “International Mobility” (CfP 2015-02). GL acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds, project 2012-13-195. XC, MK, IM and JZ were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). MD acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). MP was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.

References

Adrion JR, Kousathanas A, Pascual M, Burrack HJ, Haddad NM, Bergland AO, Machado H, Sackton TB, Schlenke TA, Watada M, et al. 2014. Drosophila suzukii: the genetic footprint of a recent, worldwide invasion. Mol Biol Evol. 31:3148–3163.

Asplen MK, Anfora G, Biondi A, Choi DS, Chu D, Daane KM, Gibert P, Gutierrez AP, Hoelmer KA, Hutchison WD, et al. 2015. Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci. 88:469–494.

Atallah J, Teixeira L, Salazar R, Zaragoza G, Kopp A. 2014. The making of a pest: the evolution of a fruit-penetrating ovipositor in Drosophila suzukii and related species. Proc R Soc B. 281:20132840–20132840.

Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025–2035.

Beaumont MA. 2010. Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst. 41:379–406.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 57:289–300.

Bertorelle G, Benazzo A, Mona S. 2010. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol Ecol. 19:2609–2625.

Blum M, Nunes M, Prangle D, Sisson S. 2013. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci. 28:189–208.

Bock DG, Caseys C, Cousens RD, Hahn MA, Heredia SM, Hubner S, Turner KG, Whitney KD, Rieseberg LH. 2015. What we still don't know about invasion genetics. Mol Ecol. 24:2277–2297.

Boissin E, Hurley B, Wingfield MJ, Vasaitis R, Stenlid J, Davis C, de Groot P, Ahumada R, Carnegie A, Goldarazena A, et al. 2012. Retracing the routes of introduction of invasive species: the case of the Sirex noctilio woodwasp. Mol Ecol. 21:5728–5744.

Breiman L. 2001. Random forests. Mach Learn. 45:5–32.

Benazzo A, Ghirotto S, Torres Vilaca ST, Hoban S. 2015. Using ABC and microsatellite data to detect multiple introductions of invasive species from a single source. Heredity 115:262–272.

Brouat C, Tollenaere C, Estoup A, Loiseau A, Sommer S, Soanandrasana R, Rahalison L, Rajerison M, Piry S, Goodman SM, Duplantier JM. 2014. Invasion genetics of a human commensal rodent: the black rat Rattus rattus in Madagascar. Mol Ecol. 23(16):4153–4167.

Caprio M, Tabashnik BE. 1992. Gene flow accelerates local adaptation among finite populations: simulating the evolution of insecticide resistance. J Econ Entomol. 85:611–620.

Calabria G, Maca J, Bächli G, Serra L, Pascual M. 2012. First records of the potential pest species Drosophila suzukii (Diptera: Drosophilidae) in Europe. J Appl Entomol. 136:139–147.

Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang G, Zalom FG. 2013. Genome of Drosophila suzukii, the spotted wing drosophila. G3 3:2257–2271.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Mol Ecol. 13:955–968.

Cook S, Gelman A, Rubin DB. 2006. Validation of software for Bayesian models using posterior quantiles. J Comput Graph Stat. 15:675–692.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24:2713–2719.

Cornuet JM, Ravigne V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11:401.

Cornuet JM, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin JM, Estoup A. 2014. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics 30:1187–1189.

Cristescu ME. 2015. Genetic reconstructions of invasion history. Mol Ecol. 24:2212–2225.

Depra M, Poppe JL, Schmitz HJ, De Toni DC, Valente VLS. 2014. The first records of the invasive pest Drosophila suzukii in the South American continent. J Pest Sci. 87:379–383.

Dlugosch KM, Parker IM. 2008. Founding events in species invasions: genetic variation, adaptive evolution, and the role of multiple introductions. Mol Ecol. 17:431–449.

Dlugosch KM, Anderson SR, Braasch J, Alice Cang F, Gillette H. 2015. The devil is in the details: genetic variation in introduced populations and its contributions to invasion. Mol Ecol. 24:2095–2111.

Edmonds CA, Lillie AS, Cavalli-Sforza LL. 2004. Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA. 101:975–979.

Ellstrand NC, Schierenbeck KA. 2000. Hybridization as a stimulus for the evolution of invasiveness in plants. Proc Natl Acad Sci USA. 97:7043–7050.

Estoup A, Guillemaud T. 2010. Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol. 19:4113–4130.

Estoup A, Lombaert E, Marin JM, Guillemaud T, Pudlo P, Robert CP, Cornuet JM. 2012. Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics. Mol Ecol Res. 12(5):846–855.

Estoup A, Ravigne V, Hufbauer RA, Vitalis R, Gautier M, Facon B. 2016. Is there a genetic paradox of biological invasion? Annu Rev Ecol Evol Syst. 47:51–72.


Facon B, Genton B, Shykoff J, Jarne P, Estoup A, David P. 2006. A general eco-evolutionary framework for understanding bioinvasions. Trends Ecol Evol. 21:130–135.

Facon B, Pointier JP, Jarne P, Sarda V, David P. 2008. High genetic variance in life-history strategies within invasive populations by way of multiple introductions. Curr Biol. 18:363–367.

Fraimout A, Loiseau A, Price DK, Xuereb A, Martin JF, Vitalis R, Fellous S, Debat V, Estoup A. 2015. New set of microsatellite markers for the spotted-wing Drosophila suzukii (Diptera: Drosophilidae): a promising molecular tool for inferring the invasion history of this major insect pest. Eur J Entomol. 112(4):855.

Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201:1555–1579.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis. London: Chapman & Hall/CRC.

Gilchrist GW, Huey RB, Serra L. 2001. Rapid evolution of wing size clines in Drosophila subobscura. Genetica 112:273–286.

Gilchrist GW, Huey RB, Balanya J, Pascual M, Serra L, Noor M. 2004. A time series of evolution in action: a latitudinal cline in wing size in South American Drosophila subobscura. Evolution 58:768–780.

Groen SC, Whiteman NK. 2016. Using Drosophila to study the evolution of herbivory and diet specialization. Curr Opin Insect Sci. 14:66–72.

Hauser M. 2011. A historic account of the invasion of Drosophila suzukii (Matsumura) (Diptera: Drosophilidae) in the continental United States, with remarks on their identification. Pest Manag Sci. 67:1352–1357.

Hufbauer RA, Facon B, Ravigne V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A. 2012. Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl. 5:89–101.

Kaneshiro KY. 1983. Drosophila (Sophophora) suzukii (Matsumura). Proc Hawaiian Entomol Soc. 24:179.

Keller SR, Taylor DR. 2010. Genomic admixture increases fitness during a biological invasion: admixture increases fitness during invasion. J Evol Biol. 23:1720–1731.

Kolbe JJ, Glor RE, Schettino LR, Lara AC, Larson A, Losos JB. 2004. Genetic variation increases during biological invasion by a Cuban lizard. Nature 431:177–181.

Lavergne S, Molofsky J. 2007. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci USA. 104:3883–3888.

Lee C. 2002. Evolutionary genetics of invasive species. Trends Ecol Evol. 17:386–391.

Lee JC, Bruck DJ, Curry H, Edwards D, Haviland DR, Van Steenwyk RA, Yorgey BM. 2011. The susceptibility of small fruits and cherries to the spotted-wing drosophila, Drosophila suzukii. Pest Manag Sci. 67:1358–1367.

Lenormand T. 2002. Gene flow and the limits of natural selection. Trends Ecol Evol. 17:183–189.

Lin QC, Zhai YF, Zhang AS, Men XY, Zhang XY, Zalom FG, Zhou CG, Yu Y. 2014. Comparative Developmental Times and Laboratory Life tables for Drosophlia suzukii and Drosophila melanogaster (Diptera: Drosophilidae). Fla Entomol. 97:1434–1442.

Lombaert E, Guillemaud T, Lundgren J, Koch R, Facon B, Grez A, Loomans A, Malausa T, Nedved O, Rhule E, et al. 2014. Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol. 23:5979–5997.

Matsumura S. 1931. 6000 illustrated insects of Japan-empire (in Japanese). Toko shoin, Tokyo. 1497 pp.

Meng XL. 1994. Posterior predictive p-values. Ann Stat. 22:1142–1160.

Nei M, Maruyama T, Chakraborty R. 1975. The bottleneck effect and genetic variability in populations. Evolution 29:1–10.

Ometto L, Cestaro A, Ramasamy S, Grassi A, Revadi S, Siozios S, Moretto M, Fontana P, Varotto C, Pisani D, et al. 2013. Linking genomics and ecology to investigate the complex evolution of an invasive drosophila pest. Genome Biol Evol. 5:745–757.

Pascual M, Chapuis MP, Mestres F, Balanya J, Huey RB, Gilchrist GW, Serra L, Estoup A. 2007. Introduction history of Drosophila subobscura in the New World: a microsatellite based survey using ABC methods. Mol Ecol. 16:3069–3083.

Peischl S, Excoffier L. 2015. Expansion load: recessive mutations and the role of standing genetic variation. Mol Ecol. 24:2084–2094.

Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. 2016. Reliable ABC model choice via random forests. Bioinformatics 32:859–866.

R Development Core Team. 2008. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org

Robert CP, Cornuet JM, Marin JM, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA. 108:15112–15117.

Rius M, Darling JA. 2014. How important is intraspecific genetic admixture to the success of colonising populations? Trends Ecol Evol. 29:233–242.

Sakai AK, Allendorf FW, Holt JS, Lodge DM, Molofsky J, With KA, Baughman S, Cabin RJ, Cohen JE, Ellstrand NC, McCauley DE. 2001. The population biology of invasive species. Annu Rev Ecol Syst. 305–332.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. In: Mutation and Evolution. Springer. pp 359–367.

Simberloff D. 2013. Biological invasions: much progress plus several controversies. Contrib Sci. 9:7–16.

Turgeon J, Tayeh A, Facon B, Lombaert E, De Clercq, Berkvens N, Lungren JG, Estoup A. 2011. Experimental evidence for the phenotypic impact of admixture between wild and biocontrol Asian ladybird (Harmonia axyridis) involved in the European invasion. J Evol Biol. 24:1044–1052.

Uller T, Leimu R. 2011. Founder events predict changes in genetic diversity during human-mediated range expansions. Glob Change Biol. 17:3478–3485.

Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Thery S, Froment A, Le Bomin S, Gessain A, Hombert JM, et al. 2009. Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol. 19:312–318.

Wray G. 2013. Genomics and the evolution of phenotypic Traits. Annu Rev Ecol Evol Syst. 44:51–72.


genetic groups using two independent analyses (analyses 2a and 2b; table 1 and supplementary table S2, Supplementary Material online). We found evidence for a single origin for the eastern US group, corresponding to an intra-continental spread from the San Diego area (P = 0.779; table 2).

The most probable origin for Europe was an independent introduction from the Asian native range (P = 0.716; table 2). The European and North American invasions occurred nearly simultaneously, and thus we also explored gene flow between the two regions. Europe was never selected as a source for the eastern US (analysis 3a, P = 0.744), no matter which sets of sample sites were considered. In contrast, we found some evidence of asymmetrical gene flow from the eastern US into at least some European locations. When considering the northern European sample site from Germany (GE-Dos) as a target, we found that the best scenario included an admixed origin with Asia and the eastern US (supplementary table S4, Supplementary Material online; analysis 3b, P = 0.488 and P = 0.501 for prior sets 1 and 2, respectively). In contrast, the best scenario did not include such an admixture event when considering the southern European sample site from Italy (IT-Tre) as a target (table 2; analysis 3b, P = 0.510). We therefore replicated analysis 3b on all possible combinations of European and eastern US sample sites to assess the robustness of this result and evaluate the possibility of a geographic pattern for admixture events. For each of the European sample sites, figure 2 summarizes which replicate analyses selected a scenario including an admixture between eastern US and Asia or a scenario of a single introduction from Asia (and see supplementary fig. S3, Supplementary Material online, for results using an alternative representative sample site for the native range). Results tended to indicate an uneven spatial distribution of the presence of genes of eastern US origin in Europe, following roughly a north-south gradient. More specifically, eastern US alleles were present in northern Europe and absent, or at least not detected by our model choice method, in southern Europe. We then further tested whether the observed admixture in the north corresponded to the mixing of eastern US genes with southern European genes originating from the initial introduction event from Asia, versus the mixing of eastern US genes with Asian genes introduced through a separate secondary introduction event from Asia in northern Europe (analysis 4). The best scenario was clearly that of an admixture between eastern US genes and southern European genes, which themselves had a probable origin as part of the initial introduction event from Asia (P = 0.998; A3 event in fig. 1).

Finally, we found that the sampled population from Brazil originated from North America, with genetic admixture between D. suzukii individuals from the southwest and eastern regions of the US (analysis 5a, P = 0.631; A4 event in fig. 1). Regarding the most recent invasive population from La Reunion, we found that this island population originated from Europe, with admixture between individuals from northern and southern Europe (analysis 5b, P = 0.500; A5 event in fig. 1).

Comparison of Model Choice Analyses Using ABC-RF versus ABC-LDA
For all 11 analyses, ABC-LDA performed well when using large reference tables (i.e., 500,000 simulations per scenario; table 2 and supplementary table S3, Supplementary Material online). Indeed, the prior error rates for ABC-LDA with large reference tables were smaller than the prior error rates for ABC-RF with

Table 1. Formulation of the Model Choice Analyses that were Carried Out Successively to Reconstruct Invasion Routes of D. suzukii Using ABC.

Model choice analysis | Number of compared scenarios | Tackled question | Potential source genetic group | Focal populations
1a | 3 | What are the origins of western US populations? | Asia, Hawaii | US-Wat
1b | 3 | What are the origins of western US populations? | Asia, Hawaii | US-Sok
1c | 3 | What are the origins of western US populations? | Asia, Hawaii | US-SD
1d | 7 | What are the relations among western US populations and their extra-continental sources? | Asia, Hawaii, US-Wat, US-Sok, US-SD | US-Wat + US-Sok + US-SD
2a | 6 | What are the origins of eastern US populations? | Asia, Hawaii, western US | eastern US
2b | 6 | What are the origins of European populations? | Asia, Hawaii, western US | Europe
3a | 10 | Is there asymmetrical gene flow from Europe to eastern US? | Asia, Hawaii, western US, Europe | eastern US
3b | 10 | Is there asymmetrical gene flow from eastern US to Europe? | Asia, Hawaii, western US, eastern US | Europe
4 | 2 | Does the admixture with eastern US genes in northern Europe result from a secondary Asian introduction? | Asia, Hawaii, western US, eastern US, southern Europe | northern Europe
5a | 21 | What are the origins of the Brazilian population? | Asia, Hawaii, western US, eastern US, southern Europe, northern Europe | Brazil
5b | 21 | What are the origins of La Reunion population? | Asia, Hawaii, western US, eastern US, southern Europe, northern Europe | La Reunion

NOTE: Each numbered analysis is a comparison of a certain number of scenarios by ABC model choice. We summarize each analysis by stating the question that it addressed. A detailed verbal description of each compared scenario is given in supplementary table S2, Supplementary Material online. The 11 analyses are nested in the sense that each subsequent analysis uses the result obtained from the previous one. For example, Analysis 2a, “What are the origins of eastern US populations?”, capitalizes on the history inferred for western US populations from analysis 1d. “Potential source genetic group” indicates all the potential source populations considered in the analyses for which one wants to identify the origin of the focal population (i.e., the target).


Table 2. Results of Model Choice Analyses Using ABC-RF and ABC-LDA.

Analysis | Total number of scenarios | Number of SumStats | Prior error rate, ABC-RF (sd) | Prior error rate, ABC-LDA (large reference table) | Prior error rate, ABC-LDA (small reference table) | Posterior probability of the best model, ABC-RF (sd) | Posterior probability of the best model, ABC-LDA (large reference table) [CI] | Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model)
1a | 3 | 39 | 0.100 (±0.002) | 0.075 | 0.085 | 1.000 (±0.001) | 1.000 [1.000, 1.000] | Asia + Hawaii
1b | 3 | 39 | 0.094 (±0.001) | 0.070 | 0.091 | 0.989 (±0.005) | 0.999 [0.999, 0.999] | Asia + Hawaii
1c | 3 | 39 | 0.096 (±0.001) | 0.079 | 0.087 | 0.998 (±0.002) | 0.999 [0.999, 0.999] | Asia + Hawaii
1d | 7 | 130 | 0.327 (±0.001) | 0.274 | 0.368 | 0.690 (±0.018) | 0.741 [0.714, 0.767] | Asia + Hawaii
2a | 6 | 130 | 0.232 (±0.001) | 0.203 | 0.241 | 0.779 (±0.019) | 0.788 [0.764, 0.813] | Western US
2b | 6 | 130 | 0.232 (±0.001) | 0.179 | 0.393 | 0.716 (±0.013) | 0.604 [0.592, 0.616] | Asia
3a | 10 | 204 | 0.328 (±0.001) | 0.265 | 0.364 | 0.744 (±0.020) | 0.844 [0.820, 0.868] | Western US
3b | 10 | 204 | 0.408 (±0.001) | 0.358 | 0.423 | 0.510 (±0.014) | 0.409 [0.391, 0.426] | Asia
4 | 2 | 301 | 0.121 (±0.001) | 0.111 | 0.212 | 1.000 (±0.009) | 0.999 [0.999, 0.999] | Southern Europe + eastern US
5a | 21 | 424 | 0.294 (±0.001) | 0.230 | 0.423 | 0.631 (±0.034) | 0.812 [0.788, 0.837] | Western US + eastern US
5b | 21 | 424 | 0.298 (±0.001) | 0.242 | 0.428 | 0.500 (±0.027) | 0.290 [0.256, 0.323] | Northern Europe + southern Europe

NOTE: All model choice analyses were carried out using the prior set 1 (supplementary table S8, Supplementary Material online) and a single set of sample sites representative of the pre-defined genetic groups: sample site JP-Tok for the native Asian group, US-Haw for the Hawaiian group, US-Wat, US-Sok and US-SD for the western US group, US-NC for the eastern US group, IT-Tre for the southern Europe group, GE-Dos for the northern Europe group, BR-PA for the Brazil group and FR-Reu for the La Reunion group (fig. 1). Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). The total number of summary statistics (Number of SumStats) as well as the total number of compared scenarios (Total number of scenarios) are indicated for each analysis. In the original table layout, the cells for Number of SumStats and origin of the focal population are shared by analyses 1a, 1b and 1c; the shared values are repeated on each row here. Prior error rates and posterior probabilities of the best model chosen using ABC-RF were averaged over 10 replicate analyses. ABC-RF and ABC-LDA treatments yielded the same best model choice for all analyses, which is denoted in the column “Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model)”. Admixture between two source populations is represented by a “+” sign. ABC-LDA posterior probabilities of the best models were estimated using “large reference tables” with 500,000 simulated datasets per scenario. ABC-LDA prior error rates were computed using reference tables of two different sizes: “large reference table” (i.e., 500,000 simulated datasets per scenario) and “small reference table” (i.e., 10,000 simulated datasets per scenario, as for ABC-RF analyses). SD stands for standard deviation over 10 replicate analyses and CI for 95% confidence interval, computed following Cornuet et al. (2008).


small reference tables (paired t-test, t10 = 2.7, P = 0.02). However, ABC-RF performed better than ABC-LDA when using small reference tables (only 10,000 simulations per scenario), having lower prior error rates (paired t-test, t10 = 11.6, P < 0.0001; table 2 and supplementary table S3, Supplementary Material online). The difference in prior error rate was particularly marked for the complex analyses, i.e., those in which many introduction scenarios were compared and many summary statistics were computed. For instance, for the analyses 5a and 5b (21 compared scenarios and 424 summary statistics computed), prior error rates were 0.217 and 0.221 for ABC-RF versus 0.417 and 0.425 for ABC-LDA.

Regarding computational effort, we found that, for a given observed dataset, an ABC-RF treatment to select the best scenario and compute its posterior probability required a computation time ca. 50 times shorter than an ABC-LDA treatment with a traditional reference table of large size (i.e., 500,000 simulated datasets per scenario). Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time for 10,000 pseudo-observed datasets, whereas with ABC-LDA it lasted several hours to several days (depending on the analysis processed) for 500 pseudo-observed datasets.

The same invasion scenario had the highest probability for all 11 analyses carried out using ABC-RF and ABC-LDA with a large number of simulated datasets (table 2 and supplementary table S3, Supplementary Material online). Posterior probabilities of the best scenario estimated using ABC-RF were not systematically higher (or lower) than those using ABC-LDA with a large number of simulated datasets. In particular, there

FIG. 1. Worldwide invasion scenario of D. suzukii inferred from microsatellite data and date of first observation. Map and schematic showing sample sites and the invasion routes taken by D. suzukii, as reconstructed by ABC-RF (Pudlo et al. 2016) on a total of 685 individuals from 23 geographic locations genotyped at 25 microsatellite loci (see results and methods for details). The native range is in dark gray and the invasive range is in light gray (cf. delimitation from fig. 1 in Asplen et al. 2015). The year in which D. suzukii was first observed at each sample site is indicated in italics. The 23 geographical locations that were sampled are represented by circles (native range) and squares, diamonds and triangles (introduced range). Squares indicate populations that experienced weak bottlenecks (i.e., median value of bottleneck severity < 0.12; see main text and table 3), diamonds indicate moderate bottlenecks (0.12 < bottleneck severity < 0.22) and triangles indicate strong bottlenecks (i.e., bottleneck severity > 0.3). The colors of the symbols for the sample sites and of the arrows between them correspond to the different genetic groups obtained using the clustering method BAPS (supplementary appendix S1, Supplementary Material online). The arrows indicate the most probable invasion pathways. A1–A5 indicate five separate admixture events between different sources. O1–O3 indicate the most probable sources within the native range for the primary introduction events. A1 = Hawaii + southeast China; A2 = Watsonville (western US) + Hawaii; A3 = southern Europe + eastern US; A4 = western US + eastern US; A5 = southern Europe + northern Europe; O1 = Japan; O2 = southeast China; O3 = northeast China.


was no clear trend of overestimation of posterior probability when using ABC-LDA with a large number of simulated datasets versus ABC-RF, a potential bias suggested by Pudlo et al. (2016). On the other hand, we found evidence of instability in the estimation of posterior probabilities, and hence model selection, when using ABC-LDA with a small number of simulated datasets in the reference table (i.e., 10,000 simulations per scenario, as for ABC-RF). In this case, the scenario with the highest probability was different from that selected using either ABC-LDA with the large reference table or ABC-RF in four and two of the 11 analyses when using the prior sets 1 and 2, respectively (results not shown).

Refining the Origins of the Primary Introduction Events from the Native Area
Additional ABC-RF analyses were run to determine which geographical area in the native range was the most likely origin of each of the three primary introductions (i.e., in Hawaii, western US and Europe). The most probable origin of the Hawaiian invasive population was Japan, with a mean posterior probability of 0.959 (origin O1 in fig. 1). For Watsonville (western US; sample site US-Wat), the most probable source of the primary introduction from Asia was southeast China (origin O2 in fig. 1; P = 0.860). Finally, for Europe, the most probable source of the primary introduction from Asia was northeast China (origin O3 in fig. 1; P = 0.855). Similar posterior probabilities were obtained using the prior set 1 (values presented above) and the prior set 2 (results not shown). For inferences regarding Europe, similar posterior probabilities were also obtained using various sets of sample sites from Europe and the eastern US (results not shown).

Estimation of Parameter Distributions for Admixture Rate and Bottleneck Severity
We used the most probable worldwide invasion scenario (fig. 1) to estimate the posterior distributions of parameters related to the bottleneck severity associated with the foundation of new populations and to genetic admixture rates between differentiated sources (i.e., A1–A5 events in fig. 1).

FIG. 2. Admixed or non-admixed origin of D. suzukii in Europe inferred for each sample site. Potential source populations include (among others) the Asian native range (represented by the Japanese sample site JP-Tok) and the invasive genetic group from eastern US, represented by one of the four invasive sample sites collected in this area (fig. 1). Four replicate independent ABC-RF treatments corresponding to the analysis 3b (table 2) were hence carried out for each targeted European sample site, using one of the four eastern US sample sites. The treatments labeled 1, 2, 3 and 4 in the pies of the figure have been carried out with the sample sites US-NC, US-Wis, US-Gen and US-Col, respectively. A pie quarter in blue indicates that the best scenario corresponds to a single introduction event from Asia. A pie quarter in red indicates that the best scenario corresponds to an admixture event between Asia and eastern US. Dates in italics correspond to the dates of first record of the European sample sites.


The posterior distributions of parameters relating to bottlenecks and admixture substantially differed from the priors, indicating that genetic data were informative for such parameters (tables 3 and 4, and see supplementary tables S6 and S7, Supplementary Material online, for results using an alternative set of representative sample sites). Regarding bottleneck severity, we found that bottlenecks tended to be more severe for populations founded by individuals originating from a source located on a different continent (extra-continental origin) than for populations founded by individuals originating from a source located on the same continent (intra-continental origin; table 3 and fig. 1). The bottleneck severity for populations founded by individuals corresponding to a combination of the two types of sources (extra- and intra-continental origins) was either weak or moderate. The strongest bottleneck severity was found for the first invasive population in Hawaii, which showed the lowest level of genetic variation among all invasive populations. Bottleneck severity values were globally slightly stronger in European than in US populations, suggesting a smaller number of founding individuals and/or a slower demographic recovery in European populations.

Regarding admixture (table 4), we found that (i) for the first recorded invasive population in the western US (i.e., the Watsonville sample site, US-Wat), the genetic contribution from China was substantially larger than that from Hawaii (median value of 0.759); (ii) for the western US population US-Sok, the secondary genetic contribution from Hawaii was small compared to that from Watsonville (median value of 0.237); and (iii) the sample sites from northern Europe (e.g., GE-Dos in Germany) contained a rather low proportion of genes originating from the eastern US (median value of

Table 3. Bottleneck Severity in Invasive Populations of D. suzukii.

                                          Prior 1                                 Prior 2
Introduction Type          Sample Site    Mean   Median  Mode   q5     q95        Mean   Median  Mode   q5     q95
Extra-continental          US-Haw         0.517  0.500   0.495  0.326  0.755      0.490  0.484   0.477  0.382  0.615
                           US-Wat         0.181  0.138   0.114  0.041  0.448      0.143  0.127   0.118  0.059  0.275
                           IT-Tre         0.268  0.179   0.171  0.059  0.837      0.236  0.189   0.172  0.101  0.553
                           BR-PA          0.158  0.126   0.104  0.020  0.407      0.208  0.193   0.172  0.067  0.399
                           FR-Reu         0.259  0.215   0.186  0.051  0.626      0.245  0.214   0.190  0.069  0.534
Intra-continental          US-SD          0.135  0.117   0.106  0.039  0.283      0.117  0.104   0.091  0.053  0.224
                           US-NC          0.089  0.067   0.061  0.015  0.230      0.111  0.098   0.090  0.046  0.218
Extra + intra-continental  US-Sok         0.116  0.099   0.088  0.027  0.250      0.119  0.106   0.099  0.052  0.229
                           GE-Dos         0.189  0.177   0.155  0.061  0.356      0.205  0.189   0.171  0.088  0.372
Prior values                              0.628  0.199   NA     0.021  1.904      0.517  0.500   NA     0.326  0.754

NOTE. Extra-continental introductions correspond to a long-distance introduction from a source located outside the continent of the focal population; intra-continental introductions correspond to an introduction event from a source located on the same continent as the focal population; and extra + intra-continental introductions correspond to a combination of the two types of sources. Mean, median, and mode estimates, as well as bounds of the 90% credibility intervals (q5 and q95), are indicated for each bottleneck severity parameter. We roughly classified the estimated bottleneck severity values into three classes (represented in the published table by three shades of gray): weak (i.e., median value of bottleneck severity < 0.12, in light gray), moderate (0.12 < bottleneck severity < 0.22, in gray), and strong (i.e., bottleneck severity > 0.3, in dark gray). The set of sample sites used for the ABC estimations presented here includes (i) for the native area, Japan (JP-Tok + JP-Sap), South-East China (CN-Nin), and North-East China (CN-Lan + CN-Lia), and (ii) for the invaded range, US-Wat, US-Sok, and US-SD for the western US, US-NC for the eastern US, IT-Tre for southern Europe, GE-Dos for northern Europe, BR-PA for South America (Brazil), and FR-Reu for La Reunion island. Code names of the sample sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online, in which bottleneck severity classes are also given for each sample site. See supplementary table S6, Supplementary Material online, for results on a different set of representative sample sites.
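The three-class rule given in the note to table 3 can be written out explicitly. The Python sketch below is only an illustration: the function name `severity_class` is hypothetical, and because the note leaves the boundary values and the interval between 0.22 and 0.3 unassigned, the sketch returns "unclassified" for them rather than guessing.

```python
def severity_class(median_severity):
    """Classify a median bottleneck severity with the thresholds from the note to table 3."""
    if median_severity < 0.12:
        return "weak"
    if 0.12 < median_severity < 0.22:
        return "moderate"
    if median_severity > 0.3:
        return "strong"
    # The note does not say how boundary values or the 0.22-0.3 gap are classed.
    return "unclassified"


# Prior-1 medians for four sample sites, as reported in table 3.
medians = {"US-Haw": 0.500, "US-Wat": 0.138, "US-NC": 0.067, "GE-Dos": 0.177}
classes = {site: severity_class(value) for site, value in medians.items()}
```

Under this reading, US-Haw falls in the strong class, US-Wat and GE-Dos in the moderate class, and US-NC in the weak class, which agrees with the pattern described in the text.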

Table 4. Posterior Distributions of Admixture Rates for the Five Admixture Events Inferred for the Final Worldwide Invasion Scenario Described in figure 1.

Prior Set  Admixture Event                 Admixture Rate (gene fraction from pop. x)  Mean   Median  Mode   q5     q95
Prior 1    A1 (US-Wat = US-Haw + CN-Nin)   r_US-Wat (China – CN-Nin)                   0.759  0.761   0.774  0.659  0.854
           A2 (US-Sok = US-Haw + US-Wat)   r_US-Sok (USA – US-Wat)                     0.237  0.230   0.227  0.092  0.400
           A3 (GE-Dos = IT-Tre + US-NC)    r_GE-Dos (USA – US-NC)                      0.286  0.278   0.266  0.112  0.481
           A4 (BR-PA = US-NC + US-SD)      r_BR-PA (USA – US-SD)                       0.467  0.461   0.442  0.187  0.754
           A5 (FR-Reu = IT-Tre + GE-Dos)   r_FR-Reu (Europe – GE-Dos)                  0.388  0.380   0.426  0.132  0.670
Prior 2    A1 (US-Wat = US-Haw + CN-Nin)   r_US-Wat (China – CN-Nin)                   0.761  0.762   0.766  0.684  0.833
           A2 (US-Sok = US-Haw + US-Wat)   r_US-Sok (USA – US-Wat)                     0.224  0.219   0.199  0.114  0.350
           A3 (GE-Dos = IT-Tre + US-NC)    r_GE-Dos (USA – US-NC)                      0.312  0.306   0.284  0.152  0.487
           A4 (BR-PA = US-NC + US-SD)      r_BR-PA (USA – US-SD)                       0.422  0.418   0.429  0.247  0.613
           A5 (FR-Reu = IT-Tre + GE-Dos)   r_FR-Reu (Europe – GE-Dos)                  0.370  0.368   0.356  0.178  0.569

NOTE. Admixture events are denoted as in figure 1. Each admixture rate parameter r points to the name of the admixed site and corresponds to the fraction of genes originating from the source site in parentheses (1 − r genes originate from the other source site). Mean, median, and mode estimates, as well as bounds of the 90% credibility intervals (q5 and q95), are indicated for each admixture parameter. Estimations assuming prior set 1 and prior set 2 (supplementary table S8, Supplementary Material online) are provided. The set of representative sample sites used for the ABC estimations presented here is the same as in table 3. Code names of the population sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online. See supplementary table S7, Supplementary Material online, for results on a different set of representative sample sites.
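To make the meaning of an admixture rate r concrete: under a simple one-pulse admixture model, the expected allele frequency in the admixed population is the r-weighted average of the frequencies in its two sources. The sketch below is illustrative only. The function name `admixed_frequency` and the two allele frequencies are hypothetical; only r = 0.761 (the median for event A1 under prior set 1) is taken from table 4.

```python
def admixed_frequency(r, p_source, p_other):
    """Expected allele frequency after admixture.

    r        -- fraction of genes from the source named in parentheses in table 4
    p_source -- allele frequency in that source
    p_other  -- allele frequency in the other source (contributing 1 - r)
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1")
    return r * p_source + (1.0 - r) * p_other


# Hypothetical allele frequencies of 0.40 and 0.10 in the two sources.
expected = admixed_frequency(0.761, 0.40, 0.10)
```

With these made-up frequencies the admixed population is expected to sit much closer to the source contributing the larger share of genes, which is what an unbalanced admixture rate means in practice.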


0.286). We found more balanced admixture rates for the populations from Brazil (BR-PA) and La Reunion (FR-Reu), but information on this parameter was less accurate in these cases, as indicated by large 90% credibility intervals.

Model-Posterior Checking

Like any model-based method, ABC inference does not reveal the "true" evolutionary history but allows one to choose the best among a necessarily limited set of compared scenarios and to estimate posterior distributions of parameters under this scenario. How well the inferred scenario-posterior combination matches the observed dataset remains to be evaluated using an ABC model-posterior checking analysis. When applying such an analysis to the final invasion scenario detailed in figure 1, we found that, when considering prior set 1, only 54 of the 1,141 summary statistics used as test quantities had low posterior predictive P-values (i.e., 0.002 < ppp-values < 5%). Moreover, none of those ppp-values remained significant after correcting for multiple comparisons (Benjamini and Hochberg 1995). Similar results were obtained when assuming prior set 2 and when considering different sets of representative sample sites in our analyses (results not shown). These findings show that the final worldwide invasion scenario (fig. 1, with associated parameter posterior distributions) matches the observed dataset well. In agreement with this, the projections of the simulated datasets on the principal component axes from the final model-posterior combination were well grouped and centered on the target point corresponding to the observed dataset (supplementary fig. S4, Supplementary Material online). Due to the modest size of the dataset (i.e., 25 microsatellite loci), however, only situations of major inadequacy of the model-posterior combination to the observed dataset are likely to be identified.
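The multiple-comparison correction cited above (Benjamini and Hochberg 1995) is a step-up procedure controlling the false discovery rate. The following sketch shows the procedure on hypothetical P-values; it is not the authors' script, and the function name and the input values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a test is significant at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Every test ranked at or below k_max is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            significant[idx] = True
    return significant


# Hypothetical example with eight tests.
flags = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205])

# Hypothetical example: one P-value of 0.002 among 1,141 tests is not significant.
lone = benjamini_hochberg([0.002] + [0.5] * 1140)
```

The second call illustrates why a P-value that looks low in isolation need not survive the correction when more than a thousand test quantities are examined: the threshold for the smallest P-value is alpha divided by the number of tests.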

Discussion

In the present paper, we decipher the routes taken by D. suzukii in its worldwide invasion, evaluating evidence of bottlenecks and genetic admixture associated with different introductions, and we quantify the efficiency of ABC-RF relative to the more standard ABC-LDA method for choosing among introduction scenarios.

A Complex Worldwide Invasion History

Prior to this study and that of Adrion et al. (2014), our understanding of the worldwide introduction pathways of D. suzukii was based on historical and observational data, which were incomplete and potentially misleading. Our results indicate three distinct introductions from the native range (to Hawaii, the western side of North America, and western Europe), which accords well with the findings of Adrion et al. (2014) from a different set of markers (X-linked sequence data). Both the western North American introductions and at least some European populations show signs of admixture. Hawaii and China both appear to have contributed to the western North American introductions, which were in turn the most probable source for the introduction into eastern North America. Similarly, China and, to a lesser extent, eastern North America both appear to have contributed to the European introduction, although the eastern North American contribution was found mostly in northern European areas. The almost simultaneous invasion of North America and Europe from Asia could be due to increased trade between these areas, facilitating transport of this species between regions. Alternatively, or in addition, adaptation to human-altered habitats (in this case, changes in agricultural practices) within the native range of the species could have promoted invasion of similarly altered habitats worldwide (i.e., anthropogenically induced adaptation to invade; Hufbauer et al. 2012). However, the two invaded continents were most likely colonized by flies from distinct Chinese geographic areas (fig. 1), requiring that adaptation to agriculture occurred concomitantly at several locations within the native range. This is possible but not evolutionarily parsimonious, and further data would be required to test this explanation. Subsequent introductions from the three primary founder locations to other locations resulted in a complex history of invasion. Both primary and secondary introductions show signs of demographic bottlenecks, and several invasive populations have multiple sources and thus experienced genetic admixture between differentiated native or invasive populations.

The genetic relationships between worldwide D. suzukii populations define the genetic variation they harbor and may continue to shape the further spread of alleles among populations. Gene flow among populations, through continuous dispersal or more sporadic admixture events, permits the dissemination of allelic variants and new mutations, with dramatic consequences for evolutionary trajectories (Lenormand 2002). We found evidence for asymmetric genetic admixture from North American towards (northern) European populations. If this signal of admixture reflects recurrent and ongoing dispersal events rather than a single past secondary introduction event, then variants that arise in North America would be more likely to spread to Europe than the reverse. Discriminating between ongoing and past dispersal events is tricky, however, especially for recent invasions (Benazzo et al. 2015). If the occurrence of recurrent gene flow is confirmed, then D. suzukii populations from Europe may not lag behind American ones in evolutionary terms. From an applied perspective, if resistance to control practices emerges in North America, special efforts to stop the importation of D. suzukii to Europe would help prevent the evolution of resistance in Europe and thus reduce damage to crops (Caprio and Tabashnik 1992).

Advances in Methods for Discriminating among Complex Models

To the best of our knowledge, the present study is the first to use the recently developed ABC-RF method (Pudlo et al. 2016) to test competing models characterized by different levels of complexity. To compare ABC-RF to a more standard ABC method, we carried out a subset of model choice analyses using both ABC-RF and ABC-LDA (Estoup et al. 2012). We ran ABC-LDA analyses in two ways: with standard-sized large reference tables (500,000 simulated datasets per scenario) and with small reference tables (10,000 simulated datasets per scenario, the same size as used for ABC-RF). The performance of ABC-LDA (in terms of prior error rate) in choosing among introduction scenarios was slightly better than that of ABC-RF when using large reference tables. Furthermore, for ABC-LDA with large reference tables, confidence in model choice (in terms of posterior probability) was relatively comparable to that of ABC-RF with small reference tables. Thus, when analyses are relatively simple or computational resources are not limiting, ABC-LDA can provide inferences of invasion scenarios that are as robust as those of ABC-RF.

In contrast, we found that ABC-RF outperformed ABC-LDA when using similarly small reference tables for both types of analyses (10,000 simulated datasets per scenario). ABC-LDA was also unstable, choosing different scenarios in different replicate analyses when using small reference tables. These empirical findings correspond well with those of Pudlo et al. (2016), who used simple models for which the true posterior probabilities could be calculated. Pudlo et al. (2016) demonstrated that ABC-RF provided a more reliable posterior probability of the best (true) scenario than standard ABC methods when using the same (small) number of simulated datasets in the reference table (see supplementary fig. S3 in Pudlo et al. 2016). In our work, the difference in performance between the two methods was most pronounced for complex analyses (e.g., when comparing more than 10 scenarios using more than 100 summary statistics). Thus, when analyses are not simple or computational resources are finite, ABC-RF provides more robust inferences of invasion scenarios than ABC-LDA.

The lower computational cost of ABC-RF was also evident. For a given observed dataset, selecting the best supported scenario and computing its posterior probability was ca. 50 times faster than when processing an ABC-LDA treatment with a traditional reference table of large size. Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time, whereas it took several hours to several days with ABC-LDA. The benefit of low computational costs is not simply a saving in computer time. Rather, the efficiency of ABC-RF analyses allowed us to run replicate analyses on various sample sets, even for the most complex D. suzukii invasion scenarios. This replication is critical in evaluating the robustness of our statistical inferences.

We found that, at least for some ABC-RF analyses, the posterior probability of the best scenario and the global statistical power to choose among alternative invasion scenarios (as measured by the prior error rate) were relatively low. For instance, in three of the 11 analyses (analyses 3b, 5a, and 5b), the best scenarios had relatively low probabilities (i.e., ranging from 0.500 to 0.630) and relatively high prior error rates (i.e., ranging from 0.300 to 0.400). Although replicate analyses carried out on different sample sets and prior sets pointed to the same best scenario for these three analyses, such low probability values and high prior error rates suggest a moderate level of confidence in the choice of model in these cases. Similarly, we found that the scenario choice leading to the conclusion of the presence versus absence of admixed genes from the eastern US in various European sample sites relied on probability values that were sometimes as low as 0.500. Moreover, in a minority of cases, the best scenario differed depending on the sample site considered as representative of the eastern US or the native area. The observed pattern of an uneven spatial distribution of genes of eastern US origin in Europe, following roughly a north to south gradient, should hence be interpreted cautiously. Analyses of additional European sample sites are needed to clarify this issue. The above inferential uncertainties also most likely stem from the relatively small number of markers genotyped relative to the number, complexity, and similarity of some of the competing models. Such results indicate that there is room for improving the robustness of our inferences, at least for a subset of our analyses.

Bottlenecks and Genetic Admixture

We found strong evidence that bottlenecks and admixture both occurred during the course of the invasion of D. suzukii. A first sign of bottlenecks is a reduction in neutral genetic diversity. We found that all invasive D. suzukii populations were characterized by significantly lower genetic variation than native ones, with a loss of 6.4% (US-Col) to 23.3% (US-Haw) of heterozygosity and of 27.3% (US-Wat) to 54.2% (US-Haw) of allelic diversity (supplementary fig. S5, Supplementary Material online). Adrion et al. (2014) found a similar significant loss of diversity in their dataset focused on gene sequences. Dlugosch and Parker (2008) and Uller and Leimu (2011) reviewed studies of neutral genetic diversity in a large number of species of animals, plants, and fungi, and compared nuclear molecular diversity within introduced and source populations. Overall, they found that a loss of variation was the most frequent feature in invasive populations. However, reductions in genetic variation were on average modest (e.g., average loss of 18.7% and 15.5% of heterozygosity and allelic diversity, respectively; Dlugosch and Parker 2008). The reductions observed in our study in invasive D. suzukii populations were thus in the same range for heterozygosity but higher for allelic diversity (cf. average loss of 11.1% and 34.5% of heterozygosity and allelic diversity, respectively). A larger loss of allelic diversity than of heterozygosity after a bottleneck is nevertheless expected by theory (Nei et al. 1975).
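The expectation that a bottleneck removes allelic diversity faster than heterozygosity can be illustrated with two textbook results for an idealized population of effective size N: expected heterozygosity decays by a factor (1 − 1/(2N)) per generation, whereas an allele at frequency p is missing from the 2N gene copies of the next generation with probability (1 − p)^(2N). The sketch below is not the model used in the ABC analyses; the function names and numerical values are hypothetical.

```python
def heterozygosity_retained(n_effective, generations):
    """Expected fraction of heterozygosity left after drift at constant effective size."""
    return (1.0 - 1.0 / (2.0 * n_effective)) ** generations


def prob_allele_lost(frequency, n_effective):
    """Probability that an allele is absent from the 2N gene copies sampled in one generation."""
    return (1.0 - frequency) ** (2 * n_effective)


# Hypothetical one-generation bottleneck of 10 effective individuals.
retained = heterozygosity_retained(10, 1)   # 0.95: only 5% of heterozygosity is lost
lost_rare = prob_allele_lost(0.05, 10)      # roughly 0.36 for an allele at frequency 5%
```

Rare alleles contribute little to heterozygosity but count fully toward allelic diversity, so losing them depresses the number of alleles much more than it depresses heterozygosity.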

The type of introduction pathway explains, at least partly, the severity of the reduction in genetic diversity. Bottlenecks tended to be less severe for populations founded by individuals from the same continent than for populations founded by individuals originating from a different continent. A simple explanation for such a pattern is that, due to greater proximity, populations originating from the same continent are likely to be initially founded by a larger number of individuals and also to experience recurrent gene flow. Populations founded by individuals from both the same and different continents experienced weak or moderate reductions in diversity, suggesting that multiple introduction pathways restore diversity.

In agreement with previous studies on invasions on similarly large geographical scales (e.g., Lombaert et al. 2014 and references therein), we found that admixture events were frequent in the worldwide invasion history of D. suzukii. We identified at least five admixture events for which sources were native and/or invasive populations (i.e., events A1–A5 in fig. 1). The genetic contributions of the sources were unbalanced, except for the populations from Brazil and La Reunion, but genetic information regarding admixture was considerably less accurate in the latter cases, which involved weakly differentiated source populations.

We have provided here the information necessary to evaluate whether bottlenecks and admixture have influenced outcomes in this invasion. Quantitative genetics studies in the lab focusing on the key source, bottlenecked, and admixed populations could now examine the fitness and adaptive potential of those groups (Lavergne and Molofsky 2007; Facon et al. 2008; Turgeon et al. 2011).

Conclusions and Perspectives

Understanding invasion pathways provides key information for understanding the role of evolutionary processes in biological invasions. For example, our research reveals the sources of invasive populations, which enables further relevant comparisons using quantitative genetics studies in the lab. Additionally, we found that the invasion of Europe by D. suzukii was distinct from that of North America, with limited and asymmetrical gene flow between these two main invaded areas. This situation provides the opportunity to evaluate evolutionary trajectories in replicate in two separate temperate climate regions, similar to research by Gilchrist et al. (2001, 2004) on the pace of clinal evolution in the invasive fruit fly Drosophila subobscura. There are also distinct invasion pathways for areas with warmer climates (i.e., Hawaii, La Reunion, and Brazil) that will make interesting comparisons.

Retracing invasion routes and making inferences about demographic processes is only possible if there is adequate polymorphism within populations and significant genetic differentiation among them. As is evident here, microsatellite data can provide enough variation to reveal important pathways. We concur with Adrion et al. (2014), however, that genome-wide data (i.e., next generation sequencing approaches) will be tremendously powerful in further discriminating among complex invasion scenarios (see also Cristescu 2015; Estoup et al. 2016). Because of the reduced computational resources demanded by ABC-RF, this method will be particularly useful for the analysis of massive single nucleotide polymorphism datasets (Pudlo et al. 2016). In addition to the selectively neutral demographic inference leading to the reconstruction of routes of invasion, population (and quantitative) next generation sequencing approaches are quite promising for studying the evolution of phenotypic traits in natural populations (Wray 2013; Gautier 2015). Specifically, they can be used to better understand the genetic architecture of traits underlying invasion success (Bock et al. 2015). Drosophila suzukii is a good species for such research given (i) its short generation time and viability in laboratory conditions, which facilitate experimental approaches to study quantitative traits of interest (Asplen et al. 2015), and (ii) the availability of annotated genome assemblies for this species (Chiu et al. 2013; Ometto et al. 2013), along with the huge amount of genomic resources available for its close relative D. melanogaster (Groen and Whiteman 2016).

Materials and Methods

Sampling and Genotyping

Adult D. suzukii were sampled from the field at a total of 23 localities (hereafter termed sample sites) distributed throughout most of the native and invasive range of the species (supplementary table S1, Supplementary Material online, and fig. 1). Samples were collected between 2013 and 2015 using baited traps and sweep nets and were stored in ethanol. Native Asian samples consisted of a total of six sample sites, including four Chinese and two Japanese localities. Samples from the invasive range were collected in Hawaii (1 sample site), the continental US (7 sites), Europe (7 sites), Brazil (1 site), and the French island of La Reunion (1 site). For each sample site, 15–44 adult flies were genotyped at 25 of the 28 microsatellite loci described in Fraimout et al. (2015), resulting in a total of 685 genotyped individuals. Three microsatellite loci from Fraimout et al. (2015) (i.e., loci DS31, DS42, and DS45) were not included due to the presence of null alleles at frequencies > 10% in some sample sites (results not shown). DNA extraction and PCR amplification, as well as allele scoring, were performed following the methodology detailed in Fraimout et al. (2015).

General Framework for Defining Focal Population Groups and Sets of Scenarios to Be Compared Using ABC

One of the many challenges of ABC analyses is to optimally define sets of sample sites to be processed chronologically as potential sources and targets in a way that does not hamper the computational effort. As stated in Estoup and Guillemaud (2010) and Lombaert et al. (2014), the number and complexity of competing scenarios are proportional to the number of genetic groups to be accounted for in ABC analyses. Following Lombaert et al. (2014), we used the results obtained from three genetic clustering methods, along with historical and geographical information, to define genetic groups that could include several sample sites (supplementary appendix S1, Supplementary Material online). We defined seven main genetic groups from our set of 23 sample sites: Asia, Hawaii, western US, eastern US, Europe, Brazil, and La Reunion. To make the computations less intensive with respect to computer resources, ABC treatments were carried out using one representative sample site for each genetic group per replicate analysis. Specifically, when a genetic group included more than two sample sites, we considered the two most differentiated sample sites (i.e., with the highest FST values in supplementary table S3, Supplementary Material online) as representative of the group: CN-Shi and JP-Tok for the group Asia; all three sample sites for the genetically heterogeneous western US group; US-NC and US-Wis for the group eastern US; and GE-Dos and IT-Tre for the group Europe. Each of the sample sites retained (i.e., the representative sample sites) was included alternately in the sample set considered for ABC treatments. This makes it possible to replicate analyses of scenario choice on different sample sets representing most of the genetic variation within and between genetic groups. All ABC analyses described in table 1 were thus replicated with all possible combinations of representative sample sites available for each genetic group. This allowed us to test the robustness of scenario choices with respect to the sample sets considered as representative of genetic groups.
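The replication over "all possible combinations of representative sample sites" amounts to a Cartesian product over the groups involved in a given analysis. The sketch below enumerates such combinations for three of the groups, using the representative sites named above. Which groups enter a given analysis, and the function name `replicate_sample_sets`, are assumptions made for illustration.

```python
from itertools import product

# Representative sample sites named in the text for three of the genetic groups.
representatives = {
    "Asia": ["CN-Shi", "JP-Tok"],
    "eastern US": ["US-NC", "US-Wis"],
    "Europe": ["GE-Dos", "IT-Tre"],
}


def replicate_sample_sets(reps):
    """One dict per replicate, mapping each group to the site chosen to represent it."""
    groups = list(reps)
    return [dict(zip(groups, combo)) for combo in product(*(reps[g] for g in groups))]


sample_sets = replicate_sample_sets(representatives)  # 2 x 2 x 2 = 8 replicate sample sets
```

A scenario choice that is stable across all such replicate sample sets is robust to the choice of representative sites, which is the property the replication is meant to test.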

Nested ABC Analyses Processed Sequentially

Following Lombaert et al. (2014), we used historical information (i.e., dates of first observation of each invasive population; supplementary table S1, Supplementary Material online) to define 11 sets of competing introduction scenarios that were analyzed sequentially. The first set of compared scenarios considers the second oldest invasive population as target (the oldest one necessarily originating from the native range) and determines its introduction history. Step by step, subsequent analyses use the results obtained from the previous analyses until the most recent invasive populations are considered. Each of the 11 analyses included a set of compared introduction scenarios in which the target invasive population (i.e., the focal population) can either derive directly by a single split (i.e., introduction event) from one of the historically compatible source populations or be the result of an admixture event between any one of the possible pairs of sources. For instance, in a model with one target and two possible sources, three scenarios could be formalized, with the target population being derived from a single source population, from the other possible source, or from an admixture between the two sources (see supplementary fig. S6 for an illustration and supplementary table S2, Supplementary Material online, for details).
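The way a scenario set grows with the number of candidate sources can be sketched as follows: each source yields one single-introduction scenario, and each unordered pair of sources yields one admixture scenario. This is only a counting illustration with a hypothetical function name; the scenarios actually compared are those listed in table 1 and supplementary table S2, and some analyses use other sets.

```python
from itertools import combinations


def enumerate_scenarios(sources):
    """Single-source scenarios followed by pairwise-admixture scenarios for one target."""
    single = [(source,) for source in sources]
    admixed = list(combinations(sources, 2))
    return single + admixed


# Two candidate sources give the three scenarios described in the text.
two_sources = enumerate_scenarios(["source 1", "source 2"])

# Three candidate sources give six scenarios.
three_sources = enumerate_scenarios(["Asia", "Hawaii", "western US"])
```

With k candidate sources this gives k + k(k − 1)/2 scenarios, which is why the number of genetic groups retained has such a strong effect on the computational effort.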

The first set of scenario choice analyses (1a–1d) aimed at making inferences about the introduction pathways for the three sample sites in the western US (table 1). We first evaluated the Asian, Hawaiian, or admixed Asian + Hawaiian origin of each of the three western US sample sites. We then refined our modeling by formalizing seven competing scenarios describing the possible relationships among the three western US sample sites and their extra-continental sources (i.e., Asia and Hawaii; analysis 1d). Once the invasion history was resolved for the western US group, we sought to identify the source population(s) of the eastern US and European genetic groups independently. Dates of first records in these two groups widely overlap, and the possibility that the western US was a common source for both areas could not be ruled out. We therefore compared, in analysis 2a, a set of six scenarios to assess whether the eastern US group derived directly from Asia, Hawaii, or the western US, or whether it resulted from an admixture event between one pair of such sources. A similar set of six scenarios was used to assess the most likely origin of European populations independently from the eastern US group (analysis 2b).

Given the results of analyses 2a and 2b and the overlap in first record dates for the eastern US and Europe, we then performed two analyses to investigate the possibility of gene flow (i.e., admixture) between the eastern US and European groups, considering the eastern US as target population with Europe as potential source and vice versa (analyses 3a and 3b, respectively). Analysis 4 aimed at clarifying the admixture in northern European sample sites identified in analysis 3b. We tested whether this admixture occurred between the eastern US and a secondary introduction from Asia or between individuals from the eastern US and southern Europe. Finally, we investigated the origins of the most recent invasive populations, in Brazil and La Reunion, separately, with all older groups and pairwise admixtures as potential sources (analyses 5a and 5b). Note that, in agreement with the genetic clustering results (fig. 1; supplementary appendix S1 and figs. S1 and S2, Supplementary Material online), we did not test for the possibility of Brazil being the source of La Reunion or the reverse.

Historical, Demographic, and Mutational Parameters

For all scenarios formalized for ABC analyses, we modeled an introduction event as a divergence without subsequent gene flow from the source population (or two source populations in case of admixture) at a time corresponding to the date of first record in the invaded area, translated into a number of generations (assuming 12 generations per year; Lin et al. 2014). The divergence event was immediately followed by a bottleneck period characterized by a lower effective population size, followed by a direct return to a stable (large) effective population size. We modeled our uncertainty regarding the identity of the source populations in the slightly structured native Asian area by including native "ghost populations" (i.e., unsampled populations) in our scenarios. Such unsampled native populations were modeled as a single split from an ancestral Asian population (without any change in effective population size) at a time defined by a loose flat prior that includes zero at the lower bound (supplementary table S8, Supplementary Material online). See Lombaert et al. (2014) and references therein for justifications of using unsampled populations when modeling invasion scenarios. A detailed illustration of the above modeling design is provided in supplementary fig. S6, Supplementary Material online.
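The introduction model described above (a date converted to generations, a bottleneck at reduced effective size, then an abrupt return to a large stable size) can be sketched as a step function. Only the value of 12 generations per year comes from the text; the function names, parameter names, and numerical values below are hypothetical.

```python
GENERATIONS_PER_YEAR = 12  # value assumed in the text (Lin et al. 2014)


def years_to_generations(years):
    """Translate a time span in years into a number of generations."""
    return years * GENERATIONS_PER_YEAR


def effective_size(t, bottleneck_duration, ne_bottleneck, ne_stable):
    """Effective size t generations after the introduction, counted forward in time."""
    if t < 0:
        raise ValueError("t must be non-negative")
    # Reduced size during the bottleneck, then a direct return to the stable size.
    return ne_bottleneck if t < bottleneck_duration else ne_stable


# Hypothetical population first recorded 6 years before sampling.
t_intro = years_to_generations(6)  # 72 generations
```

In the ABC analyses the corresponding quantities are not fixed values but parameters drawn from prior distributions.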

Prior distributions for historical, demographic, and mutational parameters were defined taking into account the historical and demographic parameter values available from empirical studies on D. suzukii (Hauser 2011; Lin et al. 2014; Asplen et al. 2015) and microsatellite mutational data on D. melanogaster (Schug et al. 1998). We considered a first set of prior distributions (hereafter termed prior set 1), which was composed of rectangular (i.e., bounded uniform) distributions (see supplementary table S8, Supplementary Material online). To evaluate the robustness of our ABC inferences to prior choice, we also considered a second set of more peaked prior distributions (hereafter termed prior set 2; supplementary table S8, Supplementary Material online).

Model Choice Using ABC-RF and ABC-LDA

For all ABC-RF and ABC-LDA analyses, we used the software DIYABC v2.1.0 (Cornuet et al. 2014) to simulate the datasets constituting the reference tables. A reference table includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics.

Following Pudlo et al. (2016), ABC-RF treatments were processed on reference tables including 10,000 simulated datasets per scenario. Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014) for microsatellite markers, describing genetic variation per population (e.g., number of alleles), per pair (e.g., genetic distance), or per triplet (e.g., admixture rate) of populations, averaged over the 25 loci (see supplementary table S9, Supplementary Material online, for details about such statistics), plus the linear discriminant analysis (LDA) axes as additional summary statistics. The total number of summary statistics ranged from 39 to 424 depending on the analysis (table 2). When the number of scenarios in an analysis exceeded ten, we processed ABC-RF treatments on reference tables including 100,000 simulated datasets, to avoid computer memory issues associated with the sub-bootstrapping procedure applied to reference tables with more than 100,000 simulated datasets (see the section Practical recommendations regarding the implementation of the algorithms in Pudlo et al. 2016). We checked that this number was sufficient by evaluating the stability of prior error rates (i.e., the probability of choosing a wrong model when drawing the model index and parameter values from the priors) and of posterior probability estimates on 80,000, 90,000, and 100,000 simulated datasets. The number of trees in the constructed random forests was fixed to n = 500; see Pudlo et al. (2016) and Breiman (2001) for justifications of considering a forest of 500 trees, and supplementary fig. S7, Supplementary Material online, for an illustration. For each ABC-RF analysis, we predicted the best scenario and estimated its posterior probabilities and prior error rates over 10 replicate runs of the same reference table. We used the abcrf R package (v1.1.0; Pudlo et al. 2016) to perform all ABC-RF analyses. In supplementary appendix S2, Supplementary Material online, we provide several scripts written in the R programming language (R Development Core Team 2008) for computing random forest analyses in R with the abcrf package when starting from simulated datasets generated with DIYABC v2.1.0.

For the sake of methodological comparison, we also duplicated a subset of analyses using a more standard ABC method (Beaumont et al. 2002; Cornuet et al. 2008), improved using the regression algorithms from Estoup et al. (2012) based on linear discriminant analysis (LDA) to avoid the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step. This method is called ABC-LDA. Following previous analyses of this type (Lombaert et al. 2014), ABC-LDA treatments were processed on reference tables including 500,000 simulated datasets per scenario. As for ABC-RF, datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). Posterior probabilities associated with each scenario were estimated by polychotomous logistic regression (Cornuet et al. 2008), modified following Estoup et al. (2012), on the 1% of the simulated datasets closest to the observed dataset. We also estimated for each analysis a prior error rate over 500 simulated datasets drawn from the priors, using reference tables including 500,000 simulated datasets per scenario. To provide a fair comparison of classification error with respect to the computational effort, prior error rates were also estimated using reference tables including the same number of simulated datasets per scenario as for ABC-RF treatments (10,000 per scenario).
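The notion of a prior error rate can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two toy "scenarios", the Gaussian simulator, the two summary statistics, and the 1-nearest-neighbour classifier, which merely stands in for the random forest or logistic regression steps and is not the method used in this study. The sketch builds a small reference table, then draws pseudo-observed datasets from the priors and counts how often the wrong scenario is chosen.

```python
import random


def simulate(model, theta, rng, n=50):
    """Toy simulator (hypothetical): the two scenarios differ by a shift in the mean.
    Returns two summary statistics: sample mean and sample variance."""
    shift = 0.0 if model == 0 else 3.0
    data = [rng.gauss(theta + shift, 1.0) for _ in range(n)]
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    return (mean, var)


def build_reference_table(n_per_model, rng):
    """Draw parameters from the prior and simulate n_per_model datasets per scenario."""
    table = []
    for model in (0, 1):
        for _ in range(n_per_model):
            theta = rng.uniform(-1.0, 1.0)  # prior on the toy parameter
            table.append((model, simulate(model, theta, rng)))
    return table


def classify(stats, table):
    """Stand-in classifier: scenario of the closest simulated dataset (1-NN)."""
    nearest = min(table, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[1], stats)))
    return nearest[0]


def prior_error_rate(table, n_test, rng):
    """Fraction of pseudo-observed datasets, drawn from the priors, that are misclassified."""
    errors = 0
    for _ in range(n_test):
        model = rng.choice((0, 1))      # model index drawn from its (uniform) prior
        theta = rng.uniform(-1.0, 1.0)  # parameter drawn from its prior
        if classify(simulate(model, theta, rng), table) != model:
            errors += 1
    return errors / n_test


rng = random.Random(1)
table = build_reference_table(200, rng)
rate = prior_error_rate(table, 100, rng)
print("estimated prior error rate:", rate)
```

Because the two toy scenarios are well separated, the estimated rate should be close to zero here; with many similar scenarios, as in the analyses described above, it would be expected to be higher.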

Refining the Origins of the Primary Introduction Events from the Native Area

We refined our worldwide invasion scenario by assessing the most probable origin of the primary introductions from the Asian native range. To this aim, we carried out additional ABC-RF treatments to choose which geographical area in the native range was the most likely origin of each of the three primary introductions from Asia identified in the final invasion scenario, i.e., in Hawaii, western US (sample site US-Wat), and Europe (fig. 1). Capitalizing on the observed pattern of genetic differentiation among all Asian sample sites, we considered four possible geographical origins: O1 = Japan (represented by the sample sites JP-Sap + JP-Tok, which were pooled as they did not show any significant differentiation), O2 = South-East China (represented by the sample site CN-Nin), O3 = North-West China (represented by the sample sites CN-Lan + CN-Lia, which were pooled as they did not show any significant differentiation), and O4 = South-West China (represented by the sample site CN-Shi). In each of these three model choice analyses (i.e., one analysis per primary introduction), we considered the previously inferred introduction histories and compared four scenarios corresponding to variations of those histories that differed only by the geographical origin of the primary Asian introduction (models O1, O2, O3, and O4). We used the ABC-RF method with 10,000 datasets per model, simulated using the prior set 1 or 2, and with three replications of the RF process. For inferences regarding Europe, we carried out ABC-RF treatments on various sets of sample sites from Europe and eastern US to further evaluate the robustness of our results.

Parameter Estimation

For ABC parameter estimation, we considered a set of 12 key sample sites representative of the inferred final invasion history (fig. 1). This set of sample sites included the three native origins of primary introductions from the native range (O1, O2, and O3), the Hawaiian sample (US-Haw), the three western US samples (US-Wat, US-Sok, and US-SD), the eastern US sample US-NC, two European samples (IT-Tre and GE-Dos), the Brazilian sample (BR-PA), and the sample from La Reunion (FR-Reu). To evaluate the robustness of our parameter estimations with respect to sample sites, we carried out a second analysis replacing the sample sites US-NC, IT-Tre, and GE-Dos by US-Wis, SP-Bar, and FR-Par, respectively. We also replicated parameter estimations using the prior set 1 and the prior set 2.

ABC-RF was developed to tackle the inferential issue of model choice. So far, no RF-based solution exists to estimate parameter distributions under a given model (Pudlo et al. 2016). Standard ABC algorithms for parameter estimation may suffer from the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step, and hence yield poor results when the number of statistics is much too large [reviewed in Blum et al. (2013)]. To avoid such potential problems, as well as for the sake of simplicity and computational efficiency (i.e., statistical techniques for choosing summary statistics have not been implemented in DIYABC; Blum et al. 2013), we used a standard ABC methodological framework (Beaumont et al. 2002; Cornuet et al. 2008) applied to a subset of "expert-chosen" statistics proposed in DIYABC (Brouat et al. 2014; Lombaert et al. 2014) to estimate posterior parameter distributions under the final invasion scenario. We hence considered 95 summary statistics in total: the mean number of alleles and the mean genetic diversity (per locus and population sample), all pairwise FST's, and some crude estimates of admixture rates based on the five population triplets for which admixture events have been inferred (i.e., A1–A5 events described in fig. 1, corresponding to five AML statistics; Choisy et al. 2004; see the DIYABC 2.1.0 user manual, p. 16, for details about such statistics). Using parameter values drawn from the prior sets 1 or 2, we produced a reference table containing 10 million simulated datasets. Following Beaumont et al. (2002), we then used a local linear regression to estimate the parameter posterior distributions. We took the 10,000 (0.1%) simulated datasets closest to our observed dataset for the regression, after applying a logit transformation to parameter values. We found that considering instead the 1% closest simulated datasets and/or a larger set of summary statistics had only weak effects on the estimated parameter distributions, at least for the subset of parameters we were interested in (i.e., admixture rates and bottleneck severity) (results not shown).
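The rejection and regression-adjustment steps described above can be sketched in Python as follows. This is an illustrative toy under stated assumptions, not the DIYABC implementation: it uses a single hypothetical summary statistic, a uniform prior on a parameter bounded in (0, 1) such as an admixture rate, Epanechnikov kernel weights, and a 1% tolerance on 20,000 simulations instead of 0.1% on 10 million. The function and variable names are hypothetical.

```python
import math
import random


def logit(x, lo=0.0, hi=1.0):
    """Map a parameter bounded in (lo, hi) onto the real line."""
    p = (x - lo) / (hi - lo)
    return math.log(p / (1.0 - p))


def inv_logit(z, lo=0.0, hi=1.0):
    """Back-transform a logit-scale value to the (lo, hi) interval."""
    return lo + (hi - lo) / (1.0 + math.exp(-z))


def regression_adjust(thetas, stats, s_obs, fraction):
    """Keep the closest `fraction` of simulations, then apply a weighted local
    linear regression adjustment on the logit scale (one summary statistic)."""
    n_keep = max(2, int(round(len(thetas) * fraction)))
    kept = sorted(zip(thetas, stats), key=lambda ts: abs(ts[1] - s_obs))[:n_keep]
    d_max = abs(kept[-1][1] - s_obs)
    # Epanechnikov weights: the closest simulations count most.
    w = [1.0 - (abs(s - s_obs) / d_max) ** 2 for _, s in kept]
    x = [s - s_obs for _, s in kept]
    y = [logit(t) for t, _ in kept]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    slope = (sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
             / sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x)))
    # Shift each accepted value to where it would sit if its statistic equalled s_obs.
    adjusted = [inv_logit(yi - slope * xi) for xi, yi in zip(x, y)]
    return adjusted, w


rng = random.Random(42)
thetas = [rng.uniform(0.0, 1.0) for _ in range(20000)]  # draws from the prior
stats = [t + rng.gauss(0.0, 0.05) for t in thetas]      # toy summary statistic
adjusted, weights = regression_adjust(thetas, stats, s_obs=0.3, fraction=0.01)
posterior_mean = sum(a * w for a, w in zip(adjusted, weights)) / sum(weights)
print("weighted posterior mean:", round(posterior_mean, 3))
```

The logit transformation keeps the adjusted values inside the prior bounds, which is the reason it is applied before the regression.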

We focused our investigations on the parameters related to two types of demographic events that are considered to be potentially important drivers of invasion success: severity of the bottleneck associated with the foundation of new populations, and genetic admixture (Estoup et al. 2016). Bottleneck severity was defined as the ratio between the duration of the bottleneck and the number of effective individuals during this period (Pascual et al. 2007). Following the final invasion scenario detailed in figure 1, we defined three categories of introduction situations: bottleneck severity for populations founded only by individuals originating from a different continent (e.g., sample sites US-Haw, US-Wat, IT-Tre, BR-PA, and FR-Reu); bottleneck severity for populations founded only by individuals originating from the same continent (e.g., sample sites US-SD and US-NC); and bottleneck severity for populations founded by a combination of the two types of sources (e.g., sample sites US-Sok and GE-Dos). Genetic admixture is evident when a population is inferred to have more than one distinct source (A1–A5 events in fig. 1).
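As a small worked illustration of the definition of bottleneck severity given above, the following Python sketch computes the ratio for two hypothetical founding events. The function name, the assumption that duration is counted in generations, and the numerical values are illustrative only and are not taken from the study.

```python
def bottleneck_severity(duration_generations, effective_size):
    """Bottleneck severity = bottleneck duration / effective number of individuals.
    Duration is assumed here to be expressed in generations (an assumption)."""
    if effective_size <= 0:
        raise ValueError("effective_size must be positive")
    return duration_generations / effective_size


# Hypothetical values: the same duration with fewer effective founders
# gives a more severe bottleneck.
mild = bottleneck_severity(5, 50)
severe = bottleneck_severity(5, 10)
print(mild, severe)
```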

Model-Posterior Checking

We used the ABC model-posterior checking method implemented in DIYABC (Cornuet et al. 2010) to evaluate how well the final worldwide invasion scenario, with associated parameter posterior distributions, matches the observed dataset. This method was largely inspired by statistical frameworks detailed in Gelman et al. (2003) and Cook et al. (2006). The principle is as follows: if a scenario-posterior combination fits the observed data correctly, then data simulated under this scenario with parameters drawn from associated posterior distributions should be close to the observed data. The lack of fit of the model to the data can be measured by determining the frequency at which test quantities measured on the observed dataset are extreme with respect to the distributions of the same test quantities computed from the simulated datasets (i.e., the posterior predictive distributions). In practice, the test quantities are chosen among the large set of ABC summary statistics proposed in the program DIYABC. For each test quantity (q), a lack of fit of the observed data with respect to the posterior predictive distribution can be measured by the cumulative distribution function value, defined as Prob(q_simulated < q_observed). The tail-area probability can be easily computed for each test quantity as Prob(q_simulated < q_observed) and 1.0 − Prob(q_simulated < q_observed) for Prob(q_simulated < q_observed) ≤ 0.5 and > 0.5, respectively. Such tail-area probabilities, also named posterior predictive P-values (i.e., ppp-values; Meng 1994), represent the probabilities that the replicated data (simulated ABC summary statistics) could be more extreme than the observed data (observed ABC summary statistics). In practice, ppp-values can be interpreted as a guideline for tracking model-posterior misfit: a few ppp-values around 0.05 are not necessarily a problem, whereas the presence of ppp-values around 0.001 is rather suspect. Hence, too many observed summary statistics falling in the tails of distributions, especially if some of those ppp-values are small (<0.001), cast serious doubts on the adequacy of the model-posterior combination to the observed dataset. Finally, because ppp-values are computed for a number of (often non-independent) test statistics, a method such as that of Benjamini and Hochberg (1995) can be used to control the false discovery rate (Cornuet et al. 2010).
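The tail-area computation and the false discovery rate control described above can be sketched in Python as follows. This is a minimal illustration, not the DIYABC code: the function names and the toy values are hypothetical, and the Benjamini and Hochberg (1995) step-up procedure is written in its standard form.

```python
def tail_area(simulated, observed):
    """ppp-value: Prob(q_sim < q_obs) if that probability is <= 0.5, else 1 - Prob."""
    cdf = sum(1 for q in simulated if q < observed) / len(simulated)
    return cdf if cdf <= 0.5 else 1.0 - cdf


def benjamini_hochberg(pvalues, fdr=0.05):
    """Step-up procedure: flag test quantities whose misfit is significant
    while controlling the false discovery rate at level `fdr`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * fdr / m:
            max_rank = rank  # largest rank satisfying the BH criterion
    flagged = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            flagged[i] = True
    return flagged


# Toy posterior predictive distribution of one test quantity (hypothetical values).
simulated = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
low = tail_area(simulated, 2.5)   # observed value in the lower tail
high = tail_area(simulated, 9.5)  # observed value in the upper tail

# Toy set of ppp-values for four test quantities (hypothetical values).
flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], fdr=0.05)
print(low, high, flags)
```

In this toy case only the first test quantity is flagged, since the second smallest value (0.03) exceeds its threshold of 2 × 0.05 / 4 = 0.025.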

We carried out the ABC model-posterior checking analysis on our observed microsatellite dataset as follows. From 10^7 datasets simulated under the final invasion scenario detailed in figure 1, and using the same subset of 95 statistics previously retained for parameter estimation, we obtained a posterior sample of 10^4 values from the posterior distributions of parameters, as described in the previous section (Parameter Estimation). We then simulated 10^4 datasets with parameter values drawn with replacement from this posterior sample. As underlined in many statistics textbooks (Gelman et al. 2003), it is inadvisable to perform model checking using information that has already been used for training (i.e., model fitting; see also Cornuet et al. 2010 for illustrations on simulated datasets). Optimally, model-posterior checking should be based on test quantities that do not correspond to the summary statistics which have been used for obtaining the parameter posterior distributions. This is naturally possible with DIYABC, as the package proposes a large choice of summary statistics. Our set of test statistics therefore included the 1,141 remaining single, pairwise, and three-population summary statistics available in DIYABC that were not used for previous ABC parameter estimation.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

AF was supported by state funding by the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program "Investissements d'avenir" (ANR-11-IDEX-0004-02). AE and RAH were supported by the Languedoc Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE 2016), and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP "International Mobility" (CfP 2015-02). GL acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds, project 2012-13-195. XC, MK, IM, and JZ were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). MD acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). MP was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.

References

Adrion JR, Kousathanas A, Pascual M, Burrack HJ, Haddad NM, Bergland AO, Machado H, Sackton TB, Schlenke TA, Watada M, et al. 2014. Drosophila suzukii: the genetic footprint of a recent worldwide invasion. Mol Biol Evol 31:3148–3163.

Asplen MK, Anfora G, Biondi A, Choi DS, Chu D, Daane KM, Gibert P, Gutierrez AP, Hoelmer KA, Hutchison WD, et al. 2015. Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci 88:469–494.

Atallah J, Teixeira L, Salazar R, Zaragoza G, Kopp A. 2014. The making of a pest: the evolution of a fruit-penetrating ovipositor in Drosophila suzukii and related species. Proc R Soc B 281:20132840–20132840.

Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025–2035.

Beaumont MA. 2010. Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst 41:379–406.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B 57:289–300.

Bertorelle G, Benazzo A, Mona S. 2010. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol Ecol 19:2609–2625.

Blum M, Nunes M, Prangle D, Sisson S. 2013. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci 28:189–208.

Bock DG, Caseys C, Cousens RD, Hahn MA, Heredia SM, Hubner S, Turner KG, Whitney KD, Rieseberg LH. 2015. What we still don't know about invasion genetics. Mol Ecol 24:2277–2297.

Boissin E, Hurley B, Wingfield MJ, Vasaitis R, Stenlid J, Davis C, de Groot P, Ahumada R, Carnegie A, Goldarazena A, et al. 2012. Retracing the routes of introduction of invasive species: the case of the Sirex noctilio woodwasp. Mol Ecol 21:5728–5744.

Breiman L. 2001. Random forests. Mach Learn 45:5–32.

Benazzo A, Ghirotto S, Torres Vilaca ST, Hoban S. 2015. Using ABC and microsatellite data to detect multiple introductions of invasive species from a single source. Heredity 115:262–272.

Brouat C, Tollenaere C, Estoup A, Loiseau A, Sommer S, Soanandrasana R, Rahalison L, Rajerison M, Piry S, Goodman SM, Duplantier JM. 2014. Invasion genetics of a human commensal rodent: the black rat Rattus rattus in Madagascar. Mol Ecol 23(16):4153–4167.

Caprio M, Tabashnik BE. 1992. Gene flow accelerates local adaptation among finite populations: simulating the evolution of insecticide resistance. J Econ Entomol 85:611–620.

Calabria G, Maca J, Bächli G, Serra L, Pascual M. 2012. First records of the potential pest species Drosophila suzukii (Diptera: Drosophilidae) in Europe. J Appl Entomol 136:139–147.

Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang G, Zalom FG. 2013. Genome of Drosophila suzukii, the spotted wing drosophila. G3 3:2257–2271.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Mol Ecol 13:955–968.

Cook S, Gelman A, Rubin DB. 2006. Validation of software for Bayesian models using posterior quantiles. J Comput Graph Stat 15:675–692.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24:2713–2719.

Cornuet JM, Ravigne V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11:401.

Cornuet JM, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin JM, Estoup A. 2014. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics 30:1187–1189.

Cristescu ME. 2015. Genetic reconstructions of invasion history. Mol Ecol 24:2212–2225.

Depra M, Poppe JL, Schmitz HJ, De Toni DC, Valente VLS. 2014. The first records of the invasive pest Drosophila suzukii in the South American continent. J Pest Sci 87:379–383.

Dlugosch KM, Parker IM. 2008. Founding events in species invasions: genetic variation, adaptive evolution, and the role of multiple introductions. Mol Ecol 17:431–449.

Dlugosch KM, Anderson SR, Braasch J, Alice Cang F, Gillette H. 2015. The devil is in the details: genetic variation in introduced populations and its contributions to invasion. Mol Ecol 24:2095–2111.

Edmonds CA, Lillie AS, Cavalli-Sforza LL. 2004. Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA 101:975–979.

Ellstrand NC, Schierenbeck KA. 2000. Hybridization as a stimulus for the evolution of invasiveness in plants? Proc Natl Acad Sci USA 97:7043–7050.

Estoup A, Guillemaud T. 2010. Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol 19:4113–4130.

Estoup A, Lombaert E, Marin JM, Guillemaud T, Pudlo P, Robert CP, Cornuet JM. 2012. Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics. Mol Ecol Res 12(5):846–855.

Estoup A, Ravigne V, Hufbauer RA, Vitalis R, Gautier M, Facon B. 2016. Is there a genetic paradox of biological invasion? Annu Rev Ecol Evol Syst 47:51–72.


Facon B, Genton B, Shykoff J, Jarne P, Estoup A, David P. 2006. A general eco-evolutionary framework for understanding bioinvasions. Trends Ecol Evol 21:130–135.

Facon B, Pointier JP, Jarne P, Sarda V, David P. 2008. High genetic variance in life-history strategies within invasive populations by way of multiple introductions. Curr Biol 18:363–367.

Fraimout A, Loiseau A, Price DK, Xuereb A, Martin JF, Vitalis R, Fellous S, Debat V, Estoup A. 2015. New set of microsatellite markers for the spotted-wing Drosophila suzukii (Diptera: Drosophilidae): a promising molecular tool for inferring the invasion history of this major insect pest. Eur J Entomol 112(4):855.

Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201:1555–1579.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis. London: Chapman & Hall/CRC.

Gilchrist GW, Huey RB, Serra L. 2001. Rapid evolution of wing size clines in Drosophila subobscura. Genetica 112:273–286.

Gilchrist GW, Huey RB, Balanya J, Pascual M, Serra L, Noor M. 2004. A time series of evolution in action: a latitudinal cline in wing size in South American Drosophila subobscura. Evolution 58:768–780.

Groen SC, Whiteman NK. 2016. Using Drosophila to study the evolution of herbivory and diet specialization. Curr Opin Insect Sci 14:66–72.

Hauser M. 2011. A historic account of the invasion of Drosophila suzukii (Matsumura) (Diptera: Drosophilidae) in the continental United States, with remarks on their identification. Pest Manag Sci 67:1352–1357.

Hufbauer RA, Facon B, Ravigne V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A. 2012. Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl 5:89–101.

Kaneshiro KY. 1983. Drosophila (Sophophora) suzukii (Matsumura). Proc Hawaiian Entomol Soc 24:179.

Keller SR, Taylor DR. 2010. Genomic admixture increases fitness during a biological invasion: admixture increases fitness during invasion. J Evol Biol 23:1720–1731.

Kolbe JJ, Glor RE, Schettino LR, Lara AC, Larson A, Losos JB. 2004. Genetic variation increases during biological invasion by a Cuban lizard. Nature 431:177–181.

Lavergne S, Molofsky J. 2007. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci USA 104:3883–3888.

Lee C. 2002. Evolutionary genetics of invasive species. Trends Ecol Evol 17:386–391.

Lee JC, Bruck DJ, Curry H, Edwards D, Haviland DR, Van Steenwyk RA, Yorgey BM. 2011. The susceptibility of small fruits and cherries to the spotted-wing drosophila, Drosophila suzukii. Pest Manag Sci 67:1358–1367.

Lenormand T. 2002. Gene flow and the limits of natural selection. Trends Ecol Evol 17:183–189.

Lin QC, Zhai YF, Zhang AS, Men XY, Zhang XY, Zalom FG, Zhou CG, Yu Y. 2014. Comparative Developmental Times and Laboratory Life tables for Drosophlia suzukii and Drosophila melanogaster (Diptera: Drosophilidae). Fla Entomol 97:1434–1442.

Lombaert E, Guillemaud T, Lundgren J, Koch R, Facon B, Grez A, Loomans A, Malausa T, Nedved O, Rhule E, et al. 2014. Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol 23:5979–5997.

Matsumura S. 1931. 6000 illustrated insects of Japan-empire (in Japanese). Tokyo: Toko shoin. 1497 pp.

Meng XL. 1994. Posterior predictive p-values. Ann Stat 22:1142–1160.

Nei M, Maruyama T, Chakraborty R. 1975. The bottleneck effect and genetic variability in populations. Evolution 29:1–10.

Ometto L, Cestaro A, Ramasamy S, Grassi A, Revadi S, Siozios S, Moretto M, Fontana P, Varotto C, Pisani D, et al. 2013. Linking genomics and ecology to investigate the complex evolution of an invasive drosophila pest. Genome Biol Evol 5:745–757.

Pascual M, Chapuis MP, Mestres F, Balanya J, Huey RB, Gilchrist GW, Serra L, Estoup A. 2007. Introduction history of Drosophila subobscura in the New World: a microsatellite based survey using ABC methods. Mol Ecol 16:3069–3083.

Peischl S, Excoffier L. 2015. Expansion load: recessive mutations and the role of standing genetic variation. Mol Ecol 24:2084–2094.

Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. 2016. Reliable ABC model choice via random forests. Bioinformatics 32:859–866.

R Development Core Team. 2008. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org

Robert CP, Cornuet JM, Marin JM, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA 108:15112–15117.

Rius M, Darling JA. 2014. How important is intraspecific genetic admixture to the success of colonising populations? Trends Ecol Evol 29:233–242.

Sakai AK, Allendorf FW, Holt JS, Lodge DM, Molofsky J, With KA, Baughman S, Cabin RJ, Cohen JE, Ellstrand NC, McCauley DE. 2001. The population biology of invasive species. Annu Rev Ecol Syst 305–332.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. In: Mutation and Evolution. Springer. pp. 359–367.

Simberloff D. 2013. Biological invasions: much progress plus several controversies. Contrib Sci 9:7–16.

Turgeon J, Tayeh A, Facon B, Lombaert E, De Clercq, Berkvens N, Lungren JG, Estoup A. 2011. Experimental evidence for the phenotypic impact of admixture between wild and biocontrol Asian ladybird (Harmonia axyridis) involved in the European invasion. J Evol Biol 24:1044–1052.

Uller T, Leimu R. 2011. Founder events predict changes in genetic diversity during human-mediated range expansions. Glob Change Biol 17:3478–3485.

Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Thery S, Froment A, Le Bomin S, Gessain A, Hombert JM, et al. 2009. Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol 19:312–318.

Wray G. 2013. Genomics and the evolution of phenotypic Traits. Annu Rev Ecol Evol Syst 44:51–72.


Table 2. Results of Model Choice Analyses Using ABC-RF and ABC-LDA.

Columns: Analysis | Total number of scenarios | Number of SumStats | Prior error rate, ABC-RF (SD) | Prior error rate, ABC-LDA (large reference table) | Prior error rate, ABC-LDA (small reference table) | Posterior probability of the best model, ABC-RF (SD) | Posterior probability of the best model, ABC-LDA (large reference table) [CI] | Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model)

1a | 3 | 39 | 0.100 (±0.002) | 0.075 | 0.085 | 1.000 (±0.001) | 1.000 [1.000, 1.000] | Asia + Hawaii
1b | 3 | | 0.094 (±0.001) | 0.070 | 0.091 | 0.989 (±0.005) | 0.999 [0.999, 0.999] |
1c | 3 | | 0.096 (±0.001) | 0.079 | 0.087 | 0.998 (±0.002) | 0.999 [0.999, 0.999] |
1d | 7 | 130 | 0.327 (±0.001) | 0.274 | 0.368 | 0.690 (±0.018) | 0.741 [0.714, 0.767] | Asia + Hawaii
2a | 6 | 130 | 0.232 (±0.001) | 0.203 | 0.241 | 0.779 (±0.019) | 0.788 [0.764, 0.813] | Western US
2b | 6 | 130 | 0.232 (±0.001) | 0.179 | 0.393 | 0.716 (±0.013) | 0.604 [0.592, 0.616] | Asia
3a | 10 | 204 | 0.328 (±0.001) | 0.265 | 0.364 | 0.744 (±0.020) | 0.844 [0.820, 0.868] | Western US
3b | 10 | 204 | 0.408 (±0.001) | 0.358 | 0.423 | 0.510 (±0.014) | 0.409 [0.391, 0.426] | Asia
4 | 2 | 301 | 0.121 (±0.001) | 0.111 | 0.212 | 1.000 (±0.009) | 0.999 [0.999, 0.999] | Southern Europe + eastern US
5a | 21 | 424 | 0.294 (±0.001) | 0.230 | 0.423 | 0.631 (±0.034) | 0.812 [0.788, 0.837] | Western US + eastern US
5b | 21 | 424 | 0.298 (±0.001) | 0.242 | 0.428 | 0.500 (±0.027) | 0.290 [0.256, 0.323] | Northern Europe + southern Europe

NOTE.—All model choice analyses were carried out using the prior set 1 (supplementary table S8, Supplementary Material online) and a single set of sample sites representative of the pre-defined genetic groups: sample site JP-Tok for the native Asian group, US-Haw for the Hawaiian group, US-Wat, US-Sok, and US-SD for the western US group, US-NC for the eastern US group, IT-Tre for the southern Europe group, GE-Dos for the northern Europe group, BR-PA for the Brazil group, and FR-Reu for the La Reunion group (fig. 1). Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). The total number of summary statistics (Number of SumStats) as well as the total number of compared scenarios (Total number of scenarios) are indicated for each analysis. Prior error rates and posterior probabilities of the best model chosen using ABC-RF were averaged over 10 replicate analyses. ABC-RF and ABC-LDA treatments yielded the same best model choice for all analyses, which is denoted in the column "Origin of the focal population using either ABC-RF or ABC-LDA (i.e., best model)". Admixture between two source populations is represented by a "+" sign. ABC-LDA posterior probabilities of the best models were estimated using "large reference tables" with 500,000 simulated datasets per scenario. ABC-LDA prior error rates were computed using reference tables of two different sizes: "large reference table" (i.e., 500,000 simulated datasets per scenario) and "small reference table" (i.e., 10,000 simulated datasets per scenario, as for ABC-RF analyses). SD stands for standard deviation over 10 replicate analyses and CI for 95% confidence interval computed following Cornuet et al. (2008).


small reference tables (paired t-test: t10 = 2.7, P = 0.02). However, ABC-RF performed better than ABC-LDA when using small reference tables (only 10,000 simulations per scenario), having lower prior error rates (paired t-test: t10 = 11.6, P < 0.0001; table 2 and supplementary table S3, Supplementary Material online). The difference in prior error rate was particularly marked for the complex analyses, i.e., those in which many introduction scenarios were compared and many summary statistics were computed. For instance, for the analyses 5a and 5b (21 compared scenarios and 424 summary statistics computed), prior error rates were 0.217 and 0.221 for ABC-RF versus 0.417 and 0.425 for ABC-LDA.

Regarding computational effort, we found that, for a given observed dataset, an ABC-RF treatment to select the best scenario and compute its posterior probability required ca. 50 times less computation time than an ABC-LDA treatment with a traditional reference table of large size (i.e., 500,000 simulated datasets per scenario). Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time for 10,000 pseudo-observed datasets, whereas with ABC-LDA it lasted several hours to several days (depending on the analysis processed) for 500 pseudo-observed datasets.

The same invasion scenario had the highest probability for all 11 analyses carried out using ABC-RF and ABC-LDA with a large number of simulated datasets (table 2 and supplementary table S3, Supplementary Material online). Posterior probabilities of the best scenario estimated using ABC-RF were not systematically higher (or lower) than those using ABC-LDA with a large number of simulated datasets. In particular, there

FIG. 1. Worldwide invasion scenario of D. suzukii inferred from microsatellite data and date of first observation. Map and schematic showing sample sites and the invasion routes taken by D. suzukii, as reconstructed by ABC-RF (Pudlo et al. 2016) on a total of 685 individuals from 23 geographic locations genotyped at 25 microsatellite loci (see Results and Methods for details). The native range is in dark gray and the invasive range is in light gray (cf. delimitation from fig. 1 in Asplen et al. 2015). The year in which D. suzukii was first observed at each sample site is indicated in italics. The 23 geographical locations that were sampled are represented by circles (native range) and squares, diamonds, and triangles (introduced range). Squares indicate populations that experienced weak bottlenecks (i.e., median value of bottleneck severity < 0.12; see main text and table 3), diamonds indicate moderate bottlenecks (0.12 < bottleneck severity < 0.22), and triangles indicate strong bottlenecks (i.e., bottleneck severity > 0.3). The colors of the symbols for the sample sites and of the arrows between them correspond to the different genetic groups obtained using the clustering method BAPS (supplementary appendix S1, Supplementary Material online). The arrows indicate the most probable invasion pathways. A1–A5 indicate five separate admixture events between different sources. O1–O3 indicate the most probable sources within the native range for the primary introduction events. A1 = Hawaii + southeast China; A2 = Watsonville (western US) + Hawaii; A3 = southern Europe + eastern US; A4 = western US + eastern US; A5 = southern Europe + northern Europe; O1 = Japan; O2 = southeast China; O3 = northeast China.


was no clear trend of overestimation of posterior probabilitywhen using ABC-LDA with large number of simulated data-sets versus ABC-RF a potential bias suggested by Pudlo et al(2016) On the other hand we found evidence of instability inthe estimation of posterior probabilities and hence modelselection when using ABC-LDA with a small numbers of sim-ulated datasets in the reference table (ie 10000 simulationsper scenario as for ABC-RF) In this case the scenario with thehighest probability was different from that selected using ei-ther ABC-LDA with the large reference table or ABC-RF infour and two of the 11 analyses when using the prior sets 1and 2 respectively (results not shown)

Refining the Origins of the Primary Introduction Events from the Native Area
Additional ABC-RF analyses were run to determine which geographical area in the native range was the most likely origin of each of the three primary introductions (i.e., in Hawaii, western US, and Europe). The most probable origin of the Hawaiian invasive population was Japan, with a mean posterior probability of 0.959 (origin O1 in fig. 1). For Watsonville (western US, sample site US-Wat), the most probable source of the primary introduction from Asia was southeast China (origin O2 in fig. 1; P = 0.860). Finally, for Europe, the most probable source of the primary introduction from Asia was northeast China (origin O3 in fig. 1; P = 0.855). Similar posterior probabilities were obtained using the prior set 1 (values presented above) and the prior set 2 (results not shown). For inferences regarding Europe, similar posterior probabilities were also obtained using various sets of sample sites from Europe and the eastern US (results not shown).

Estimation of Parameter Distributions for Admixture Rate and Bottleneck Severity
We used the most probable worldwide invasion scenario (fig. 1) to estimate the posterior distributions of parameters related to the bottleneck severity associated with the founding of new populations and to the genetic admixture rates between differentiated sources (i.e., A1–A5 events in fig. 1).

FIG. 2. Admixed or non-admixed origin of D. suzukii in Europe inferred for each sample site. Potential source populations include (among others) the Asian native range (represented by the Japanese sample site JP-Tok) and the invasive genetic group from eastern US, represented by one of the four invasive sample sites collected in this area (fig. 1). Four replicate independent ABC-RF treatments corresponding to analysis 3b (table 2) were hence carried out for each targeted European sample site, each using one of the four eastern US sample sites. The treatments labeled 1, 2, 3, and 4 in the pies of the figure were carried out with the sample sites US-NC, US-Wis, US-Gen, and US-Col, respectively. A pie quarter in blue indicates that the best scenario corresponds to a single introduction event from Asia. A pie quarter in red indicates that the best scenario corresponds to an admixture event between Asia and eastern US. Dates in italics correspond to the dates of first record of the European sample sites.


The posterior distributions of parameters relating to bottlenecks and admixture differed substantially from the priors, indicating that the genetic data were informative for such parameters (tables 3 and 4; see supplementary tables S6 and S7, Supplementary Material online, for results using an alternative set of representative sample sites). Regarding bottleneck severity, we found that bottlenecks tended to be more severe for populations founded by individuals originating from a source located on a different continent (extra-continental origin) than for populations founded by individuals originating from a source located on the same continent (intra-continental origin; table 3 and fig. 1). The bottleneck severity for populations founded by individuals corresponding to a combination of the two types of sources (extra- and intra-continental origins) was either weak or moderate. The strongest bottleneck severity was found for the first invasive population in Hawaii, which showed the lowest level of genetic variation among all invasive populations. Bottleneck severity values were globally slightly stronger in European than in US populations, suggesting a smaller number of founding individuals and/or a slower demographic recovery in European populations.
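As an illustration of the rough three-class scheme quoted in the figure 1 caption and the table 3 note, the following minimal sketch maps posterior medians to class labels. The function and variable names are ours, not the authors'. The source leaves values between 0.22 and 0.30 without a class and does not say where a value of exactly 0.12 falls, so those two choices are assumptions flagged in the comments.

```python
from typing import Dict

# Thresholds quoted in the figure 1 caption and the table 3 note.
WEAK_MAX = 0.12
MODERATE_MAX = 0.22
STRONG_MIN = 0.30


def classify_bottleneck(median_severity: float) -> str:
    """Map a posterior median of bottleneck severity to a class label."""
    if median_severity < WEAK_MAX:
        return "weak"
    # Assumption: a value of exactly 0.12 is treated as moderate.
    if median_severity < MODERATE_MAX:
        return "moderate"
    if median_severity > STRONG_MIN:
        return "strong"
    # Assumption: the text names no class between 0.22 and 0.30.
    return "unclassified"


# Posterior medians under prior set 1, copied from table 3.
MEDIANS_PRIOR1: Dict[str, float] = {
    "US-Haw": 0.500, "US-Wat": 0.138, "IT-Tre": 0.179,
    "BR-PA": 0.126, "FR-Reu": 0.215, "US-SD": 0.117,
    "US-NC": 0.067, "US-Sok": 0.099, "GE-Dos": 0.177,
}

classes = {site: classify_bottleneck(m) for site, m in MEDIANS_PRIOR1.items()}
for site, label in classes.items():
    print(f"{site}: {label}")
```

With these medians, only the Hawaiian population falls in the strong class, in line with the statement that the strongest bottleneck was found in Hawaii.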

Regarding admixture (table 4), we found that (i) for the first recorded invasive population in western US (i.e., Watsonville, sample site US-Wat), the genetic contribution from China was substantially larger than that from Hawaii (median value of 0.759); (ii) for the western US population US-Sok, the secondary genetic contribution from Hawaii was small compared to that from Watsonville (median value of 0.237); and (iii) the sample sites from northern Europe (e.g., GE-Dos in Germany) contained a rather low proportion of genes originating from the eastern US (median value of 0.286). We found more balanced admixture rates for the populations from Brazil (BR-PA) and La Reunion (FR-Reu), but information on this parameter was less accurate in these cases, as indicated by large 90% credibility intervals.

Table 3. Bottleneck Severity in Invasive Populations of D. suzukii.

                                        Prior 1                                 Prior 2
Introduction Type          Sample Site  Mean    Median  Mode    q5      q95     Mean    Median  Mode    q5      q95
Extra-continental          US-Haw       0.517   0.500   0.495   0.326   0.755   0.490   0.484   0.477   0.382   0.615
                           US-Wat       0.181   0.138   0.114   0.041   0.448   0.143   0.127   0.118   0.059   0.275
                           IT-Tre       0.268   0.179   0.171   0.059   0.837   0.236   0.189   0.172   0.101   0.553
                           BR-PA        0.158   0.126   0.104   0.020   0.407   0.208   0.193   0.172   0.067   0.399
                           FR-Reu       0.259   0.215   0.186   0.051   0.626   0.245   0.214   0.190   0.069   0.534
Intra-continental          US-SD        0.135   0.117   0.106   0.039   0.283   0.117   0.104   0.091   0.053   0.224
                           US-NC        0.089   0.067   0.061   0.015   0.230   0.111   0.098   0.090   0.046   0.218
Extra + intra-continental  US-Sok       0.116   0.099   0.088   0.027   0.250   0.119   0.106   0.099   0.052   0.229
                           GE-Dos       0.189   0.177   0.155   0.061   0.356   0.205   0.189   0.171   0.088   0.372
Prior values                            0.628   0.199   NA      0.021   1.904   0.517   0.500   NA      0.326   0.754

NOTE: Extra-continental introductions correspond to a long-distance introduction from a source located outside the continent of the focal population; an intra-continental introduction corresponds to an introduction event from a source located on the same continent as the focal population; and an extra + intra-continental introduction corresponds to a combination of the two types of sources. Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5 and q95), are indicated for each bottleneck severity parameter. We roughly classified the estimated bottleneck severity values into three classes (represented here by the three shades of gray): weak (i.e., median value of bottleneck severity < 0.12, in light gray), moderate (0.12 < bottleneck severity < 0.22, in gray), and strong (i.e., bottleneck severity > 0.3, in dark gray). The set of sample sites used for the ABC estimations presented here includes (i) for the native area, Japan (JP-Tok + JP-Sap), South-East China (CN-Nin), and North-East China (CN-Lan + CN-Lia), and (ii) for the invaded range, US-Wat, US-Sok, and US-SD for western US, US-NC for eastern US, IT-Tre for southern Europe, GE-Dos for northern Europe, BR-PA for South America (Brazil), and FR-Reu for La Reunion island. Code names of the sample sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online, in which bottleneck severity classes are also given for each sample site. See supplementary table S6, Supplementary Material online, for results on a different set of representative sample sites.

Table 4. Posterior Distributions of Admixture Rates for the Five Admixture Events Inferred for the Final Worldwide Invasion Scenario Described in figure 1.

Prior set  Admixture Event                Admixture Rate (gene fraction from pop. x)  Mean    Median  Mode    q5      q95
Prior 1    A1 (US-Wat = US-Haw + CN-Nin)  r_US-Wat (China – CN-Nin)                   0.759   0.761   0.774   0.659   0.854
           A2 (US-Sok = US-Haw + US-Wat)  r_US-Sok (USA – US-Wat)                     0.237   0.230   0.227   0.092   0.400
           A3 (DE-Dos = IT-Tre + US-NC)   r_DE-Dos (USA – US-NC)                      0.286   0.278   0.266   0.112   0.481
           A4 (BR-PA = US-NC + US-SD)     r_BR-PA (USA – US-SD)                       0.467   0.461   0.442   0.187   0.754
           A5 (FR-Reu = IT-Tre + GE-Dos)  r_FR-Reu (Europe – GE-Dos)                  0.388   0.380   0.426   0.132   0.670
Prior 2    A1 (US-Wat = US-Haw + CN-Nin)  r_US-Wat (China – CN-Nin)                   0.761   0.762   0.766   0.684   0.833
           A2 (US-Sok = US-Haw + US-Wat)  r_US-Sok (USA – US-Wat)                     0.224   0.219   0.199   0.114   0.350
           A3 (DE-Dos = IT-Tre + US-NC)   r_DE-Dos (USA – US-NC)                      0.312   0.306   0.284   0.152   0.487
           A4 (BR-PA = US-NC + US-SD)     r_BR-PA (USA – US-SD)                       0.422   0.418   0.429   0.247   0.613
           A5 (FR-Reu = IT-Tre + GE-Dos)  r_FR-Reu (Europe – GE-Dos)                  0.370   0.368   0.356   0.178   0.569

NOTE: Admixture events are denoted as in figure 1. Each admixture rate parameter r points to the name of the admixed site and corresponds to the fraction of genes originating from the source site in parentheses (1 - r genes originate from the other source site). Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5 and q95), are indicated for each admixture parameter. Estimations assuming the prior set 1 and the prior set 2 (supplementary table S8, Supplementary Material online) are provided. The set of representative sample sites used for the ABC estimations presented here is the same as in table 3. Code names of the population sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online. See supplementary table S7, Supplementary Material online, for results on a different set of representative sample sites.
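To make the meaning of an admixture rate r concrete (a fraction r of genes from one source and 1 - r from the other, as defined in the table 4 note), here is a minimal sketch of the allele frequencies expected immediately after an admixture event. The allele labels and source frequencies are hypothetical values chosen for illustration; only the value of r is taken from table 4 (posterior median for event A1 under prior set 1).

```python
from typing import Dict


def admixed_frequencies(freq_a: Dict[str, float],
                        freq_b: Dict[str, float],
                        r: float) -> Dict[str, float]:
    """Expected allele frequencies right after admixture.

    r is the fraction of genes coming from source A; the remaining
    1 - r comes from source B.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("admixture rate must lie in [0, 1]")
    alleles = sorted(set(freq_a) | set(freq_b))
    return {a: r * freq_a.get(a, 0.0) + (1.0 - r) * freq_b.get(a, 0.0)
            for a in alleles}


# Hypothetical allele frequencies at one microsatellite locus.
source_china = {"150": 0.6, "152": 0.4}
source_hawaii = {"150": 0.1, "154": 0.9}

r_a1 = 0.761  # posterior median for event A1, prior set 1 (table 4)
mixed = admixed_frequencies(source_china, source_hawaii, r_a1)
print(mixed)
```

The sketch ignores drift and the bottleneck that follows the introduction, so it describes only the instant of admixture, not the frequencies later observed in the sampled population.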

Model-Posterior Checking
Like any model-based method, ABC inference does not reveal the "true" evolutionary history, but it allows choosing the best among a necessarily limited set of scenarios that have been compared and estimating posterior distributions of parameters under this scenario. How well the inferred scenario-posterior combination matches the observed dataset remains to be evaluated using an ABC model-posterior checking analysis. When applying such an analysis to the final invasion scenario detailed in figure 1, we found that, when considering the prior set 1, only 54 of the 1,141 summary statistics used as test quantities had low posterior predictive P-values (i.e., 0.002 < ppp-values < 5%). Moreover, none of those ppp-values remained significant when correcting for multiple comparisons (Benjamini and Hochberg 1995). Similar results were obtained when assuming the prior set 2 and when considering different sets of representative sample sites in our analyses (results not shown). These findings show that the final worldwide invasion scenario (fig. 1, with associated parameter posterior distributions) matches the observed dataset well. In agreement with this, the projections of the simulated datasets on the principal component axes from the final model-posterior combination were well grouped and centered on the target point corresponding to the observed dataset (supplementary fig. S4, Supplementary Material online). Due to the modest size of the dataset (i.e., 25 microsatellite loci), however, only situations of major inadequacy of the model-posterior combination to the observed dataset are likely to be identified.
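The multiple-comparison correction cited above (Benjamini and Hochberg 1995) can be sketched as a step-up procedure on sorted P-values. This is a generic illustration with made-up P-values, not the authors' code or their ppp-values. The function name and the 0.05 level are assumptions.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each P-value, whether it is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the P-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= k * alpha / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            max_rank = rank
    # Reject every hypothesis whose rank is at most max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Made-up P-values for illustration only.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example))
```

When many statistics are tested and the smallest P-values are not small relative to their rank thresholds, the procedure rejects nothing, which is the outcome reported in the text.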

Discussion
In the present paper, we decipher the routes taken by D. suzukii in its worldwide invasion, evaluate evidence of bottlenecks and genetic admixture associated with different introductions, and quantify the efficiency of ABC-RF relative to the more standard ABC-LDA method for choosing among introduction scenarios.

A Complex Worldwide Invasion History
Prior to this study and that of Adrion et al. (2014), our understanding of the worldwide introduction pathways of D. suzukii was based on historical and observational data, which were incomplete and potentially misleading. Our results indicate three distinct introductions from the native range (to Hawaii, the western side of North America, and western Europe), which accords well with the findings of Adrion et al. (2014) from a different set of markers (X-linked sequence data). Both the western North American introductions and at least some European populations show signs of admixture. Hawaii and China both appear to have contributed to the western North American introductions, which were in turn the most probable source for the introduction into eastern North America. Similarly, China and, to a lesser extent, eastern North America both appear to have contributed to the European introduction, with the eastern North American contribution found mostly in northern European areas. The almost simultaneous invasion of North America and Europe from Asia could be due to increased trade between these areas, facilitating transport of this species between regions. Alternatively, or in addition, adaptation to human-altered habitats (in this case, changes in agricultural practices) within the native range of the species could have promoted invasion of similarly altered habitats worldwide (i.e., anthropogenically induced adaptation to invade; Hufbauer et al. 2012). However, the two invaded continents were most likely colonized by flies from distinct Chinese geographic areas (fig. 1), requiring that adaptation to agriculture occurred concomitantly at several locations within the native range. This is possible but not evolutionarily parsimonious, and further data would be required to test this explanation. Subsequent introductions from the three primary founder locations to other locations resulted in a complex history of invasion. Both primary and secondary introductions show signs of demographic bottlenecks, and several invasive populations have multiple sources and thus experienced genetic admixture between differentiated native or invasive populations.

The genetic relationships between worldwide D. suzukii populations define the genetic variation they harbor and may continue to shape further spread of alleles among populations. Gene flow among populations, through continuous dispersal or more punctual admixture events, permits the dissemination of allelic variants and new mutations, with dramatic consequences for evolutionary trajectories (Lenormand 2002). We found evidence for asymmetric genetic admixture from North American towards (northern) European populations. If this signal of admixture reflects recurrent and ongoing dispersal events rather than a single past secondary introduction event, then variants that arise in North America would be more likely to spread to Europe than the reverse. Discriminating between ongoing and past dispersal events is tricky, however, especially for recent invasions (Benazzo et al. 2015). If the occurrence of recurrent gene flow is confirmed, then D. suzukii populations from Europe may not lag behind American ones in evolutionary terms. From an applied perspective, if resistance to control practices emerges in North America, special efforts to stop the importation of D. suzukii to Europe would help prevent the evolution of resistance in Europe and thus reduce damage to crops (Caprio and Tabashnik 1992).

Advances in Methods for Discriminating among Complex Models
To the best of our knowledge, the present study is the first to use the recently developed ABC-RF method (Pudlo et al. 2016) to test competing models characterized by different levels of complexity. To compare ABC-RF to a more standard ABC method, we carried out a subset of model choice analyses using both ABC-RF and ABC-LDA (Estoup et al. 2012). We ran ABC-LDA analyses in two ways: with standard-sized large reference tables (500,000 simulated datasets per scenario) and with small reference tables (10,000 simulated datasets per scenario, the same size as used for ABC-RF). The performance of ABC-LDA (in terms of prior error rate) in choosing among introduction scenarios was slightly better than for ABC-RF when using large reference tables. Furthermore, for ABC-LDA with large reference tables, confidence in model choice (in terms of posterior probability) was relatively comparable to ABC-RF with small reference tables. Thus, when analyses are relatively simple or computational resources are not limiting, ABC-LDA can provide as robust inferences of invasion scenarios as ABC-RF.

In contrast, we found that ABC-RF outperformed ABC-LDA when using similarly small reference tables for both types of analyses (10,000 simulated datasets per scenario). ABC-LDA was also unstable, choosing different scenarios in different replicate analyses when using small reference tables. These empirical findings correspond well with those of Pudlo et al. (2016) using simple models for which the true posterior probabilities could be calculated. Pudlo et al. (2016) demonstrated that ABC-RF provided a more reliable posterior probability of the best (true) scenario than standard ABC methods when using the same (small) number of simulated datasets in the reference table (see supplementary fig. S3 in Pudlo et al. 2016). In our work, the difference in performance between the two methods was most pronounced for complex analyses (e.g., when comparing more than 10 scenarios using more than 100 summary statistics). Thus, when analyses are not simple or computational resources are finite, ABC-RF provides more robust inferences of invasion scenarios than ABC-LDA.

The lower computation cost of ABC-RF was also evident. For a given observed dataset, selecting the best supported scenario and computing its posterior probability was ca. 50 times faster than when processing an ABC-LDA treatment with a traditional reference table of large size. Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time, whereas it took several hours to several days with ABC-LDA. The consequence of low computational costs is not simply a saving in computer time. Rather, the efficiency of ABC-RF analyses allowed us to run replicate analyses on various sample sets, even for the most complex D. suzukii invasion scenarios. This replication is critical in evaluating the robustness of our statistical inferences.

We found that, at least for some ABC-RF analyses, the posterior probability of the best scenario and the global statistical power to choose among alternative invasion scenarios (as measured by the prior error rate) were relatively low. For instance, in three of the 11 analyses (analyses 3b, 5a, and 5b), the best scenarios had relatively low probabilities (i.e., ranging from 0.500 to 0.630) and relatively high prior error rates (i.e., ranging from 0.300 to 0.400). Although replicate analyses carried out on different sample sets and prior sets pointed to the same best scenario for these three analyses, such low probability values and high prior error rates suggest a moderate level of confidence in the choice of model in these cases. Similarly, we found that the scenario choice leading to the conclusion of the presence versus absence of admixed genes from the eastern US in various European sample sites relied on probability values that were sometimes as low as 0.500. Moreover, in a minority of cases, the best scenario was different depending on the sample site considered as representative of the eastern US or the native area. The observed pattern of an uneven spatial distribution of the presence of genes of eastern US origin in Europe, following roughly a north to south gradient, should hence be taken cautiously. The analyses of additional European sample sites are needed to clarify this issue. The above inferential uncertainties also most likely find their sources in the relatively small number of markers genotyped relative to the number, complexity, and similarity of some of the competing models. Such results indicate that there is room for improving the robustness of our inferences, at least for a subset of our analyses.

Bottlenecks and Genetic Admixture
We found strong evidence that bottlenecks and admixture both occurred during the course of the invasion of D. suzukii. A first sign of bottlenecks is a reduction in neutral genetic diversity. We found that all invasive D. suzukii populations were characterized by significantly lower genetic variation than native ones, with a loss of 6.4% (US-Col) to 23.3% (US-Haw) of heterozygosity and of 27.3% (US-Wat) to 54.2% (US-Haw) of allelic diversity, respectively (supplementary fig. S5, Supplementary Material online). Adrion et al. (2014) found a similar significant loss of diversity in their dataset focused on gene sequences. Dlugosch and Parker (2008) and Uller and Leimu (2011) reviewed studies of neutral genetic diversity in a large number of species of animals, plants, and fungi and compared nuclear molecular diversity within introduced and source populations. Overall, they found that a loss of variation was the most frequent feature in invasive populations. However, reductions in genetic variation were on average modest (e.g., average loss of 18.7% and 15.5% of heterozygosity and allelic diversity, respectively; Dlugosch and Parker 2008). The reductions observed in our study in invasive D. suzukii populations were thus in the same range for heterozygosity but higher for allelic diversity (cf. average loss of 11.1% and 34.5% of heterozygosity and allelic diversity, respectively). A larger loss of allelic diversity than heterozygosity after a bottleneck is nevertheless expected by theory (Nei et al. 1975).
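The theoretical expectation that allelic diversity drops faster than heterozygosity can be illustrated with a deliberately simplified sketch. Rare alleles contribute little to expected heterozygosity (1 minus the sum of squared allele frequencies), so losing them removes many alleles but little heterozygosity. The allele frequencies below are hypothetical, and the "bottleneck" is a deterministic caricature (alleles below a frequency cutoff are dropped and the rest renormalized), not a simulation of genetic drift.

```python
from typing import Dict


def expected_heterozygosity(freqs: Dict[str, float]) -> float:
    """Expected heterozygosity: 1 minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def drop_rare_alleles(freqs: Dict[str, float], min_freq: float) -> Dict[str, float]:
    """Caricature of a bottleneck: lose alleles rarer than min_freq."""
    kept = {a: p for a, p in freqs.items() if p >= min_freq}
    total = sum(kept.values())
    return {a: p / total for a, p in kept.items()}


def relative_loss(before: float, after: float) -> float:
    """Fraction of the original quantity that was lost."""
    return 1.0 - after / before


# Hypothetical frequencies at one locus in a source population.
native = {"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.05, "E": 0.05}
invasive = drop_rare_alleles(native, min_freq=0.1)

he_loss = relative_loss(expected_heterozygosity(native),
                        expected_heterozygosity(invasive))
allele_loss = relative_loss(len(native), len(invasive))
print(f"heterozygosity loss: {he_loss:.1%}, allelic loss: {allele_loss:.1%}")
```

In this toy case two of five alleles are lost while heterozygosity falls by only about 12%, the same qualitative pattern as the larger loss of allelic diversity reported for invasive D. suzukii populations.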

The type of introduction pathway explains, at least partly, the severity of the reduction in genetic diversity. Bottlenecks tended to be less severe for populations founded by individuals from the same continent than for populations founded by individuals originating from a different continent. A simple explanation for such a pattern is that, due to greater proximity, populations originating from the same continent are likely to be initially founded by a larger number of individuals and also to experience recurrent gene flow. Populations founded by individuals from both the same and different continents experienced weak or moderate reductions in diversity, suggesting that multiple introduction pathways restore diversity.

In agreement with previous studies on invasions on similarly large geographical scales (e.g., Lombaert et al. 2014 and references therein), we found that admixture events were frequent in the worldwide invasion history of D. suzukii. We identified at least five admixture events for which sources were native and/or invasive populations (i.e., A1–A5 events in fig. 1). The genetic contributions of the sources were unbalanced, except for the populations from Brazil and La Reunion, but genetic information regarding admixture was considerably less accurate in the latter cases, which involved weakly differentiated source populations.

We have provided here the information necessary to evaluate whether bottlenecks and admixture have influenced outcomes in this invasion. Quantitative genetics studies in the lab focusing on the key source, bottlenecked, and admixed populations could now examine the fitness and adaptive potential of those groups (Lavergne and Molofsky 2007; Facon et al. 2008; Turgeon et al. 2011).

Conclusions and Perspectives
Understanding invasion pathways provides key information for understanding the role of evolutionary processes in biological invasions. For example, our research reveals the sources of invasive populations, which enables further relevant comparisons using quantitative genetics studies in the lab. Additionally, we found that the invasion of Europe by D. suzukii was distinct from that of North America, with limited and asymmetrical gene flow between these two main invaded areas. This situation provides the opportunity to evaluate evolutionary trajectories in replicate in two separate temperate climate regions, similarly to research by Gilchrist et al. (2001, 2004) on the pace of clinal evolution in the invasive fruit fly Drosophila subobscura. There are also distinct invasion pathways for areas with warmer climates (i.e., Hawaii, La Reunion, and Brazil) that will make interesting comparisons.

Retracing invasion routes and making inferences about demographic processes is only possible if there is adequate polymorphism within populations and significant genetic differentiation among them. As is evident here, microsatellite data can provide enough variation to reveal important pathways. We concur with Adrion et al. (2014), however, that genome-wide data (i.e., next generation sequencing approaches) will be tremendously powerful in further discriminating among complex invasion scenarios (see also Cristescu 2015; Estoup et al. 2016). Because of the reduced computational resources demanded by ABC-RF, this method will be particularly useful for analysis of massive single nucleotide polymorphism datasets (Pudlo et al. 2016). In addition to the selectively neutral demographic inference leading to the reconstruction of routes of invasion, population (and quantitative) next generation sequencing approaches are quite promising for studying the evolution of phenotypic traits in natural populations (Wray 2013; Gautier 2015). Specifically, they can be used to better understand the genetic architecture of traits underlying invasion success (Bock et al. 2015). Drosophila suzukii is a good species for such research, given (i) its short generation time and viability in laboratory conditions, which facilitate experimental approaches to study quantitative traits of interest (Asplen et al. 2015), and (ii) the availability of annotated genome assemblies for this species (Chiu et al. 2013; Ometto et al. 2013), along with the huge amount of genomic resources available in its close relative species D. melanogaster (Groen and Whiteman 2016).

Materials and Methods

Sampling and Genotyping
Adult D. suzukii were sampled from the field at a total of 23 localities (hereafter termed sample sites) distributed throughout most of the native and invasive range of the species (supplementary table S1, Supplementary Material online, and fig. 1). Samples were collected between 2013 and 2015 using baited traps and sweep nets and stored in ethanol. Native Asian samples consisted of a total of six sample sites, including four Chinese and two Japanese localities. Samples from the invasive range were collected in Hawaii (1 sample site), Continental US (7 sites), Europe (7 sites), Brazil (1 site), and the French island of La Reunion (1 site). For each sample site, 15–44 adult flies were genotyped at 25 of the 28 microsatellite loci described in Fraimout et al. (2015), resulting in a total of 685 genotyped individuals. Three microsatellite loci from Fraimout et al. (2015) (i.e., loci DS31, DS42, and DS45) were not included due to the presence of null alleles at frequencies > 10% in some sample sites (results not shown). DNA extraction and PCR amplification, as well as allele scoring, were performed following the methodology detailed in Fraimout et al. (2015).

General Framework for Defining Focal Population Groups and Sets of Scenarios to Be Compared Using ABC
One of the many challenges of ABC analyses is to optimally define sets of sample sites to be processed chronologically as potential sources and targets in a way that does not hamper the computational effort. As stated in Estoup and Guillemaud (2010) and Lombaert et al. (2014), the number and complexity of competing scenarios is proportional to the number of genetic groups to be accounted for in ABC analyses. Following Lombaert et al. (2014), we used the results obtained from three genetic clustering methods, along with historical and geographical information, to define genetic groups that could include several sample sites (supplementary appendix S1, Supplementary Material online). We defined seven main genetic groups from our set of 23 sample sites: Asia, Hawaii, western US, eastern US, Europe, Brazil, and La Reunion. To make the computations less intensive with respect to computer resources, ABC treatments were carried out using one representative sample site for each genetic group per replicate analysis. In other words, when a genetic group included more than two sample sites, we considered the two most differentiated sample sites (i.e., with the highest FST values in supplementary table S3, Supplementary Material online) as representative of the group: CN-Shi and JP-Tok for the group Asia, all three sample sites for the genetically heterogeneous western US group, US-NC and US-Wis for the group eastern US, and GE-Dos and IT-Tre for the group Europe. Each of the sample sites retained (i.e., the representative sample sites) was included alternatively in the sample set considered for ABC treatments. This makes it possible to replicate analyses of scenario choice on different sample sets representing most of the genetic variation within and between genetic groups. All ABC analyses described in table 1 were thus replicated with all possible combinations of representative sample sites available for each genetic group. This allowed us to test the robustness of scenario choices with respect to the sample sets considered as representative of genetic groups.
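The replication over "all possible combinations of representative sample sites" amounts to a Cartesian product across genetic groups. The sketch below illustrates this for three groups whose two representatives are named in the text. Which groups enter a given analysis, and how the western US group (with its three sites) is handled, depends on the analysis and is not modeled here. The dictionary layout and function name are our assumptions.

```python
from itertools import product
from typing import Dict, Iterator, List

# Representative sample sites named in the text for three genetic groups.
REPRESENTATIVES: Dict[str, List[str]] = {
    "Asia": ["CN-Shi", "JP-Tok"],
    "eastern US": ["US-NC", "US-Wis"],
    "Europe": ["GE-Dos", "IT-Tre"],
}


def sample_sets(representatives: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Yield every way of picking one representative site per group."""
    groups = list(representatives)
    for combo in product(*(representatives[g] for g in groups)):
        yield dict(zip(groups, combo))


all_sets = list(sample_sets(REPRESENTATIVES))
print(len(all_sets))  # 2 x 2 x 2 combinations
for s in all_sets:
    print(s)
```

Each yielded dictionary corresponds to one replicate sample set on which a scenario choice analysis could be rerun.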

Nested ABC Analyses Processed Sequentially
Following Lombaert et al. (2014), we used historical information (i.e., dates of first observation of each invasive population; supplementary table S1, Supplementary Material online) to define 11 sets of competing introduction scenarios that were analyzed sequentially. The first set of compared scenarios considers the second oldest invasive population as target (the oldest one necessarily originating from the native range) and determines its introduction history. Step by step, subsequent analyses use the results obtained from the previous analyses until the most recent invasive populations are considered. Each of the 11 analyses included a set of compared introduction scenarios in which the target invasive population (i.e., the focal population) can either derive directly by a single split (i.e., introduction event) from one of the historically compatible source populations or be the result of an admixture event between any one of the possible pairs of sources. For instance, in a model with one target and two possible sources, three scenarios could be formalized, with the target population being derived from a single source population, from the other possible source, or from an admixture between the two sources (see supplementary fig. S6 for an illustration and supplementary table S2, Supplementary Material online, for details).
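The construction of a scenario set (one scenario per single source, plus one per pair of sources for admixture) can be sketched as a simple enumeration. This is an illustration of the counting logic only: the scenario labels are our own strings, and the actual sets used in the study (supplementary table S2) may include additional structure, for example the seven scenarios of analysis 1d.

```python
from itertools import combinations
from typing import List


def enumerate_scenarios(target: str, sources: List[str]) -> List[str]:
    """List single-source and pairwise-admixture scenarios for one target."""
    # One scenario per candidate source (single introduction event).
    scenarios = [f"{target} <- {s}" for s in sources]
    # One scenario per unordered pair of sources (admixture event).
    scenarios += [f"{target} <- admixture({a} + {b})"
                  for a, b in combinations(sources, 2)]
    return scenarios


two_sources = enumerate_scenarios("target", ["source 1", "source 2"])
three_sources = enumerate_scenarios("eastern US", ["Asia", "Hawaii", "western US"])
print(len(two_sources), len(three_sources))
for s in three_sources:
    print(s)
```

With n candidate sources the count is n + n(n - 1)/2, which gives the three scenarios of the two-source example and six scenarios when three sources are considered.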

The first set of scenario choice analyses (1a–1d) aimed at making inferences about the introduction pathways for the three sample sites in western US (table 1). We first evaluated the Asian, Hawaiian, or admixed Asian + Hawaiian origin of each of the three western US sample sites. We then refined our modeling by formalizing seven competing scenarios describing the possible relationships among the three western US sample sites and their extra-continental sources (i.e., Asia and Hawaii; analysis 1d). Once the invasion history was resolved for the western US group, we sought to identify the population source(s) of the eastern US and European genetic groups independently. Dates of first records in these two groups widely overlap, and the possibility that the western US was a common source for both areas could not be ruled out. We therefore compared, in analysis 2a, a set of six scenarios to assess whether the eastern US group derived directly from Asia, Hawaii, or western US, or whether it resulted from an admixture event between one pair of such sources. A similar set of six scenarios was used to assess the most likely origin of European populations independently from the eastern US group (analysis 2b).

Due to the results of analyses 2a and 2b, and due to the overlap in first record dates for eastern US and Europe, we then performed two analyses to investigate the possibility of gene flow (i.e., admixture) between the eastern US and European groups, considering eastern US as target population with Europe as potential source and vice versa (analyses 3a and 3b, respectively). Analysis 4 aimed at clarifying the admixture in northern European sample sites identified in analysis 3b. We tested whether this admixture occurred between eastern US and a secondary introduction from Asia or between individuals from eastern US and southern Europe. Finally, we investigated the origins of the most recent invasive populations in Brazil and La Reunion separately, with all older groups and pairwise admixtures as potential sources (analyses 5a and 5b). Note that, in agreement with the genetic clustering results (fig. 1; supplementary appendix S1 and figs. S1 and S2, Supplementary Material online), we did not test for the possibility of Brazil being the source of La Reunion, or the reverse.

Historical, Demographic, and Mutational Parameters
For all scenarios formalized for ABC analyses, we modeled an introduction event as a divergence without subsequent gene flow from the source population (or from two source populations in case of admixture), at a time corresponding to the date of first record in the invaded area, translated into a number of generations (assuming 12 generations per year; Lin et al. 2014). The divergence event was immediately followed by a bottleneck period characterized by a lower effective population size, and then by a direct return to a stable (large) effective population size. We modeled our uncertainty regarding the identity of the source populations in the slightly structured native Asian area by including native "ghost populations" (i.e., unsampled populations) in our scenarios. Such unsampled native populations were modeled as a single split from an ancestral Asian population (without any change in effective population size), at a time defined by a loose flat prior that includes zero at the lower bound (supplementary table S8, Supplementary Material online). See Lombaert et al. (2014) and references therein for justifications of using unsampled populations when modeling invasion scenarios. A detailed illustration of the above modeling design is provided in supplementary fig. S6, Supplementary Material online.
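
The translation of a first-record date into a divergence time in generations can be illustrated as follows. This is a minimal sketch assuming that the time is counted backward from the sampling year; the function name and the years used in the example are hypothetical and are not taken from the supplementary tables.

```python
GENERATIONS_PER_YEAR = 12  # value assumed in the text (Lin et al. 2014)


def years_to_generations(first_record_year, sampling_year,
                         per_year=GENERATIONS_PER_YEAR):
    """Generations elapsed between the first record and the sampling date.

    Hypothetical helper: assumes divergence time is measured backward
    from the year in which the population was sampled.
    """
    if sampling_year < first_record_year:
        raise ValueError("sampling year precedes the first record")
    return (sampling_year - first_record_year) * per_year


# Illustrative values only: first record in 2008, sampling in 2013.
print(years_to_generations(2008, 2013))
```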

Prior distributions for historical, demographic, and mutational parameters were defined taking into account the historical and demographic parameter values available from empirical studies on D. suzukii (Hauser 2011; Lin et al. 2014; Asplen et al. 2015) and microsatellite mutational data on D. melanogaster (Schug et al. 1998). We considered a first set of prior distributions (hereafter termed prior set 1), which was composed of rectangular (i.e., bounded uniform) distributions (see supplementary table S8, Supplementary Material online). To evaluate the robustness of our ABC inferences to prior choice, we also considered a second set of more peaked prior distributions (hereafter termed prior set 2; supplementary table S8, Supplementary Material online).

Model Choice Using ABC-RF and ABC-LDA
For all ABC-RF and ABC-LDA analyses, we used the software DIYABC v2.1.0 (Cornuet et al. 2014) to simulate the datasets constituting the reference tables. A reference table includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics.

Following Pudlo et al. (2016), ABC-RF treatments were processed on reference tables including 10,000 simulated datasets per scenario. Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014) for microsatellite markers, describing genetic variation per population (e.g., number of alleles), per pair (e.g., genetic distance), or per triplet (e.g., admixture rate) of populations, averaged over the 25 loci (see supplementary table S9, Supplementary Material online, for details about such statistics), plus the linear discriminant analysis (LDA) axes as additional summary statistics. The total number of summary statistics ranged from 39 to 424 depending on the analysis (table 2). When the number of scenarios in an analysis exceeded ten, we processed ABC-RF treatments on reference tables including 100,000 simulated datasets, to avoid the computer memory issues associated with the sub-bootstrapping procedure used for reference tables with more than 100,000 simulated datasets (see the section "Practical recommendations regarding the implementation of the algorithms" in Pudlo et al. 2016). We checked that this number was sufficient by evaluating the stability of prior error rates (i.e., the probability of choosing a wrong model when drawing model index and parameter values from the priors) and of posterior probability estimates on 80,000, 90,000, and 100,000 simulated datasets. The number of trees in the constructed random forests was fixed to n = 500; see Pudlo et al. (2016) and Breiman (2001) for justifications of considering a forest of 500 trees, and supplementary fig. S7, Supplementary Material online, for an illustration. For each ABC-RF analysis, we predicted the best scenario and estimated its posterior probability and the prior error rate over 10 replicate runs on the same reference table. We used the abcrf R package (v1.1.0; Pudlo et al. 2016) to perform all ABC-RF analyses. In supplementary appendix S2, Supplementary Material online, we provide several scripts written in the R programming language (R Development Core Team 2008) for computing random forest analyses in R with the abcrf package, starting from simulated datasets generated with DIYABC v2.1.0.
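
The prior error rate defined above is simply a misclassification frequency, which can be made explicit with a short sketch. The code below is not the abcrf implementation (the analyses themselves were run in R); it only assumes that, for a set of datasets simulated from the priors, the true scenario index and the scenario predicted by the classifier are both available. The function names and the toy values are hypothetical.

```python
from statistics import mean


def prior_error_rate(true_scenarios, predicted_scenarios):
    """Fraction of prior-simulated datasets assigned to a wrong scenario."""
    if not true_scenarios or len(true_scenarios) != len(predicted_scenarios):
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(t != p for t, p in zip(true_scenarios, predicted_scenarios))
    return wrong / len(true_scenarios)


def mean_error_over_replicates(replicates):
    """Average the prior error rate over replicate runs.

    `replicates` is a list of (true, predicted) pairs of scenario lists,
    one pair per replicate run (an assumed data layout for this sketch).
    """
    return mean(prior_error_rate(t, p) for t, p in replicates)


# Toy example: two replicate runs on four pseudo-observed datasets each.
runs = [([1, 2, 3, 1], [1, 2, 1, 1]), ([1, 2, 3, 1], [1, 2, 3, 1])]
print(mean_error_over_replicates(runs))
```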

For the sake of methodological comparison, we also duplicated a subset of analyses using a more standard ABC method (Beaumont et al. 2002; Cornuet et al. 2008), improved using the regression algorithms from Estoup et al. (2012) based on linear discriminant analysis (LDA) to avoid the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step. This method is called ABC-LDA. Following previous analyses of this type (Lombaert et al. 2014), ABC-LDA treatments were processed on reference tables including 500,000 simulated datasets per scenario. As for ABC-RF, datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). Posterior probabilities associated with each scenario were estimated by polychotomous logistic regression (Cornuet et al. 2008), modified following Estoup et al. (2012), on the 1% of the simulated datasets closest to the observed dataset. We also estimated, for each analysis, a prior error rate over 500 simulated datasets drawn from the priors, using reference tables including 500,000 simulated datasets per scenario. To provide a fair comparison of classification error with respect to computational effort, prior error rates were also estimated using reference tables including the same number of simulated datasets per scenario as for ABC-RF treatments (10,000 per scenario).

Refining the Origins of the Primary Introduction Events from the Native Area
We refined our worldwide invasion scenario by assessing the most probable origin of the primary introductions from the Asian native range. To this aim, we carried out additional ABC-RF treatments to choose which geographical area in the native range was the most likely origin of each of the three primary introductions from Asia identified in the final invasion scenario, i.e., in Hawaii, western US (sample site US-Wat), and Europe (fig. 1). Capitalizing on the observed pattern of genetic differentiation among all Asian sample sites, we considered four possible geographical origins: O1 = Japan (represented by the sample sites JP-Sap + JP-Tok, which were pooled as they did not show any significant differentiation), O2 = South-East China (represented by the sample site CN-Nin), O3 = North-East China (represented by the sample sites CN-Lan + CN-Lia, which were pooled as they did not show any significant differentiation), and O4 = South-West China (represented by the sample site CN-Shi). In each of these three model choice analyses (i.e., one analysis per primary introduction), we considered the previously inferred introduction histories and compared four scenarios corresponding to variations of those histories that differed only by the geographical origin of the primary Asian introduction (models O1, O2, O3, and O4). We used the ABC-RF method with 10,000 datasets per model simulated using prior set 1 or 2, and with three replications of the RF process. For inferences regarding Europe, we carried out ABC-RF treatments on various sets of sample sites from Europe and the eastern US to further evaluate the robustness of our results.

Parameter Estimation
For ABC parameter estimation, we considered a set of 12 key sample sites representative of the inferred final invasion history (fig. 1). This set of sample sites included the three native origins of primary introductions from the native range (O1, O2, and O3), the Hawaiian sample (US-Haw), the three western US samples (US-Wat, US-Sok, and US-SD), the eastern US sample US-NC, two European samples (IT-Tre and GE-Dos), the Brazilian sample (BR-PA), and the sample from La Réunion (FR-Reu). To evaluate the robustness of our parameter estimations with respect to sample sites, we carried out a second analysis replacing the sample sites US-NC, IT-Tre, and GE-Dos by US-Wis, SP-Bar, and FR-Par, respectively. We also replicated parameter estimations using prior set 1 and prior set 2.

ABC-RF was developed to tackle the inferential issue of model choice. So far, no RF-based solution exists to estimate parameter distributions under a given model (Pudlo et al. 2016). Standard ABC algorithms for parameter estimation may suffer from the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step, and hence yield poor results when the number of statistics is too large [reviewed in Blum et al. (2013)]. To avoid such potential problems, as well as for the sake of simplicity and computational efficiency (statistical techniques for choosing summary statistics have not been implemented in DIYABC; Blum et al. 2013), we used a standard ABC methodological framework (Beaumont et al. 2002; Cornuet et al. 2008) applied to a subset of "expert-chosen" statistics proposed in DIYABC (Brouat et al. 2014; Lombaert et al. 2014) to estimate posterior parameter distributions under the final invasion scenario. We hence considered 95 summary statistics in total: the mean number of alleles and the mean genetic diversity (per locus and population sample), all pairwise FST's, and some crude estimates of admixture rates based on the five population triplets for which admixture events have been inferred (i.e., A1–A5 events described in fig. 1, corresponding to five AML statistics; Choisy et al. 2004; see the DIYABC 2.1.0 user manual, p. 16, for details about such statistics). Using parameter values drawn from prior set 1 or 2, we produced a reference table containing 10 million simulated datasets. Following Beaumont et al. (2002), we then used a local linear regression to estimate the parameter posterior distributions. We took the 10,000 (0.1%) simulated datasets closest to our observed dataset for the regression, after applying a logit transformation to parameter values. We found that considering instead the 1% closest simulated datasets and/or a larger set of summary statistics had only weak effects on the estimated parameter distributions, at least for the subset of parameters we were interested in (i.e., admixture rates and bottleneck severity) (results not shown).
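
The two preparatory steps that precede the local linear regression, namely retaining the simulations closest to the observed dataset and logit-transforming bounded parameter values, can be sketched as follows. This is a simplified illustration under stated assumptions: distances are plain Euclidean distances on raw statistics (DIYABC normalizes the statistics first), the regression itself is omitted, and the function names are hypothetical.

```python
import math


def logit_transform(value, lower, upper):
    """Map a parameter bounded in (lower, upper) onto the real line."""
    p = (value - lower) / (upper - lower)
    return math.log(p / (1.0 - p))


def closest_simulations(observed, simulated, fraction):
    """Indices of the simulated datasets closest to the observed statistics.

    Assumption for brevity: unnormalized Euclidean distance between
    vectors of summary statistics.
    """
    n_keep = max(1, int(round(fraction * len(simulated))))

    def distance(stats):
        return math.sqrt(sum((s - o) ** 2 for s, o in zip(stats, observed)))

    ranked = sorted(range(len(simulated)), key=lambda i: distance(simulated[i]))
    return ranked[:n_keep]


# Toy reference table of four simulations summarized by two statistics.
table = [[1.0, 1.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.2]]
print(closest_simulations([0.0, 0.0], table, fraction=0.5))
print(logit_transform(0.5, 0.0, 1.0))
```

With a reference table of 10 million simulations, a fraction of 0.001 retains the 10,000 closest datasets, as in the analysis described above.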

We focused our investigations on the parameters related to two types of demographic events that are considered to be potentially important drivers of invasion success: the severity of the bottleneck associated with the foundation of new populations, and genetic admixture (Estoup et al. 2016). Bottleneck severity was defined as the ratio between the duration of the bottleneck and the number of effective individuals during this period (Pascual et al. 2007). Following the final invasion scenario detailed in figure 1, we defined three categories of introduction situations: bottleneck severity for populations founded only by individuals originating from a different continent (e.g., sample sites US-Haw, US-Wat, IT-Tre, BR-PA, and FR-Reu), bottleneck severity for populations founded only by individuals originating from the same continent (e.g., sample sites US-SD and US-NC), and bottleneck severity for populations founded by a combination of the two types of sources (e.g., sample sites US-Sok and GE-Dos). Genetic admixture is evident when a population is inferred to have more than one distinct source (A1–A5 events in fig. 1).
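
As a small worked example of the definition above, bottleneck severity is the bottleneck duration (in generations) divided by the effective population size during the bottleneck. The function name and the numerical values below are hypothetical and serve only to illustrate the ratio.

```python
def bottleneck_severity(duration_generations, effective_size):
    """Ratio of bottleneck duration to effective size during the bottleneck."""
    if effective_size <= 0:
        raise ValueError("effective size must be positive")
    return duration_generations / effective_size


# Illustrative values only: 10 generations at an effective size of 50.
print(bottleneck_severity(10, 50))
```

Under this definition, a longer bottleneck or a smaller number of effective founders both give a higher severity.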

Model-Posterior Checking
We used the ABC model-posterior checking method implemented in DIYABC (Cornuet et al. 2010) to evaluate how well the final worldwide invasion scenario, with its associated parameter posterior distributions, matches the observed dataset. This method was largely inspired by the statistical frameworks detailed in Gelman et al. (2003) and Cook et al. (2006). The principle is as follows: if a scenario-posterior combination fits the observed data correctly, then data simulated under this scenario with parameters drawn from the associated posterior distributions should be close to the observed data. The lack of fit of the model to the data can be measured by determining the frequency at which test quantities measured on the observed dataset are extreme with respect to the distributions of the same test quantities computed from the simulated datasets (i.e., the posterior predictive distributions). In practice, the test quantities are chosen among the large set of ABC summary statistics proposed in the program DIYABC. For each test quantity (q), a lack of fit of the observed data with respect to the posterior predictive distribution can be measured by the cumulative distribution function value, defined as Prob(q_simulated < q_observed). A tail-area probability can be easily computed for each test quantity as Prob(q_simulated < q_observed) and 1.0 − Prob(q_simulated < q_observed) for Prob(q_simulated < q_observed) ≤ 0.5 and > 0.5, respectively. Such tail-area probabilities, also named posterior predictive P-values (i.e., ppp-values; Meng 1994), represent the probabilities that the replicated data (simulated ABC summary statistics) could be more extreme than the observed data (observed ABC summary statistics). In practice, ppp-values can be interpreted as a guideline for tracking model-posterior misfit: a few ppp-values around 0.05 are not necessarily a problem, whereas the presence of ppp-values around 0.001 is rather suspect. Hence, too many observed summary statistics falling in the tails of distributions, especially if some of those ppp-values are small (<0.001), cast serious doubts on the adequacy of the model-posterior combination to the observed dataset. Finally, because ppp-values are computed for a number of (often non-independent) test statistics, a method such as that of Benjamini and Hochberg (1995) can be used to control the false discovery rate (Cornuet et al. 2010).
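
The tail-area computation and the false discovery rate control described above can be sketched as follows. This is an illustration rather than the DIYABC implementation: `ppp_value` estimates Prob(q_simulated < q_observed) by the empirical frequency among simulated values, and `benjamini_hochberg` applies the standard step-up procedure of Benjamini and Hochberg (1995). Function names and toy values are assumptions made for the example.

```python
def ppp_value(simulated, observed):
    """Tail-area probability of an observed test quantity.

    The cumulative probability is estimated as the fraction of simulated
    values below the observed one, then folded onto the nearest tail.
    """
    cdf = sum(1 for q in simulated if q < observed) / len(simulated)
    return cdf if cdf <= 0.5 else 1.0 - cdf


def benjamini_hochberg(p_values, fdr=0.05):
    """Step-up procedure: True where a test is declared significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that the k-th smallest p-value is <= k * fdr / m.
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * fdr / m:
            max_rank = rank
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= max_rank:
            significant[index] = True
    return significant


# Toy posterior predictive sample for one test quantity.
print(ppp_value([1.0, 2.0, 3.0, 4.0], 1.5))
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

Here a test quantity flagged as significant is one for which the observed value remains in the tail of its posterior predictive distribution after correction.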

We carried out the ABC model-posterior checking analysis on our observed microsatellite dataset as follows. From 10^7 datasets simulated under the final invasion scenario detailed in figure 1, and using the same subset of 95 statistics previously retained for parameter estimation, we obtained a posterior sample of 10^4 values from the posterior distributions of parameters, as described in the previous section (Parameter Estimation). We then simulated 10^4 datasets with parameter values drawn with replacement from this posterior sample. As underlined in many statistics textbooks (e.g., Gelman et al. 2003), it is advised against performing model checking using information that has already been used for training (i.e., model fitting; see also Cornuet et al. 2010 for illustrations on simulated datasets). Optimally, model-posterior checking should be based on test quantities that do not correspond to the summary statistics used for obtaining the parameter posterior distributions. This is naturally possible with DIYABC, as the package proposes a large choice of summary statistics. Our set of test statistics therefore included the 1,141 remaining single, pairwise, and three-population summary statistics available in DIYABC that were not used for previous ABC parameter estimation.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments
AF was supported by state funding by the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program "Investissements d'avenir" (ANR-11-IDEX-0004-02). AE and RAH were supported by the Languedoc Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE 2016), and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP "International Mobility" (CfP 2015-02). GL acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds, project 2012-13-195. XC, MK, IM, and JZ were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). MD acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). MP was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.

References
Adrion JR, Kousathanas A, Pascual M, Burrack HJ, Haddad NM, Bergland AO, Machado H, Sackton TB, Schlenke TA, Watada M, et al. 2014. Drosophila suzukii: the genetic footprint of a recent, worldwide invasion. Mol Biol Evol. 31:3148–3163.

Asplen MK, Anfora G, Biondi A, Choi DS, Chu D, Daane KM, Gibert P, Gutierrez AP, Hoelmer KA, Hutchison WD, et al. 2015. Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci. 88:469–494.

Atallah J, Teixeira L, Salazar R, Zaragoza G, Kopp A. 2014. The making of a pest: the evolution of a fruit-penetrating ovipositor in Drosophila suzukii and related species. Proc R Soc B. 281:20132840–20132840.

Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025–2035.

Beaumont MA. 2010. Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst. 41:379–406.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 57:289–300.

Bertorelle G, Benazzo A, Mona S. 2010. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol Ecol. 19:2609–2625.

Blum M, Nunes M, Prangle D, Sisson S. 2013. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci. 28:189–208.

Bock DG, Caseys C, Cousens RD, Hahn MA, Heredia SM, Hubner S, Turner KG, Whitney KD, Rieseberg LH. 2015. What we still don't know about invasion genetics. Mol Ecol. 24:2277–2297.

Boissin E, Hurley B, Wingfield MJ, Vasaitis R, Stenlid J, Davis C, de Groot P, Ahumada R, Carnegie A, Goldarazena A, et al. 2012. Retracing the routes of introduction of invasive species: the case of the Sirex noctilio woodwasp. Mol Ecol. 21:5728–5744.

Breiman L. 2001. Random forests. Mach Learn. 45:5–32.

Benazzo A, Ghirotto S, Torres Vilaca ST, Hoban S. 2015. Using ABC and microsatellite data to detect multiple introductions of invasive species from a single source. Heredity 115:262–272.

Brouat C, Tollenaere C, Estoup A, Loiseau A, Sommer S, Soanandrasana R, Rahalison L, Rajerison M, Piry S, Goodman SM, Duplantier JM. 2014. Invasion genetics of a human commensal rodent: the black rat Rattus rattus in Madagascar. Mol Ecol. 23(16):4153–4167.

Caprio M, Tabashnik BE. 1992. Gene flow accelerates local adaptation among finite populations: simulating the evolution of insecticide resistance. J Econ Entomol. 85:611–620.

Calabria G, Maca J, Bächli G, Serra L, Pascual M. 2012. First records of the potential pest species Drosophila suzukii (Diptera: Drosophilidae) in Europe. J Appl Entomol. 136:139–147.

Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang G, Zalom FG. 2013. Genome of Drosophila suzukii, the spotted wing drosophila. G3 3:2257–2271.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Mol Ecol. 13:955–968.

Cook S, Gelman A, Rubin DB. 2006. Validation of software for Bayesian models using posterior quantiles. J Comput Graph Stat. 15:675–692.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24:2713–2719.

Cornuet JM, Ravigne V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11:401.

Cornuet JM, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin JM, Estoup A. 2014. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics 30:1187–1189.

Cristescu ME. 2015. Genetic reconstructions of invasion history. Mol Ecol. 24:2212–2225.

Depra M, Poppe JL, Schmitz HJ, De Toni DC, Valente VLS. 2014. The first records of the invasive pest Drosophila suzukii in the South American continent. J Pest Sci. 87:379–383.

Dlugosch KM, Parker IM. 2008. Founding events in species invasions: genetic variation, adaptive evolution, and the role of multiple introductions. Mol Ecol. 17:431–449.

Dlugosch KM, Anderson SR, Braasch J, Alice Cang F, Gillette H. 2015. The devil is in the details: genetic variation in introduced populations and its contributions to invasion. Mol Ecol. 24:2095–2111.

Edmonds CA, Lillie AS, Cavalli-Sforza LL. 2004. Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA. 101:975–979.

Ellstrand NC, Schierenbeck KA. 2000. Hybridization as a stimulus for the evolution of invasiveness in plants? Proc Nat Acad Sci USA. 97:7043–7050.

Estoup A, Guillemaud T. 2010. Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol. 19:4113–4130.

Estoup A, Lombaert E, Marin JM, Guillemaud T, Pudlo P, Robert CP, Cornuet JM. 2012. Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics. Mol Ecol Res. 12(5):846–855.

Estoup A, Ravigne V, Hufbauer RA, Vitalis R, Gautier M, Facon B. 2016. Is there a genetic paradox of biological invasion? Annu Rev Ecol Evol Syst. 47:51–72.


Facon B, Genton B, Shykoff J, Jarne P, Estoup A, David P. 2006. A general eco-evolutionary framework for understanding bioinvasions. Trends Ecol Evol. 21:130–135.

Facon B, Pointier JP, Jarne P, Sarda V, David P. 2008. High genetic variance in life-history strategies within invasive populations by way of multiple introductions. Curr Biol. 18:363–367.

Fraimout A, Loiseau A, Price DK, Xuereb A, Martin JF, Vitalis R, Fellous S, Debat V, Estoup A. 2015. New set of microsatellite markers for the spotted-wing Drosophila suzukii (Diptera: Drosophilidae): a promising molecular tool for inferring the invasion history of this major insect pest. Eur J Entomol. 112(4):855.

Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201:1555–1579.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis. London: Chapman & Hall/CRC.

Gilchrist GW, Huey RB, Serra L. 2001. Rapid evolution of wing size clines in Drosophila subobscura. Genetica 112:273–286.

Gilchrist GW, Huey RB, Balanya J, Pascual M, Serra L, Noor M. 2004. A time series of evolution in action: a latitudinal cline in wing size in South American Drosophila subobscura. Evolution 58:768–780.

Groen SC, Whiteman NK. 2016. Using Drosophila to study the evolution of herbivory and diet specialization. Curr Opin Insect Sci. 14:66–72.

Hauser M. 2011. A historic account of the invasion of Drosophila suzukii (Matsumura) (Diptera: Drosophilidae) in the continental United States, with remarks on their identification. Pest Manag Sci. 67:1352–1357.

Hufbauer RA, Facon B, Ravigne V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A. 2012. Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl. 5:89–101.

Kaneshiro KY. 1983. Drosophila (Sophophora) suzukii (Matsumura). Proc Hawaiian Entomol Soc. 24:179.

Keller SR, Taylor DR. 2010. Genomic admixture increases fitness during a biological invasion: admixture increases fitness during invasion. J Evol Biol. 23:1720–1731.

Kolbe JJ, Glor RE, Schettino LR, Lara AC, Larson A, Losos JB. 2004. Genetic variation increases during biological invasion by a Cuban lizard. Nature 431:177–181.

Lavergne S, Molofsky J. 2007. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci USA. 104:3883–3888.

Lee C. 2002. Evolutionary genetics of invasive species. Trends Ecol Evol. 17:386–391.

Lee JC, Bruck DJ, Curry H, Edwards D, Haviland DR, Van Steenwyk RA, Yorgey BM. 2011. The susceptibility of small fruits and cherries to the spotted-wing drosophila, Drosophila suzukii. Pest Manag Sci. 67:1358–1367.

Lenormand T. 2002. Gene flow and the limits of natural selection. Trends Ecol Evol. 17:183–189.

Lin QC, Zhai YF, Zhang AS, Men XY, Zhang XY, Zalom FG, Zhou CG, Yu Y. 2014. Comparative Developmental Times and Laboratory Life tables for Drosophlia suzukii and Drosophila melanogaster (Diptera: Drosophilidae). Fla Entomol. 97:1434–1442.

Lombaert E, Guillemaud T, Lundgren J, Koch R, Facon B, Grez A, Loomans A, Malausa T, Nedved O, Rhule E, et al. 2014. Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol. 23:5979–5997.

Matsumura S. 1931. 6000 illustrated insects of Japan-empire (in Japanese). Toko shoin, Tokyo. 1497 pp.

Meng XL. 1994. Posterior predictive p-values. Ann Stat. 22:1142–1160.

Nei M, Maruyama T, Chakraborty R. 1975. The bottleneck effect and genetic variability in populations. Evolution 29:1–10.

Ometto L, Cestaro A, Ramasamy S, Grassi A, Revadi S, Siozios S, Moretto M, Fontana P, Varotto C, Pisani D, et al. 2013. Linking genomics and ecology to investigate the complex evolution of an invasive drosophila pest. Genome Biol Evol. 5:745–757.

Pascual M, Chapuis MP, Mestres F, Balanya J, Huey RB, Gilchrist GW, Serra L, Estoup A. 2007. Introduction history of Drosophila subobscura in the New World: a microsatellite based survey using ABC methods. Mol Ecol. 16:3069–3083.

Peischl S, Excoffier L. 2015. Expansion load: recessive mutations and the role of standing genetic variation. Mol Ecol. 24:2084–2094.

Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. 2016. Reliable ABC model choice via random forests. Bioinformatics 32:859–866.

R Development Core Team. 2008. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org

Robert CP, Cornuet JM, Marin JM, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA. 108:15112–15117.

Rius M, Darling JA. 2014. How important is intraspecific genetic admixture to the success of colonising populations? Trends Ecol Evol. 29:233–242.

Sakai AK, Allendorf FW, Holt JS, Lodge DM, Molofsky J, With KA, Baughman S, Cabin RJ, Cohen JE, Ellstrand NC, McCauley DE. 2001. The population biology of invasive species. Annu Rev Ecol Syst. 305–332.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. In: Mutation and Evolution. Springer. pp. 359–367.

Simberloff D. 2013. Biological invasions: much progress plus several controversies. Contrib Sci. 9:7–16.

Turgeon J, Tayeh A, Facon B, Lombaert E, De Clercq, Berkvens N, Lungren JG, Estoup A. 2011. Experimental evidence for the phenotypic impact of admixture between wild and biocontrol Asian ladybird (Harmonia axyridis) involved in the European invasion. J Evol Biol. 24:1044–1052.

Uller T, Leimu R. 2011. Founder events predict changes in genetic diversity during human-mediated range expansions. Glob Change Biol. 17:3478–3485.

Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Thery S, Froment A, Le Bomin S, Gessain A, Hombert JM, et al. 2009. Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol. 19:312–318.

Wray G. 2013. Genomics and the evolution of phenotypic traits. Annu Rev Ecol Evol Syst. 44:51–72.

Page 8: Deciphering the Routes of invasion of Drosophila suzukii

small reference tables (paired t-test, t10 = 2.7, P = 0.02). However, ABC-RF performed better than ABC-LDA when using small reference tables (only 10,000 simulations per scenario), having lower prior error rates (paired t-test, t10 = 11.6, P < 0.0001; table 2 and supplementary table S3, Supplementary Material online). The difference in prior error rate was particularly marked for the complex analyses, i.e., those in which many introduction scenarios were compared and many summary statistics were computed. For instance, for analyses 5a and 5b (21 compared scenarios and 424 summary statistics computed), prior error rates were 0.217 and 0.221 for ABC-RF versus 0.417 and 0.425 for ABC-LDA.

Regarding computational effort, we found that, for a given observed dataset, an ABC-RF treatment to select the best scenario and compute its posterior probability required a computational duration ca. 50 times shorter than an ABC-LDA treatment with a traditional reference table of large size (i.e., 500,000 simulated datasets per scenario). Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time for 10,000 pseudo-observed datasets, whereas with ABC-LDA it took several hours to several days (depending on the analysis processed) for 500 pseudo-observed datasets.

The same invasion scenario had the highest probability forall 11 analyses carried out using ABC-RF and ABC-LDA withlarge number of simulated datasets (table 2 and supplementary table S3 Supplementary Material online) Posterior prob-abilities of the best scenario estimated using ABC-RF were notsystematically higher (or lower) than those using ABC-LDAwith large number of simulated datasets In particular there

FIG 1 Worldwide invasion scenario of D suzukii inferred from microsatellite data and date of first observation Map and schematic showingsample sites and the invasion routes taken by D suzukii as reconstructed by ABC-RF (Pudlo et al 2016) on a total of 685 individuals from 23geographic locations genotyped at 25 microsatellite loci (see results and methods for details) The native range is in dark grey and the invasiverange is in light-gray (cf delimitation from fig 1 in Asplen et al 2015) The year in which D suzukii was first observed at each sample site is indicatedin italics The 23 geographical locations that were sampled are represented by circles (native range) and squares diamonds and triangles(introduced range) Squares indicate populations that experienced weak bottlenecks (ie median value of bottleneck severity lt 012 seemain text and table 3) diamonds indicate moderate bottlenecks (012 lt bottleneck severity lt 022) and triangles indicate strong bottlenecks(ie bottleneck severitygt 03) The colors of the symbol for the sample sites and arrows between them correspond to the different genetic groupsobtained using the clustering method BAPS (supplementary appendix S1 Supplementary Material online) The arrows indicate the most probableinvasion pathways A1ndashA5 indicate five separate admixture events between different sources O1ndashO3 indicate the most probable sources withinthe native range for the primary introduction events A1frac14 Hawaiithorn southeast China A2frac14Watsonville (western US)thorn Hawaii A3frac14 southernEurope thorn eastern US A4 frac14 western US thorn eastern US A5 frac14 southern Europe thorn northern Europe O1 frac14 Japan O2 frac14 southeast China O3 frac14northeast China


was no clear trend of overestimation of posterior probability when using ABC-LDA with a large number of simulated datasets versus ABC-RF, a potential bias suggested by Pudlo et al. (2016). On the other hand, we found evidence of instability in the estimation of posterior probabilities, and hence in model selection, when using ABC-LDA with a small number of simulated datasets in the reference table (i.e., 10,000 simulations per scenario, as for ABC-RF). In this case, the scenario with the highest probability was different from that selected using either ABC-LDA with the large reference table or ABC-RF in four and two of the 11 analyses when using the prior sets 1 and 2, respectively (results not shown).

Refining the Origins of the Primary Introduction Events from the Native Area

Additional ABC-RF analyses were run to determine which geographical area in the native range was the most likely origin of each of the three primary introductions (i.e., in Hawaii, western US, and Europe). The most probable origin of the Hawaiian invasive population was Japan, with a mean posterior probability of 0.959 (origin O1 in fig. 1). For Watsonville (western US, sample site US-Wat), the most probable source of the primary introduction from Asia was southeast China (origin O2 in fig. 1; P = 0.860). Finally, for Europe, the most probable source of the primary introduction from Asia was northeast China (origin O3 in fig. 1; P = 0.855). Similar posterior probabilities were obtained using the prior set 1 (values presented above) and the prior set 2 (results not shown). For inferences regarding Europe, similar posterior probabilities were also obtained using various sets of sample sites from Europe and the eastern US (results not shown).

Estimation of Parameter Distributions for Admixture Rate and Bottleneck Severity

We used the most probable worldwide invasion scenario (fig. 1) to estimate the posterior distributions of parameters related to the bottleneck severity associated with the foundation of new populations and to the genetic admixture rates between differentiated sources (i.e., A1–A5 events in fig. 1).

FIG. 2. Admixed or non-admixed origin of D. suzukii in Europe inferred for each sample site. Potential source populations include (among others) the Asian native range (represented by the Japanese sample site JP-Tok) and the invasive genetic group from the eastern US, represented by one of the four invasive sample sites collected in this area (fig. 1). Four replicate independent ABC-RF treatments corresponding to analysis 3b (table 2) were hence carried out for each targeted European sample site, each using one of the four eastern US sample sites. The treatments labeled 1, 2, 3, and 4 in the pies of the figure were carried out with the sample sites US-NC, US-Wis, US-Gen, and US-Col, respectively. A pie quarter in blue indicates that the best scenario corresponds to a single introduction event from Asia. A pie quarter in red indicates that the best scenario corresponds to an admixture event between Asia and the eastern US. Dates in italics correspond to the dates of first record at the European sample sites.


The posterior distributions of parameters relating to bottlenecks and admixture differed substantially from the priors, indicating that the genetic data were informative for such parameters (tables 3 and 4; see supplementary tables S6 and S7, Supplementary Material online, for results using an alternative set of representative sample sites). Regarding bottleneck severity, we found that bottlenecks tended to be more severe for populations founded by individuals originating from a source located on a different continent (extra-continental origin) than for populations founded by individuals originating from a source located on the same continent (intra-continental origin; table 3 and fig. 1). The bottleneck severity for populations founded by individuals corresponding to a combination of the two types of sources (extra- and intra-continental origins) was either weak or moderate. The strongest bottleneck severity was found for the first invasive population in Hawaii, which showed the lowest level of genetic variation among all invasive populations. Bottleneck severity values were overall slightly stronger in European than in US populations, suggesting a smaller number of founding individuals and/or a slower demographic recovery in European populations.

Regarding admixture (table 4), we found that (i) for the first recorded invasive population in western US (i.e., Watsonville, sample site US-Wat), the genetic contribution from China was substantially larger than that from Hawaii (median value of 0.759); (ii) for the western US population US-Sok, the secondary genetic contribution from Hawaii was small compared to that from Watsonville (median value of 0.237); and (iii) the sample sites from northern Europe (e.g., GE-Dos in Germany) contained a rather low proportion of genes originating from the eastern US (median value of

Table 3. Bottleneck Severity in Invasive Populations of D. suzukii.

                                          Prior 1                              Prior 2
Introduction type           Sample site   Mean   Median Mode   q5     q95     Mean   Median Mode   q5     q95
Extra-continental           US-Haw        0.517  0.500  0.495  0.326  0.755   0.490  0.484  0.477  0.382  0.615
                            US-Wat        0.181  0.138  0.114  0.041  0.448   0.143  0.127  0.118  0.059  0.275
                            IT-Tre        0.268  0.179  0.171  0.059  0.837   0.236  0.189  0.172  0.101  0.553
                            BR-PA         0.158  0.126  0.104  0.020  0.407   0.208  0.193  0.172  0.067  0.399
                            FR-Reu        0.259  0.215  0.186  0.051  0.626   0.245  0.214  0.190  0.069  0.534
Intra-continental           US-SD         0.135  0.117  0.106  0.039  0.283   0.117  0.104  0.091  0.053  0.224
                            US-NC         0.089  0.067  0.061  0.015  0.230   0.111  0.098  0.090  0.046  0.218
Extra + intra-continental   US-Sok        0.116  0.099  0.088  0.027  0.250   0.119  0.106  0.099  0.052  0.229
                            GE-Dos        0.189  0.177  0.155  0.061  0.356   0.205  0.189  0.171  0.088  0.372
Prior values                              0.628  0.199  NA     0.021  1.904   0.517  0.500  NA     0.326  0.754

NOTE. Extra-continental introductions correspond to a long-distance introduction from a source located on a different continent from the focal population; intra-continental introductions correspond to an introduction event from a source located on the same continent as the focal population; and extra + intra-continental introductions correspond to a combination of the two types of sources. Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5 and q95), are indicated for each bottleneck severity parameter. We roughly classified the estimated bottleneck severity values into three classes (represented here by the three shades of gray): weak (i.e., median value of bottleneck severity < 0.12; in light gray), moderate (0.12 < bottleneck severity < 0.22; in gray), and strong (i.e., bottleneck severity > 0.3; in dark gray). The set of sample sites used for the ABC estimations presented here includes (i) for the native area, Japan (JP-Tok + JP-Sap), South-East China (CN-Nin), and North-East China (CN-Lan + CN-Lia), and (ii) for the invaded range, US-Wat, US-Sok, and US-SD for western US, US-NC for eastern US, IT-Tre for southern Europe, GE-Dos for northern Europe, BR-PA for South America (Brazil), and FR-Reu for La Reunion island. Code names of the sample sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online, in which bottleneck severity classes are also given for each sample site. See supplementary table S6, Supplementary Material online, for results on a different set of representative sample sites.
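
The three-class rule described in the note to table 3 can be sketched in a few lines of Python. This is only an illustration: the dictionary holds the prior set 1 medians from table 3, the function name `severity_class` is hypothetical, and the handling of boundary values and of the 0.22 to 0.3 interval (which the note does not assign to any class) is an assumption.

```python
# Median bottleneck severity values (prior set 1), copied from table 3.
MEDIAN_SEVERITY = {
    "US-Haw": 0.500, "US-Wat": 0.138, "IT-Tre": 0.179,
    "BR-PA": 0.126, "FR-Reu": 0.215, "US-SD": 0.117,
    "US-NC": 0.067, "US-Sok": 0.099, "GE-Dos": 0.177,
}


def severity_class(median):
    """Map a median bottleneck severity onto the classes of the table note.

    Assumption: a value exactly equal to 0.12 is counted as moderate, and
    values in the 0.22-0.3 gap left open by the note are flagged rather
    than forced into a class.
    """
    if median < 0.12:
        return "weak"
    if median < 0.22:
        return "moderate"
    if median > 0.3:
        return "strong"
    return "unclassified"


classes = {site: severity_class(m) for site, m in MEDIAN_SEVERITY.items()}
for site, label in classes.items():
    print(f"{site}: {label}")
```

With these medians, only the Hawaiian site falls in the strong class, in line with the statement in the main text that the strongest bottleneck was found for the first invasive population in Hawaii.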

Table 4. Posterior Distributions of Admixture Rates for the Five Admixture Events Inferred for the Final Worldwide Invasion Scenario Described in figure 1.

Prior set  Admixture event                  Admixture rate (gene fraction from pop. x)  Mean   Median Mode   q5     q95
Prior 1    A1 (US-Wat = US-Haw + CN-Nin)    rUS-Wat (China – CN-Nin)                    0.759  0.761  0.774  0.659  0.854
           A2 (US-Sok = US-Haw + US-Wat)    rUS-Sok (USA – US-Wat)                      0.237  0.230  0.227  0.092  0.400
           A3 (DE-Dos = IT-Tre + US-NC)     rDE-Dos (USA – US-NC)                       0.286  0.278  0.266  0.112  0.481
           A4 (BR-PA = US-NC + US-SD)       rBR-PA (USA – US-SD)                        0.467  0.461  0.442  0.187  0.754
           A5 (FR-Reu = IT-Tre + GE-Dos)    rFR-Reu (Europe – GE-Dos)                   0.388  0.380  0.426  0.132  0.670
Prior 2    A1 (US-Wat = US-Haw + CN-Nin)    rUS-Wat (China – CN-Nin)                    0.761  0.762  0.766  0.684  0.833
           A2 (US-Sok = US-Haw + US-Wat)    rUS-Sok (USA – US-Wat)                      0.224  0.219  0.199  0.114  0.350
           A3 (DE-Dos = IT-Tre + US-NC)     rDE-Dos (USA – US-NC)                       0.312  0.306  0.284  0.152  0.487
           A4 (BR-PA = US-NC + US-SD)       rBR-PA (USA – US-SD)                        0.422  0.418  0.429  0.247  0.613
           A5 (FR-Reu = IT-Tre + GE-Dos)    rFR-Reu (Europe – GE-Dos)                   0.370  0.368  0.356  0.178  0.569

NOTE. Admixture events are denoted as in figure 1. Each admixture rate parameter r points to the name of the admixed site and corresponds to the fraction of genes originating from the source site in parentheses (1 − r genes originate from the other source site). Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5 and q95), are indicated for each admixture parameter. Estimations assuming the prior set 1 and the prior set 2 (supplementary table S8, Supplementary Material online) are provided. The set of representative sample sites used for the ABC estimations presented here is the same as in table 3. Code names of the population sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online. See supplementary table S7, Supplementary Material online, for results on a different set of representative sample sites.
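
To make the meaning of r concrete, the following minimal sketch computes the allele frequency expected in an admixed population immediately after the admixture event, as the r-weighted mix of the two source frequencies. The function name and the two source allele frequencies are hypothetical and chosen only for illustration; only the value r = 0.761 (median for event A1, prior set 1) comes from table 4.

```python
def admixed_frequency(r, p_source, p_other):
    """Expected allele frequency just after an admixture event.

    r        : fraction of genes coming from the source site named in
               parentheses in table 4
    p_source : allele frequency in that source (hypothetical input)
    p_other  : allele frequency in the other source, which contributes
               the remaining 1 - r fraction of genes (hypothetical input)
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1")
    return r * p_source + (1.0 - r) * p_other


# Event A1 with the prior set 1 median r = 0.761 (genes from CN-Nin);
# the two allele frequencies are made-up values for illustration only.
p_admixed = admixed_frequency(0.761, p_source=0.40, p_other=0.10)
print(round(p_admixed, 4))
```

The sketch ignores drift during the bottleneck that follows the admixture event, so it describes only the starting point of the admixed population.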


0.286). We found more balanced admixture rates for the populations from Brazil (BR-PA) and La Reunion (FR-Reu), but information on this parameter was less accurate in these cases, as indicated by large 90% credibility intervals.

Model-Posterior Checking

Like any model-based method, ABC inference does not reveal the "true" evolutionary history; rather, it allows choosing the best among a necessarily limited set of compared scenarios and estimating the posterior distributions of parameters under this scenario. How well the inferred scenario-posterior combination matches the observed dataset remains to be evaluated using an ABC model-posterior checking analysis. When applying such an analysis to the final invasion scenario detailed in figure 1, we found that, when considering the prior set 1, only 54 of the 1,141 summary statistics used as test quantities had low posterior predictive P-values (i.e., 0.002 < ppp-values < 5%). Moreover, none of those ppp-values remained significant when correcting for multiple comparisons (Benjamini and Hochberg 1995). Similar results were obtained when assuming the prior set 2 and when considering different sets of representative sample sites in our analyses (results not shown). These findings show that the final worldwide invasion scenario (fig. 1, with associated parameter posterior distributions) matches the observed dataset well. In agreement with this, the projections of the simulated datasets on the principal component axes from the final model-posterior combination were well grouped and centered on the target point corresponding to the observed dataset (supplementary fig. S4, Supplementary Material online). Due to the modest size of the dataset (i.e., 25 microsatellite loci), however, only situations of major inadequacy of the model-posterior combination to the observed dataset are likely to be identified.
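
The multiple-comparison correction cited above (Benjamini and Hochberg 1995) can be sketched as follows. This is a generic step-up implementation of the false discovery rate procedure, not the code used in the study; the function name, the significance level of 0.05, and the small list of P-values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per P-value: True if rejected at FDR level alpha.

    Step-up rule: sort the m P-values, find the largest rank k such that
    p_(k) <= (k / m) * alpha, and reject the k smallest P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical P-values: three are below 0.05 before correction,
# but only the smallest one survives the correction.
example = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(example))
```

Because the threshold for the smallest P-value is alpha divided by the number of tests, a test quantity among many hundreds needs a very small ppp-value to remain significant after correction.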

Discussion

In the present paper, we decipher the routes taken by D. suzukii in its worldwide invasion, evaluate evidence of bottlenecks and genetic admixture associated with different introductions, and quantify the efficiency of ABC-RF relative to the more standard ABC-LDA method for choosing among introduction scenarios.

A Complex Worldwide Invasion History

Prior to this study and that of Adrion et al. (2014), our understanding of the worldwide introduction pathways of D. suzukii was based on historical and observational data, which were incomplete and potentially misleading. Our results indicate three distinct introductions from the native range: to Hawaii, the western side of North America, and western Europe, which accords well with the findings of Adrion et al. (2014) from a different set of markers (X-linked sequence data). Both the western North American introductions and at least some European populations show signs of admixture. Hawaii and China both appear to have contributed to the western North American introductions, which were in turn the most probable source for the introduction into eastern North America. Similarly, China and, to a lesser extent, eastern North America both appear to have contributed to the European introduction, although the eastern North American contribution is found mostly in northern European areas. The almost simultaneous invasion of North America and Europe from Asia could be due to increased trade between these areas, facilitating transport of this species between regions. Alternatively, or in addition, adaptation to human-altered habitats (in this case, changes in agricultural practices) within the native range of the species could have promoted invasion of similarly altered habitats worldwide (i.e., anthropogenically induced adaptation to invade; Hufbauer et al. 2012). However, the two invaded continents were most likely colonized by flies from distinct Chinese geographic areas (fig. 1), requiring that adaptation to agriculture occurred concomitantly at several locations within the native range. This is possible but not evolutionarily parsimonious, and further data would be required to test this explanation. Subsequent introductions from the three primary founder locations to other locations resulted in a complex history of invasion. Both primary and secondary introductions show signs of demographic bottlenecks, and several invasive populations have multiple sources and thus experienced genetic admixture between differentiated native or invasive populations.

The genetic relationships between worldwide D. suzukii populations define the genetic variation they harbor and may continue to shape the further spread of alleles among populations. Gene flow among populations, through continuous dispersal or more punctual admixture events, permits the dissemination of allelic variants and new mutations, with dramatic consequences for evolutionary trajectories (Lenormand 2002). We found evidence for asymmetric genetic admixture from North American towards (northern) European populations. If this signal of admixture reflects recurrent and ongoing dispersal events rather than a single past secondary introduction event, then variants that arise in North America would be more likely to spread to Europe than the reverse. Discriminating between ongoing and past dispersal events is tricky, however, especially for recent invasions (Benazzo et al. 2015). If the occurrence of recurrent gene flow is confirmed, then D. suzukii populations from Europe may not lag behind American ones in evolutionary terms. From an applied perspective, if resistance to control practices emerges in North America, special efforts to stop the importation of D. suzukii to Europe would help prevent the evolution of resistance in Europe and thus reduce damage to crops (Caprio and Tabashnik 1992).

Advances in Methods for Discriminating among Complex Models

To the best of our knowledge, the present study is the first to use the recently developed ABC-RF method (Pudlo et al. 2016) to test competing models characterized by different levels of complexity. To compare ABC-RF to a more standard ABC method, we carried out a subset of model choice analyses using both ABC-RF and ABC-LDA (Estoup et al. 2012). We ran ABC-LDA analyses in two ways: with standard-sized, large reference tables (500,000 simulated datasets per scenario) and with small reference tables (10,000 simulated datasets per scenario, the same size as used for ABC-RF). The performance of ABC-LDA (in terms of prior error rate) in choosing among introduction scenarios was slightly better than that of ABC-RF when using large reference tables. Furthermore, for ABC-LDA with large reference tables, confidence in model choice (in terms of posterior probability) was relatively comparable to that of ABC-RF with small reference tables. Thus, when analyses are relatively simple or computational resources are not limiting, ABC-LDA can provide inferences of invasion scenarios that are as robust as those of ABC-RF.

In contrast, we found that ABC-RF outperformed ABC-LDA when using similarly small reference tables for both types of analyses (10,000 simulated datasets per scenario). ABC-LDA was also unstable, choosing different scenarios in different replicate analyses when using small reference tables. These empirical findings correspond well with those of Pudlo et al. (2016), who used simple models for which the true posterior probabilities could be calculated. Pudlo et al. (2016) demonstrated that ABC-RF provided a more reliable posterior probability of the best (true) scenario than standard ABC methods when using the same (small) number of simulated datasets in the reference table (see supplementary fig. S3 in Pudlo et al. 2016). In our work, the difference in performance between the two methods was most pronounced for complex analyses (e.g., when comparing more than 10 scenarios using more than 100 summary statistics). Thus, when analyses are not simple or computational resources are finite, ABC-RF provides more robust inferences of invasion scenarios than ABC-LDA.

The lower computational cost of ABC-RF was also evident. For a given observed dataset, selecting the best supported scenario and computing its posterior probability was ca. 50 times faster than processing an ABC-LDA treatment with a traditional reference table of large size. Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time, whereas it took several hours to several days with ABC-LDA. The consequence of low computational costs is not simply a saving in computer time. Rather, the efficiency of ABC-RF analyses allowed us to run replicate analyses on various sample sets, even for the most complex D. suzukii invasion scenarios. This replication is critical in evaluating the robustness of our statistical inferences.

We found that, at least for some ABC-RF analyses, the posterior probability of the best scenario and the global statistical power to choose among alternative invasion scenarios (as measured by the prior error rate) were relatively low. For instance, in three of the 11 analyses (analyses 3b, 5a, and 5b), the best scenarios had relatively low probabilities (i.e., ranging from 0.500 to 0.630) and relatively high prior error rates (i.e., ranging from 0.300 to 0.400). Although replicate analyses carried out on different sample sets and prior sets pointed to the same best scenario for these three analyses, such low probability values and high prior error rates suggest a moderate level of confidence in the choice of model in these cases. Similarly, we found that the scenario choice leading to the conclusion of the presence versus absence of admixed genes from the eastern US in various European sample sites relied on probability values that were sometimes as low as 0.500. Moreover, in a minority of cases, the best scenario was different depending on the sample site considered as representative of the eastern US or the native area. The observed pattern of an uneven spatial distribution of the presence of genes of eastern US origin in Europe, following roughly a north to south gradient, should hence be interpreted cautiously. Analyses of additional European sample sites are needed to clarify this issue. The above inferential uncertainties also most likely stem from the relatively small number of markers genotyped relative to the number, complexity, and similarity of some of the competing models. Such results indicate that there is room for improving the robustness of our inferences, at least for a subset of our analyses.

Bottlenecks and Genetic Admixture

We found strong evidence that bottlenecks and admixture both occurred during the course of the invasion of D. suzukii. A first sign of bottlenecks is a reduction in neutral genetic diversity. We found that all invasive D. suzukii populations were characterized by significantly lower genetic variation than native ones, with a loss of 6.4% (US-Col) to 23.3% (US-Haw) of heterozygosity and of 27.3% (US-Wat) to 54.2% (US-Haw) of allelic diversity (supplementary fig. S5, Supplementary Material online). Adrion et al. (2014) found a similar significant loss of diversity in their dataset focused on gene sequences. Dlugosch and Parker (2008) and Uller and Leimu (2011) reviewed studies of neutral genetic diversity in a large number of species of animals, plants, and fungi, and compared nuclear molecular diversity within introduced and source populations. Overall, they found that a loss of variation was the most frequent feature in invasive populations. However, reductions in genetic variation were on average modest (e.g., average loss of 18.7% and 15.5% of heterozygosity and allelic diversity, respectively; Dlugosch and Parker 2008). The reductions observed in our study in invasive D. suzukii populations were thus in the same range for heterozygosity but higher for allelic diversity (cf. average loss of 11.1% and 34.5% of heterozygosity and allelic diversity, respectively). A larger loss of allelic diversity than of heterozygosity after a bottleneck is nevertheless expected by theory (Nei et al. 1975).
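
The percentage losses quoted above compare diversity in an invasive population with that in the native range. The sketch below shows that arithmetic together with the textbook drift expectation for heterozygosity in a population of constant effective size N, H_t = H_0 (1 − 1/(2N))^t. It is an illustration under stated assumptions, not the analysis performed in the study: the function names and all numerical inputs are hypothetical, and the sketch does not model allelic diversity.

```python
def percent_loss(native, invasive):
    """Percentage of diversity lost in an invasive population
    relative to the native reference value."""
    return 100.0 * (1.0 - invasive / native)


def expected_heterozygosity(h0, n_effective, generations):
    """Expected heterozygosity after drift at constant effective size,
    using H_t = H_0 * (1 - 1 / (2 * N)) ** t."""
    return h0 * (1.0 - 1.0 / (2.0 * n_effective)) ** generations


# Hypothetical bottleneck: H_0 = 0.8, effective size 10, 5 generations.
h_after = expected_heterozygosity(0.8, n_effective=10, generations=5)
loss = percent_loss(0.8, h_after)
print(round(h_after, 4), round(loss, 2))
```

Under this expectation, the loss grows with both the duration of the bottleneck and the inverse of the effective size, which is why short or weak bottlenecks leave heterozygosity largely intact.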

The type of introduction pathway explains, at least partly, the severity of the reduction in genetic diversity. Bottlenecks tended to be less severe for populations founded by individuals from the same continent than for populations founded by individuals originating from a different continent. A simple explanation for this pattern is that, due to greater proximity, populations originating from the same continent are likely to be initially founded by a larger number of individuals and also to experience recurrent gene flow. Populations founded by individuals from both the same and different continents experienced weak or moderate reductions in diversity, suggesting that multiple introduction pathways restore diversity.

In agreement with previous studies of invasions on similarly large geographical scales (e.g., Lombaert et al. 2014 and references therein), we found that admixture events were frequent in the worldwide invasion history of D. suzukii. We identified at least five admixture events for which sources were native and/or invasive populations (i.e., A1–A5 events in fig. 1). The genetic contributions of the sources were unbalanced, except for the populations from Brazil and La Reunion, but genetic information regarding admixture was considerably less accurate in the latter cases, which involved weakly differentiated source populations.

We have provided here the information necessary to evaluate whether bottlenecks and admixture have influenced outcomes in this invasion. Quantitative genetics studies in the lab focusing on the key source, bottlenecked, and admixed populations could now examine the fitness and adaptive potential of those groups (Lavergne and Molofsky 2007; Facon et al. 2008; Turgeon et al. 2011).

Conclusions and Perspectives

Understanding invasion pathways provides key information for understanding the role of evolutionary processes in biological invasions. For example, our research reveals sources of invasive populations for further relevant comparisons using quantitative genetics studies in the lab. Additionally, we found that the invasion of Europe by D. suzukii was distinct from that of North America, with limited and asymmetrical gene flow between these two main invaded areas. This situation provides the opportunity to evaluate evolutionary trajectories in replicate in two separate temperate climate regions, similarly to research by Gilchrist et al. (2001, 2004) on the pace of clinal evolution in the invasive fruit fly Drosophila subobscura. There are also distinct invasion pathways for areas with warmer climates (i.e., Hawaii, La Reunion, and Brazil) that will make for interesting comparisons.

Retracing invasion routes and making inferences about demographic processes is only possible if there is adequate polymorphism within populations and significant genetic differentiation among them. As is evident here, microsatellite data can provide enough variation to reveal important pathways. We concur with Adrion et al. (2014), however, that genome-wide data (i.e., next generation sequencing approaches) will be tremendously powerful in further discriminating among complex invasion scenarios (see also Cristescu 2015; Estoup et al. 2016). Because of the reduced computational resources demanded by ABC-RF, this method will be particularly useful for the analysis of massive single nucleotide polymorphism datasets (Pudlo et al. 2016). In addition to the selectively neutral demographic inference leading to the reconstruction of routes of invasion, population (and quantitative) next generation sequencing approaches are quite promising for studying the evolution of phenotypic traits in natural populations (Wray 2013; Gautier 2015). Specifically, they can be used to better understand the genetic architecture of traits underlying invasion success (Bock et al. 2015). Drosophila suzukii is a good species for such research, given (i) its short generation time and viability in laboratory conditions, which facilitate experimental approaches to study quantitative traits of interest (Asplen et al. 2015), and (ii) the availability of annotated genome assemblies for this species (Chiu et al. 2013; Ometto et al. 2013), along with the huge amount of genomic resources available for its close relative D. melanogaster (Groen and Whiteman 2016).

Materials and Methods

Sampling and Genotyping

Adult D. suzukii were sampled from the field at a total of 23 localities (hereafter termed sample sites) distributed throughout most of the native and invasive range of the species (supplementary table S1, Supplementary Material online, and fig. 1). Samples were collected between 2013 and 2015 using baited traps and sweep nets, and stored in ethanol. Native Asian samples consisted of a total of six sample sites, including four Chinese and two Japanese localities. Samples from the invasive range were collected in Hawaii (1 sample site), continental US (7 sites), Europe (7 sites), Brazil (1 site), and the French island of La Reunion (1 site). For each sample site, 15–44 adult flies were genotyped at 25 of the 28 microsatellite loci described in Fraimout et al. (2015), resulting in a total of 685 genotyped individuals. Three microsatellite loci from Fraimout et al. (2015) (i.e., loci DS31, DS42, and DS45) were not included due to the presence of null alleles at frequencies > 10% in some sample sites (results not shown). DNA extraction and PCR amplification, as well as allele scoring, were performed following the methodology detailed in Fraimout et al. (2015).

General Framework for Defining Focal Population Groups and Sets of Scenarios to Be Compared Using ABC

One of the many challenges of ABC analyses is to optimally define sets of sample sites to be processed chronologically as potential sources and targets in a way that does not hamper the computational effort. As stated in Estoup and Guillemaud (2010) and Lombaert et al. (2014), the number and complexity of competing scenarios are proportional to the number of genetic groups to be accounted for in ABC analyses. Following Lombaert et al. (2014), we used the results obtained from three genetic clustering methods, along with historical and geographical information, to define genetic groups that could include several sample sites (supplementary appendix S1, Supplementary Material online). We defined seven main genetic groups from our set of 23 sample sites: Asia, Hawaii, western US, eastern US, Europe, Brazil, and La Reunion. To make the computations less intensive with respect to computer resources, ABC treatments were carried out using one representative sample site for each genetic group per replicate analysis. In other words, when a genetic group included more than two sample sites, we considered the two most differentiated sample sites (i.e., with the highest FST values in supplementary table S3, Supplementary Material online) as representative of the group: CN-Shi and JP-Tok for the group Asia, all three sample sites for the genetically heterogeneous western US group, US-NC and US-Wis for the group eastern US, and GE-Dos and IT-Tre for the group Europe. Each of the sample sites retained (i.e., the representative sample sites) was included alternatively in the sample set considered for ABC treatments. This makes it possible to replicate analyses of scenario choice on different sample sets representing most of the genetic variation within and between genetic groups. All ABC analyses described in table 1 were thus replicated with all possible combinations of representative sample sites available for each genetic group. This allowed us to test the robustness of scenario choices with respect to the sample sets considered as representative of genetic groups.
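
The replication over representative sample sites can be pictured as a Cartesian product over the genetic groups. The sketch below is only an illustration: the site lists are those named in the text and in the note to table 3, the dictionary layout and variable names are assumptions, and the resulting count applies to a hypothetical analysis involving all seven groups at once, whereas each actual analysis involves only the groups relevant to it.

```python
from itertools import product

# Representative sample sites per genetic group, as named in the text
# (the single-site groups follow the note to table 3).
REPRESENTATIVES = {
    "Asia": ["CN-Shi", "JP-Tok"],
    "Hawaii": ["US-Haw"],
    "western US": ["US-Wat", "US-Sok", "US-SD"],
    "eastern US": ["US-NC", "US-Wis"],
    "Europe": ["GE-Dos", "IT-Tre"],
    "Brazil": ["BR-PA"],
    "La Reunion": ["FR-Reu"],
}

# One sample set = one representative site chosen for every group.
sample_sets = [
    dict(zip(REPRESENTATIVES, combo))
    for combo in product(*REPRESENTATIVES.values())
]

print(len(sample_sets))  # 2 * 1 * 3 * 2 * 2 * 1 * 1 = 24
print(sample_sets[0])
```

Restricting `REPRESENTATIVES` to the groups involved in a given analysis yields the replicate sample sets for that analysis.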

Nested ABC Analyses Processed Sequentially

Following Lombaert et al. (2014), we used historical information (i.e., dates of first observation of each invasive population; supplementary table S1, Supplementary Material online) to define 11 sets of competing introduction scenarios that were analyzed sequentially. The first set of compared scenarios considers the second oldest invasive population as target (the oldest one necessarily originating from the native range) and determines its introduction history. Step by step, subsequent analyses use the results obtained from the previous analyses until the most recent invasive populations are considered. Each of the 11 analyses included a set of compared introduction scenarios in which the target invasive population (i.e., the focal population) can either derive directly by a single split (i.e., introduction event) from one of the historically compatible source populations or be the result of an admixture event between any one of the possible pairs of sources. For instance, in a model with one target and two possible sources, three scenarios could be formalized, with the target population being derived from a single source population, from the other possible source, or from an admixture between the two sources (see supplementary fig. S6 for an illustration and supplementary table S2, Supplementary Material online, for details).
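
The way a scenario set is built for one target population can be sketched as a simple enumeration: one scenario per candidate source, plus one per unordered pair of sources for admixture. The function name and the tuple representation are hypothetical; the population labels are placeholders.

```python
from itertools import combinations


def enumerate_scenarios(target, sources):
    """List the competing introduction scenarios for one target.

    Each scenario is a (target, sources_used) tuple: a single source
    means a direct introduction, a pair means an admixture event.
    """
    single = [(target, (s,)) for s in sources]
    admixed = [(target, pair) for pair in combinations(sources, 2)]
    return single + admixed


# One target and two possible sources give three scenarios.
two_sources = enumerate_scenarios("target", ["source A", "source B"])
print(len(two_sources))  # 3

# Three possible sources give three single-source and three admixture
# scenarios.
three_sources = enumerate_scenarios(
    "eastern US", ["Asia", "Hawaii", "western US"]
)
print(len(three_sources))  # 6
```

With k candidate sources, the set contains k + k(k − 1)/2 scenarios, so the number of competing models grows quickly as older invasive groups accumulate as potential sources.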

The first set of scenario choice analyses (1a–1d) aimed at making inferences about the introduction pathways for the three sample sites in western US (table 1). We first evaluated the Asian, Hawaiian, or admixed Asian + Hawaiian origin of each of the three western US sample sites. We then refined our modeling by formalizing seven competing scenarios describing the possible relationships among the three western US sample sites and their extra-continental sources (i.e., Asia and Hawaii; analysis 1d). Once the invasion history was resolved for the western US group, we sought to identify the population source(s) of the eastern US and European genetic groups independently. Dates of first records in these two groups widely overlap, and the possibility that the western US was a common source for both areas could not be ruled out. We therefore compared, in analysis 2a, a set of six scenarios to assess whether the eastern US group derived directly from Asia, Hawaii, or western US, or whether it resulted from an admixture event between one pair of such sources. A similar set of six scenarios was used to assess the most likely origin of European populations independently from the eastern US group (analysis 2b).

Owing to the results of analyses 2a and 2b and to the overlap in first record dates for the eastern US and Europe, we then performed two analyses to investigate the possibility of gene flow (i.e., admixture) between the eastern US and European groups, considering the eastern US as target population with Europe as potential source, and vice versa (analyses 3a and 3b, respectively). Analysis 4 aimed at clarifying the admixture in northern European sample sites identified in analysis 3b. We tested whether this admixture occurred between the eastern US and a secondary introduction from Asia, or between individuals from the eastern US and southern Europe. Finally, we investigated the origins of the most recent invasive populations in Brazil and La Réunion separately, with all older groups and pairwise admixtures as potential sources (analyses 5a and 5b). Note that, in agreement with the genetic clustering results (fig. 1; supplementary appendix S1 and figs. S1 and S2, Supplementary Material online), we did not test for the possibility of Brazil being the source of La Réunion, or vice versa.

Historical, Demographic, and Mutational Parameters

For all scenarios formalized for ABC analyses, we modeled an introduction event as a divergence without subsequent gene flow from the source population (or from two source populations in the case of admixture) at a time corresponding to the date of first record in the invaded area, translated into a number of generations (assuming 12 generations per year; Lin et al. 2014). The divergence event was immediately followed by a bottleneck period characterized by a lower effective population size, and then by a direct return to a stable (large) effective population size. We modeled our uncertainty regarding the identity of the source populations in the slightly structured native Asian area by including native "ghost populations" (i.e., unsampled populations) in our scenarios. Such unsampled native populations were modeled as a single split from an ancestral Asian population (without any change in effective population size) at a time defined by a loose flat prior that includes zero at the lower bound (supplementary table S8, Supplementary Material online). See Lombaert et al. (2014) and references therein for justifications of using unsampled populations when modeling invasion scenarios. A detailed illustration of the above modeling design is provided in supplementary fig. S6, Supplementary Material online.
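The translation of a first-record date into a divergence time in generations is simple arithmetic, sketched below. Only the rate of 12 generations per year comes from the text; the function name, the use of whole calendar years, and the example years are assumptions made for illustration.

```python
GENERATIONS_PER_YEAR = 12  # value stated in the text (Lin et al. 2014)


def generations_since_first_record(first_record_year, sampling_year,
                                   generations_per_year=GENERATIONS_PER_YEAR):
    """Convert the time elapsed since first record into generations.

    Assumes whole calendar years; a real analysis might use finer dates.
    """
    if sampling_year < first_record_year:
        raise ValueError("sampling cannot precede the first record")
    return (sampling_year - first_record_year) * generations_per_year


if __name__ == "__main__":
    # Hypothetical years, chosen only to illustrate the conversion.
    print(generations_since_first_record(2008, 2013))
```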

Prior distributions for historical, demographic, and mutational parameters were defined taking into account the historical and demographic parameter values available from empirical studies on D. suzukii (Hauser 2011; Lin et al. 2014; Asplen et al. 2015) and microsatellite mutational data on D. melanogaster (Schug et al. 1998). We considered a first set of prior distributions (hereafter termed prior set 1) composed of rectangular (i.e., bounded uniform) distributions (see supplementary table S8, Supplementary Material online). To evaluate the robustness of our ABC inferences to prior choice, we also considered a second set of more peaked prior distributions (hereafter termed prior set 2; supplementary table S8, Supplementary Material online).

Model Choice Using ABC-RF and ABC-LDA

For all ABC-RF and ABC-LDA analyses, we used the software DIYABC v2.1.0 (Cornuet et al. 2014) to simulate the datasets constituting the reference tables. A reference table includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics.

Following Pudlo et al. (2016), ABC-RF treatments were processed on reference tables including 10,000 simulated datasets per scenario. Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014) for microsatellite markers, describing genetic variation per population (e.g., number of alleles), per pair of populations (e.g., genetic distance), or per triplet of populations (e.g., admixture rate), averaged over the 25 loci (see supplementary table S9, Supplementary Material online, for details about such statistics), plus the linear discriminant analysis (LDA) axes as additional summary statistics. The total number of summary statistics ranged from 39 to 424 depending on the analysis (table 2). When the number of scenarios in an analysis exceeded ten, we processed ABC-RF treatments on reference tables including 100,000 simulated datasets, to avoid the computer memory issues associated with the sub-bootstrapping procedure processed for reference tables with more than 100,000 simulated datasets (see the section "Practical recommendations regarding the implementation of the algorithms" in Pudlo et al. 2016). We checked that this number was sufficient by evaluating the stability of prior error rates (i.e., the probability of choosing a wrong model when drawing model index and parameter values from the priors) and of posterior probability estimates on 80,000, 90,000, and 100,000 simulated datasets. The number of trees in the constructed random forests was fixed to n = 500; see Breiman (2001) and Pudlo et al. (2016) for justifications of considering a forest of 500 trees, and supplementary fig. S7, Supplementary Material online, for an illustration. For each ABC-RF analysis, we predicted the best scenario and estimated its posterior probability and the prior error rate over 10 replicate runs on the same reference table. We used the abcrf R package (v1.1.0; Pudlo et al. 2016) to perform all ABC-RF analyses. In supplementary appendix S2, Supplementary Material online, we provide several scripts written in the R programming language (R Development Core Team 2008) for computing random forest analyses in R with the abcrf package when starting from simulated datasets generated with DIYABC v2.1.0.
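Two quantities used above, the scenario predicted by a forest and the prior error rate, can be illustrated with a minimal Python sketch. It does not reproduce the abcrf package (which is written in R and also fits a second, regression forest to estimate the posterior probability of the selected scenario); the function names and the toy inputs are hypothetical.

```python
from collections import Counter


def forest_prediction(tree_votes):
    """Return the scenario index that receives the most votes across trees.

    Note: the fraction of votes is NOT a posterior probability; in ABC-RF
    the posterior probability of the selected scenario is estimated
    separately (Pudlo et al. 2016).
    """
    if not tree_votes:
        raise ValueError("at least one tree vote is required")
    return Counter(tree_votes).most_common(1)[0][0]


def prior_error_rate(true_scenarios, predicted_scenarios):
    """Fraction of prior-simulated datasets assigned to the wrong scenario."""
    if not true_scenarios or len(true_scenarios) != len(predicted_scenarios):
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for truth, guess in zip(true_scenarios, predicted_scenarios)
                if truth != guess)
    return wrong / len(true_scenarios)


if __name__ == "__main__":
    # Toy forest of 500 trees voting among three hypothetical scenarios.
    votes = [2] * 320 + [1] * 130 + [3] * 50
    print(forest_prediction(votes))
    print(prior_error_rate([1, 1, 2, 2, 3], [1, 2, 2, 2, 3]))
```

Checking stability, as described in the text, amounts to recomputing such error rates on reference tables of increasing size and over replicate runs, then verifying that the values no longer change appreciably.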

For the sake of methodological comparison, we also duplicated a subset of analyses using a more standard ABC method (Beaumont et al. 2002; Cornuet et al. 2008), improved using the regression algorithms from Estoup et al. (2012) based on linear discriminant analysis (LDA) to avoid the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step. This method is called ABC-LDA. Following previous analyses of this type (Lombaert et al. 2014), ABC-LDA treatments were processed on reference tables including 500,000 simulated datasets per scenario. As for ABC-RF, datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). Posterior probabilities associated with each scenario were estimated by polychotomous logistic regression (Cornuet et al. 2008), modified following Estoup et al. (2012), on the 1% of simulated datasets closest to the observed dataset. We also estimated, for each analysis, a prior error rate over 500 simulated datasets drawn from the priors, using reference tables including 500,000 simulated datasets per scenario. To provide a fair comparison of classification error with respect to computational effort, prior error rates were also estimated using reference tables including the same number of simulated datasets per scenario as for ABC-RF treatments (10,000 per scenario).
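The selection of the simulated datasets closest to the observed one can be sketched as follows. This is a simplified illustration: it uses a plain Euclidean distance on raw summary statistics and reports the raw proportion of each scenario among the retained datasets, whereas the analyses described above work on LDA-transformed statistics and refine the estimate with a polychotomous logistic regression. All names are hypothetical.

```python
import math
from collections import Counter


def closest_fraction(reference_table, observed, fraction=0.01):
    """Keep the given fraction of simulated datasets closest to the data.

    reference_table: list of (scenario_index, summary_statistics) rows.
    observed: summary statistics of the observed dataset (same length).
    In practice the statistics would be normalized before computing distances.
    """
    n_keep = max(1, round(len(reference_table) * fraction))
    ranked = sorted(reference_table,
                    key=lambda row: math.dist(row[1], observed))
    return ranked[:n_keep]


def scenario_proportions(accepted):
    """Raw proportion of each scenario among the retained datasets."""
    counts = Counter(scenario for scenario, _ in accepted)
    return {scenario: n / len(accepted) for scenario, n in counts.items()}


if __name__ == "__main__":
    # Toy reference table: scenario 1 simulations lie near the observed
    # value, scenario 2 simulations lie far from it.
    table = [(1, (0.1 * i,)) for i in range(10)]
    table += [(2, (50.0 + i,)) for i in range(10)]
    accepted = closest_fraction(table, observed=(0.0,), fraction=0.1)
    print(scenario_proportions(accepted))
```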

Refining the Origins of the Primary Introduction Events from the Native Area

We refined our worldwide invasion scenario by assessing the most probable origin of the primary introductions from the Asian native range. To this aim, we carried out additional ABC-RF treatments to choose which geographical area in the native range was the most likely origin of each of the three primary introductions from Asia identified in the final invasion scenario, i.e., in Hawaii, the western US (sample site US-Wat), and Europe (fig. 1). Capitalizing on the observed pattern of genetic differentiation among all Asian sample sites, we considered four possible geographical origins: O1 = Japan (represented by the sample sites JP-Sap + JP-Tok, which were pooled as they did not show any significant differentiation), O2 = South-East China (represented by the sample site CN-Nin), O3 = North-East China (represented by the sample sites CN-Lan + CN-Lia, which were pooled as they did not show any significant differentiation), and O4 = South-West China (represented by the sample site CN-Shi). In each of these three model choice analyses (i.e., one analysis per primary introduction), we considered the previously inferred introduction histories and compared four scenarios corresponding to variations of those histories that differed only by the geographical origin of the primary Asian introduction (models O1, O2, O3, and O4). We used the ABC-RF method with 10,000 datasets per model, simulated using the prior set 1 or 2, and with three replications of the RF process. For inferences regarding Europe, we carried out ABC-RF treatments on various sets of sample sites from Europe and the eastern US to further evaluate the robustness of our results.

Parameter Estimation

For ABC parameter estimation, we considered a set of 12 key sample sites representative of the inferred final invasion history (fig. 1). This set included the three native origins of the primary introductions from the native range (O1, O2, and O3), the Hawaiian sample (US-Haw), the three western US samples (US-Wat, US-Sok, and US-SD), the eastern US sample US-NC, two European samples (IT-Tre and GE-Dos), the Brazilian sample (BR-PA), and the sample from La Réunion (FR-Reu). To evaluate the robustness of our parameter estimates with respect to sample sites, we carried out a second analysis replacing the sample sites US-NC, IT-Tre, and GE-Dos by US-Wis, SP-Bar, and FR-Par, respectively. We also replicated parameter estimations using the prior set 1 and the prior set 2.

ABC-RF was developed to tackle the inferential issue of model choice. So far, no RF-based solution exists to estimate parameter distributions under a given model (Pudlo et al. 2016). Standard ABC algorithms for parameter estimation may suffer from the curse of dimensionality and from correlation among explanatory variables (i.e., multicollinearity) during the regression step, and hence yield poor results when the number of statistics is much too large (reviewed in Blum et al. 2013). To avoid such potential problems, as well as for the sake of simplicity and computational efficiency (statistical techniques for choosing summary statistics have not been implemented in DIYABC; Blum et al. 2013), we used a standard ABC methodological framework (Beaumont et al. 2002; Cornuet et al. 2008) applied to a subset of "expert-chosen" statistics proposed in DIYABC (Brouat et al. 2014; Lombaert et al. 2014) to estimate posterior parameter distributions under the final invasion scenario. We hence considered 95 summary statistics in total: the mean number of alleles and the mean genetic diversity (per locus and population sample), all pairwise FST values, and some crude estimates of admixture rates based on the five population triplets for which admixture events have been inferred (i.e., the A1–A5 events described in fig. 1, corresponding to five AML statistics; Choisy et al. 2004; see the DIYABC 2.1.0 user manual, p. 16, for details about such statistics). Using parameter values drawn from the prior sets 1 or 2, we produced a reference table containing 10 million simulated datasets. Following Beaumont et al. (2002), we then used a local linear regression to estimate the parameter posterior distributions. We took the 10,000 (1‰) simulated datasets closest to our observed dataset for the regression, after applying a logit transformation to parameter values. We found that considering instead the 1% closest simulated datasets and/or a larger set of summary statistics had only weak effects on the estimated parameter distributions, at least for the subset of parameters we were interested in (i.e., admixture rates and bottleneck severity; results not shown).
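The logit transformation applied to parameter values before the regression keeps back-transformed estimates inside a bounded interval. The sketch below assumes that the transformation is scaled to the lower and upper bounds of the prior, which the text does not spell out; function names are hypothetical.

```python
import math


def logit_transform(value, lower, upper):
    """Map a parameter value from the open interval (lower, upper) to the real line."""
    if not lower < value < upper:
        raise ValueError("value must lie strictly between the bounds")
    scaled = (value - lower) / (upper - lower)
    return math.log(scaled / (1.0 - scaled))


def inverse_logit_transform(z, lower, upper):
    """Map a real number back to the interval (lower, upper)."""
    return lower + (upper - lower) / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Example with an admixture rate, which is bounded between 0 and 1.
    z = logit_transform(0.3, 0.0, 1.0)
    print(z, inverse_logit_transform(z, 0.0, 1.0))
```

Because the regression adjustment is carried out on the transformed scale, adjusted values mapped back with the inverse function cannot fall outside the prior bounds.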

We focused our investigations on the parameters related to two types of demographic events that are considered to be potentially important drivers of invasion success: the severity of the bottleneck associated with the foundation of new populations, and genetic admixture (Estoup et al. 2016). Bottleneck severity was defined as the ratio between the duration of the bottleneck and the number of effective individuals during this period (Pascual et al. 2007). Following the final invasion scenario detailed in figure 1, we defined three categories of introduction situations: bottleneck severity for populations founded only by individuals originating from a different continent (e.g., sample sites US-Haw, US-Wat, IT-Tre, BR-PA, and FR-Reu), bottleneck severity for populations founded only by individuals originating from the same continent (e.g., sample sites US-SD and US-NC), and bottleneck severity for populations founded by a combination of the two types of sources (e.g., sample sites US-Sok and GE-Dos). Genetic admixture is evident when a population is inferred to have more than one distinct source (A1–A5 events in fig. 1).
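The definition of bottleneck severity given above is a simple ratio, sketched here in Python. The function name and the numerical values are illustrative assumptions and are not estimates from this study.

```python
def bottleneck_severity(duration_generations, effective_size):
    """Bottleneck severity: duration (in generations) divided by the
    effective number of individuals during the bottleneck."""
    if duration_generations < 0:
        raise ValueError("duration must be non-negative")
    if effective_size <= 0:
        raise ValueError("effective size must be positive")
    return duration_generations / effective_size


if __name__ == "__main__":
    # Hypothetical values: a longer or narrower bottleneck is more severe.
    print(bottleneck_severity(10, 50))
    print(bottleneck_severity(10, 500))
```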

Model-Posterior Checking

We used the ABC model-posterior checking method implemented in DIYABC (Cornuet et al. 2010) to evaluate how well the final worldwide invasion scenario, with its associated parameter posterior distributions, matches the observed dataset. This method was largely inspired by the statistical frameworks detailed in Gelman et al. (2003) and Cook et al. (2006). The principle is as follows: if a scenario-posterior combination fits the observed data correctly, then data simulated under this scenario with parameters drawn from the associated posterior distributions should be close to the observed data. The lack of fit of the model to the data can be measured by determining the frequency at which test quantities measured on the observed dataset are extreme with respect to the distributions of the same test quantities computed from the simulated datasets (i.e., the posterior predictive distributions). In practice, the test quantities are chosen among the large set of ABC summary statistics proposed in the program DIYABC. For each test quantity $q$, a lack of fit of the observed data with respect to the posterior predictive distribution can be measured by the cumulative distribution function value, defined as $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$. The tail-area probability can be easily computed for each test quantity as $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$ and $1.0 - \mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$ for $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}}) \leq 0.5$ and $> 0.5$, respectively. Such tail-area probabilities, also named posterior predictive P-values (i.e., ppp-values; Meng 1994), represent the probabilities that the replicated data (simulated ABC summary statistics) could be more extreme than the observed data (observed ABC summary statistics). In practice, ppp-values can be interpreted as a guideline for tracking model-posterior misfit: a few ppp-values around 0.05 are not necessarily a problem, whereas the presence of ppp-values around 0.001 is rather suspect. Hence, too many observed summary statistics falling in the tails of the distributions, especially if some of those ppp-values are small (<0.001), cast serious doubts on the adequacy of the model-posterior combination to the observed dataset. Finally, because ppp-values are computed for a number of (often non-independent) test statistics, a method such as that of Benjamini and Hochberg (1995) can be used to control the false discovery rate (Cornuet et al. 2010).
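The tail-area computation and the false discovery rate correction mentioned above can be sketched as follows. This is a minimal Python illustration under the definitions given in the text, not the DIYABC implementation; the function names are hypothetical, and the Benjamini-Hochberg routine returns adjusted P-values computed in the standard step-up fashion.

```python
def ppp_value(simulated, observed):
    """Posterior predictive P-value for one test quantity.

    Computes Prob(q_simulated < q_observed) from the simulated values and
    returns it if it is <= 0.5, or one minus it otherwise.
    """
    if not simulated:
        raise ValueError("at least one simulated value is required")
    cdf = sum(1 for q in simulated if q < observed) / len(simulated)
    return cdf if cdf <= 0.5 else 1.0 - cdf


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    simulated = list(range(100))  # toy posterior predictive values 0..99
    raw = [ppp_value(simulated, obs) for obs in (5, 50, 95)]
    print(raw)
    print(benjamini_hochberg(raw))
```

Test quantities whose adjusted value stays below the chosen threshold would then be flagged as evidence of model-posterior misfit.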

We carried out the ABC model-posterior checking analysis on our observed microsatellite dataset as follows. From the $10^7$ datasets simulated under the final invasion scenario detailed in figure 1, and using the same subset of 95 statistics previously retained for parameter estimation, we obtained a posterior sample of $10^4$ values from the posterior distributions of parameters, as described in the previous section (Parameter Estimation). We then simulated $10^4$ datasets with parameter values drawn with replacement from this posterior sample. As underlined in many statistics textbooks (Gelman et al. 2003), it is advised against performing model checking using information that has already been used for training (i.e., model fitting; see also Cornuet et al. 2010 for illustrations on simulated datasets). Optimally, model-posterior checking should be based on test quantities that do not correspond to the summary statistics used to obtain the parameter posterior distributions. This is naturally possible with DIYABC, as the package proposes a large choice of summary statistics. Our set of test statistics therefore included the 1,141 remaining single, pairwise, and three-population summary statistics available in DIYABC that were not used for the previous ABC parameter estimation.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

A.F. was supported by state funding from the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program "Investissements d'avenir" (ANR-11-IDEX-0004-02). A.E. and R.A.H. were supported by the Languedoc Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE 2016), and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP "International Mobility" (CfP 2015-02). G.L. acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds, project 2012-13-195. X.C., M.K., I.M., and J.Z. were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). M.D. acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). M.P. was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.

References

Adrion JR, Kousathanas A, Pascual M, Burrack HJ, Haddad NM, Bergland AO, Machado H, Sackton TB, Schlenke TA, Watada M, et al. 2014. Drosophila suzukii: the genetic footprint of a recent, worldwide invasion. Mol Biol Evol. 31:3148–3163.

Asplen MK, Anfora G, Biondi A, Choi DS, Chu D, Daane KM, Gibert P, Gutierrez AP, Hoelmer KA, Hutchison WD, et al. 2015. Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci. 88:469–494.

Atallah J, Teixeira L, Salazar R, Zaragoza G, Kopp A. 2014. The making of a pest: the evolution of a fruit-penetrating ovipositor in Drosophila suzukii and related species. Proc R Soc B. 281:20132840.

Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025–2035.

Beaumont MA. 2010. Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst. 41:379–406.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 57:289–300.

Bertorelle G, Benazzo A, Mona S. 2010. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol Ecol. 19:2609–2625.

Blum M, Nunes M, Prangle D, Sisson S. 2013. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci. 28:189–208.

Bock DG, Caseys C, Cousens RD, Hahn MA, Heredia SM, Hübner S, Turner KG, Whitney KD, Rieseberg LH. 2015. What we still don't know about invasion genetics. Mol Ecol. 24:2277–2297.

Boissin E, Hurley B, Wingfield MJ, Vasaitis R, Stenlid J, Davis C, de Groot P, Ahumada R, Carnegie A, Goldarazena A, et al. 2012. Retracing the routes of introduction of invasive species: the case of the Sirex noctilio woodwasp. Mol Ecol. 21:5728–5744.

Breiman L. 2001. Random forests. Mach Learn. 45:5–32.

Benazzo A, Ghirotto S, Torres Vilaca ST, Hoban S. 2015. Using ABC and microsatellite data to detect multiple introductions of invasive species from a single source. Heredity 115:262–272.

Brouat C, Tollenaere C, Estoup A, Loiseau A, Sommer S, Soanandrasana R, Rahalison L, Rajerison M, Piry S, Goodman SM, Duplantier JM. 2014. Invasion genetics of a human commensal rodent: the black rat Rattus rattus in Madagascar. Mol Ecol. 23(16):4153–4167.

Caprio M, Tabashnik BE. 1992. Gene flow accelerates local adaptation among finite populations: simulating the evolution of insecticide resistance. J Econ Entomol. 85:611–620.

Calabria G, Máca J, Bächli G, Serra L, Pascual M. 2012. First records of the potential pest species Drosophila suzukii (Diptera: Drosophilidae) in Europe. J Appl Entomol. 136:139–147.

Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang G, Zalom FG. 2013. Genome of Drosophila suzukii, the spotted wing drosophila. G3 3:2257–2271.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Mol Ecol. 13:955–968.

Cook S, Gelman A, Rubin DB. 2006. Validation of software for Bayesian models using posterior quantiles. J Comput Graph Stat. 15:675–692.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24:2713–2719.

Cornuet JM, Ravigné V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11:401.

Cornuet JM, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin JM, Estoup A. 2014. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics 30:1187–1189.

Cristescu ME. 2015. Genetic reconstructions of invasion history. Mol Ecol. 24:2212–2225.

Deprá M, Poppe JL, Schmitz HJ, De Toni DC, Valente VLS. 2014. The first records of the invasive pest Drosophila suzukii in the South American continent. J Pest Sci. 87:379–383.

Dlugosch KM, Parker IM. 2008. Founding events in species invasions: genetic variation, adaptive evolution, and the role of multiple introductions. Mol Ecol. 17:431–449.

Dlugosch KM, Anderson SR, Braasch J, Alice Cang F, Gillette H. 2015. The devil is in the details: genetic variation in introduced populations and its contributions to invasion. Mol Ecol. 24:2095–2111.

Edmonds CA, Lillie AS, Cavalli-Sforza LL. 2004. Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA. 101:975–979.

Ellstrand NC, Schierenbeck KA. 2000. Hybridization as a stimulus for the evolution of invasiveness in plants? Proc Natl Acad Sci USA. 97:7043–7050.

Estoup A, Guillemaud T. 2010. Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol. 19:4113–4130.

Estoup A, Lombaert E, Marin JM, Guillemaud T, Pudlo P, Robert CP, Cornuet JM. 2012. Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics. Mol Ecol Res. 12(5):846–855.

Estoup A, Ravigné V, Hufbauer RA, Vitalis R, Gautier M, Facon B. 2016. Is there a genetic paradox of biological invasion? Annu Rev Ecol Evol Syst. 47:51–72.


Facon B, Genton B, Shykoff J, Jarne P, Estoup A, David P. 2006. A general eco-evolutionary framework for understanding bioinvasions. Trends Ecol Evol. 21:130–135.

Facon B, Pointier JP, Jarne P, Sarda V, David P. 2008. High genetic variance in life-history strategies within invasive populations by way of multiple introductions. Curr Biol. 18:363–367.

Fraimout A, Loiseau A, Price DK, Xuereb A, Martin JF, Vitalis R, Fellous S, Debat V, Estoup A. 2015. New set of microsatellite markers for the spotted-wing Drosophila suzukii (Diptera: Drosophilidae): a promising molecular tool for inferring the invasion history of this major insect pest. Eur J Entomol. 112(4):855.

Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201:1555–1579.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis. London: Chapman & Hall/CRC.

Gilchrist GW, Huey RB, Serra L. 2001. Rapid evolution of wing size clines in Drosophila subobscura. Genetica 112:273–286.

Gilchrist GW, Huey RB, Balanya J, Pascual M, Serra L, Noor M. 2004. A time series of evolution in action: a latitudinal cline in wing size in South American Drosophila subobscura. Evolution 58:768–780.

Groen SC, Whiteman NK. 2016. Using Drosophila to study the evolution of herbivory and diet specialization. Curr Opin Insect Sci. 14:66–72.

Hauser M. 2011. A historic account of the invasion of Drosophila suzukii (Matsumura) (Diptera: Drosophilidae) in the continental United States, with remarks on their identification. Pest Manag Sci. 67:1352–1357.

Hufbauer RA, Facon B, Ravigné V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A. 2012. Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl. 5:89–101.

Kaneshiro KY. 1983. Drosophila (Sophophora) suzukii (Matsumura). Proc Hawaiian Entomol Soc. 24:179.

Keller SR, Taylor DR. 2010. Genomic admixture increases fitness during a biological invasion: admixture increases fitness during invasion. J Evol Biol. 23:1720–1731.

Kolbe JJ, Glor RE, Schettino LR, Lara AC, Larson A, Losos JB. 2004. Genetic variation increases during biological invasion by a Cuban lizard. Nature 431:177–181.

Lavergne S, Molofsky J. 2007. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci USA. 104:3883–3888.

Lee C. 2002. Evolutionary genetics of invasive species. Trends Ecol Evol. 17:386–391.

Lee JC, Bruck DJ, Curry H, Edwards D, Haviland DR, Van Steenwyk RA, Yorgey BM. 2011. The susceptibility of small fruits and cherries to the spotted-wing drosophila, Drosophila suzukii. Pest Manag Sci. 67:1358–1367.

Lenormand T. 2002. Gene flow and the limits of natural selection. Trends Ecol Evol. 17:183–189.

Lin QC, Zhai YF, Zhang AS, Men XY, Zhang XY, Zalom FG, Zhou CG, Yu Y. 2014. Comparative developmental times and laboratory life tables for Drosophila suzukii and Drosophila melanogaster (Diptera: Drosophilidae). Fla Entomol. 97:1434–1442.

Lombaert E, Guillemaud T, Lundgren J, Koch R, Facon B, Grez A, Loomans A, Malausa T, Nedved O, Rhule E, et al. 2014. Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol. 23:5979–5997.

Matsumura S. 1931. 6000 illustrated insects of Japan-empire (in Japanese). Tokyo: Toko shoin. 1497 pp.

Meng XL. 1994. Posterior predictive p-values. Ann Stat. 22:1142–1160.

Nei M, Maruyama T, Chakraborty R. 1975. The bottleneck effect and genetic variability in populations. Evolution 29:1–10.

Ometto L, Cestaro A, Ramasamy S, Grassi A, Revadi S, Siozios S, Moretto M, Fontana P, Varotto C, Pisani D, et al. 2013. Linking genomics and ecology to investigate the complex evolution of an invasive drosophila pest. Genome Biol Evol. 5:745–757.

Pascual M, Chapuis MP, Mestres F, Balanya J, Huey RB, Gilchrist GW, Serra L, Estoup A. 2007. Introduction history of Drosophila subobscura in the New World: a microsatellite based survey using ABC methods. Mol Ecol. 16:3069–3083.

Peischl S, Excoffier L. 2015. Expansion load: recessive mutations and the role of standing genetic variation. Mol Ecol. 24:2084–2094.

Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. 2016. Reliable ABC model choice via random forests. Bioinformatics 32:859–866.

R Development Core Team. 2008. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing. http://www.R-project.org

Robert CP, Cornuet JM, Marin JM, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA. 108:15112–15117.

Rius M, Darling JA. 2014. How important is intraspecific genetic admixture to the success of colonising populations? Trends Ecol Evol. 29:233–242.

Sakai AK, Allendorf FW, Holt JS, Lodge DM, Molofsky J, With KA, Baughman S, Cabin RJ, Cohen JE, Ellstrand NC, McCauley DE. 2001. The population biology of invasive species. Annu Rev Ecol Syst. 305–332.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. In: Mutation and Evolution. Springer. pp. 359–367.

Simberloff D. 2013. Biological invasions: much progress plus several controversies. Contrib Sci. 9:7–16.

Turgeon J, Tayeh A, Facon B, Lombaert E, De Clercq P, Berkvens N, Lundgren JG, Estoup A. 2011. Experimental evidence for the phenotypic impact of admixture between wild and biocontrol Asian ladybird (Harmonia axyridis) involved in the European invasion. J Evol Biol. 24:1044–1052.

Uller T, Leimu R. 2011. Founder events predict changes in genetic diversity during human-mediated range expansions. Glob Change Biol. 17:3478–3485.

Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Thery S, Froment A, Le Bomin S, Gessain A, Hombert JM, et al. 2009. Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol. 19:312–318.

Wray G. 2013. Genomics and the evolution of phenotypic traits. Annu Rev Ecol Evol Syst. 44:51–72.

Page 9: Deciphering the Routes of invasion of Drosophila suzukii

was no clear trend of overestimation of posterior probability when using ABC-LDA with a large number of simulated datasets versus ABC-RF, a potential bias suggested by Pudlo et al. (2016). On the other hand, we found evidence of instability in the estimation of posterior probabilities, and hence in model selection, when using ABC-LDA with a small number of simulated datasets in the reference table (i.e., 10,000 simulations per scenario, as for ABC-RF). In this case, the scenario with the highest probability was different from that selected using either ABC-LDA with the large reference table or ABC-RF in four and two of the 11 analyses when using the prior sets 1 and 2, respectively (results not shown).

Refining the Origins of the Primary Introduction Events from the Native Area

Additional ABC-RF analyses were run to determine which geographical area in the native range was the most likely origin of each of the three primary introductions (i.e., in Hawaii, the western US, and Europe). The most probable origin of the Hawaiian invasive population was Japan, with a mean posterior probability of 0.959 (origin O1 in fig. 1). For Watsonville (western US sample site US-Wat), the most probable source of the primary introduction from Asia was southeast China (origin O2 in fig. 1; P = 0.860). Finally, for Europe, the most probable source of the primary introduction from Asia was northeast China (origin O3 in fig. 1; P = 0.855). Similar posterior probabilities were obtained using the prior set 1 (values presented above) and the prior set 2 (results not shown). For inferences regarding Europe, similar posterior probabilities were also obtained using various sets of sample sites from Europe and the eastern US (results not shown).

Estimation of Parameter Distributions for Admixture Rate and Bottleneck Severity

We used the most probable worldwide invasion scenario (fig. 1) to estimate the posterior distributions of parameters related to the bottleneck severity associated with the foundation of new populations and to genetic admixture rates between differentiated sources (i.e., A1–A5 events in fig. 1).

FIG. 2. Admixed or non-admixed origin of D. suzukii in Europe inferred for each sample site. Potential source populations include (among others) the Asian native range (represented by the Japanese sample site JP-Tok) and the invasive genetic group from the eastern US, represented by one of the four invasive sample sites collected in this area (fig. 1). Four independent replicate ABC-RF treatments corresponding to analysis 3b (table 2) were hence carried out for each targeted European sample site, each using one of the four eastern US sample sites. The treatments labeled 1, 2, 3, and 4 in the pies of the figure were carried out with the sample sites US-NC, US-Wis, US-Gen, and US-Col, respectively. A pie quarter in blue indicates that the best scenario corresponds to a single introduction event from Asia. A pie quarter in red indicates that the best scenario corresponds to an admixture event between Asia and the eastern US. Dates in italics correspond to the dates of first record of the European sample sites.

Deciphering Invasion Routes Using ABC Random Forest. doi:10.1093/molbev/msx050. MBE


The posterior distributions of parameters relating to bottlenecks and admixture substantially differed from the priors, indicating that genetic data were informative for such parameters (tables 3 and 4; see supplementary tables S6 and S7, Supplementary Material online, for results using an alternative set of representative sample sites). Regarding bottleneck severity, we found that bottlenecks tended to be more severe for populations founded by individuals originating from a source located on a different continent (extra-continental origin) than for populations founded by individuals originating from a source located on the same continent (intra-continental origin; table 3 and fig. 1). The bottleneck severity for populations founded by individuals corresponding to a combination of the two types of sources (extra- and intra-continental origins) was either weak or moderate. The strongest bottleneck severity was found for the first invasive population in Hawaii, which showed the lowest level of genetic variation among all invasive populations. Bottleneck severity values were globally slightly stronger in European than in US populations, suggesting a smaller number of founding individuals and/or a slower demographic recovery in European populations.

Regarding admixture (table 4), we found that (i) for the first recorded invasive population in western US (i.e., Watsonville, sample site US-Wat), the genetic contribution from China was substantially larger than that from Hawaii (median value of 0.759); (ii) for the western US population US-Sok, the secondary genetic contribution from Hawaii was small compared to that from Watsonville (median value of 0.237); and (iii) the sample sites from northern Europe (e.g., GE-Dos in Germany) contained a rather low proportion of genes originating from the eastern US (median value of

Table 3. Bottleneck Severity in Invasive Populations of D. suzukii.

                                        Prior 1                                Prior 2
Introduction Type          Sample Site  Mean   Median  Mode   q5     q95       Mean   Median  Mode   q5     q95
Extra-continental          US-Haw       0.517  0.500   0.495  0.326  0.755     0.490  0.484   0.477  0.382  0.615
                           US-Wat       0.181  0.138   0.114  0.041  0.448     0.143  0.127   0.118  0.059  0.275
                           IT-Tre       0.268  0.179   0.171  0.059  0.837     0.236  0.189   0.172  0.101  0.553
                           BR-PA        0.158  0.126   0.104  0.020  0.407     0.208  0.193   0.172  0.067  0.399
                           FR-Reu       0.259  0.215   0.186  0.051  0.626     0.245  0.214   0.190  0.069  0.534
Intra-continental          US-SD        0.135  0.117   0.106  0.039  0.283     0.117  0.104   0.091  0.053  0.224
                           US-NC        0.089  0.067   0.061  0.015  0.230     0.111  0.098   0.090  0.046  0.218
Extra + intra-continental  US-Sok       0.116  0.099   0.088  0.027  0.250     0.119  0.106   0.099  0.052  0.229
                           GE-Dos       0.189  0.177   0.155  0.061  0.356     0.205  0.189   0.171  0.088  0.372
Prior values                            0.628  0.199   NA     0.021  1.904     0.517  0.500   NA     0.326  0.754

NOTE. Extra-continental introductions correspond to a long-distance introduction from a source located apart from the continent of the focal population; intra-continental introduction corresponds to an introduction event from a source located on the same continent as the focal population; and extra + intra-continental introduction corresponds to a combination of the two types of sources. Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5 and q95), are indicated for each bottleneck severity parameter. We roughly classified the estimated bottleneck severity values into three classes (represented here by the three shades of gray): weak (i.e., median value of bottleneck severity < 0.12, in light gray), moderate (0.12 < bottleneck severity < 0.22, in gray), and strong (i.e., bottleneck severity > 0.3, in dark gray). The set of sample sites used for the ABC estimations presented here includes (i) for the native area: Japan (JP-Tok + JP-Sap), South-East China (CN-Nin), and North-East China (CN-Lan + CN-Lia); and (ii) for the invaded range: US-Wat, US-Sok, and US-SD for western US, US-NC for eastern US, IT-Tre for southern Europe, GE-Dos for northern Europe, BR-PA for South America (Brazil), and FR-Reu for La Reunion island. Code names of the sample sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online, in which bottleneck severity classes are also given for each sample site. See supplementary table S6, Supplementary Material online, for results on a different set of representative sample sites.
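The rough three-class scheme described in the note to table 3 can be sketched in a few lines. This is only an illustration: the function name is hypothetical, the medians are the Prior 1 values listed in table 3, and the handling of boundary values (0.12 exactly, and the interval between 0.22 and 0.3 that the note leaves unassigned) is an assumption rather than something stated in the text.

```python
def classify_bottleneck(median_severity):
    """Map a median bottleneck severity to the rough classes of the table note."""
    if median_severity < 0.12:
        return "weak"
    if median_severity < 0.22:
        # Assumption: a value of exactly 0.12 is treated as moderate.
        return "moderate"
    if median_severity > 0.3:
        return "strong"
    # The note does not assign a class to values between 0.22 and 0.3.
    return "unclassified"


# Prior 1 median bottleneck severities, copied from table 3.
PRIOR1_MEDIANS = {
    "US-Haw": 0.500, "US-Wat": 0.138, "IT-Tre": 0.179, "BR-PA": 0.126,
    "FR-Reu": 0.215, "US-SD": 0.117, "US-NC": 0.067, "US-Sok": 0.099,
    "GE-Dos": 0.177,
}

classes = {site: classify_bottleneck(m) for site, m in PRIOR1_MEDIANS.items()}
for site, label in sorted(classes.items()):
    print(site, label)
```

Under these assumptions, only US-Haw falls in the strong class, which agrees with the statement that the strongest bottleneck was found for the Hawaiian population.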

Table 4. Posterior Distributions of Admixture Rates for the Five Admixture Events Inferred for the Final Worldwide Invasion Scenario Described in figure 1.

Prior set  Admixture Event                 Admixture Rate (gene fraction from pop x)  Mean   Median  Mode   q5     q95
Prior 1    A1 (US-Wat = US-Haw + CN-Nin)   rUS-Wat (China – CN-Nin)                   0.759  0.761   0.774  0.659  0.854
           A2 (US-Sok = US-Haw + US-Wat)   rUS-Sok (USA – US-Wat)                     0.237  0.230   0.227  0.092  0.400
           A3 (DE-Dos = IT-Tre + US-NC)    rDE-Dos (USA – US-NC)                      0.286  0.278   0.266  0.112  0.481
           A4 (BR-PA = US-NC + US-SD)      rBR-PA (USA – US-SD)                       0.467  0.461   0.442  0.187  0.754
           A5 (FR-Reu = IT-Tre + GE-Dos)   rFR-Reu (Europe – GE-Dos)                  0.388  0.380   0.426  0.132  0.670
Prior 2    A1 (US-Wat = US-Haw + CN-Nin)   rUS-Wat (China – CN-Nin)                   0.761  0.762   0.766  0.684  0.833
           A2 (US-Sok = US-Haw + US-Wat)   rUS-Sok (USA – US-Wat)                     0.224  0.219   0.199  0.114  0.350
           A3 (DE-Dos = IT-Tre + US-NC)    rDE-Dos (USA – US-NC)                      0.312  0.306   0.284  0.152  0.487
           A4 (BR-PA = US-NC + US-SD)      rBR-PA (USA – US-SD)                       0.422  0.418   0.429  0.247  0.613
           A5 (FR-Reu = IT-Tre + GE-Dos)   rFR-Reu (Europe – GE-Dos)                  0.370  0.368   0.356  0.178  0.569

NOTE. Admixture events are denoted as in figure 1. Each admixture rate parameter r points to the name of the admixed site and corresponds to the fraction of genes originating from the source site in parentheses (1 − r genes originate from the other source site). Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5 and q95), are indicated for each admixture parameter. Estimations assuming the prior set 1 and the prior set 2 (supplementary table S8, Supplementary Material online) are provided. The set of representative sample sites used for the ABC estimations presented here is the same as in table 3. Code names of the population sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online. See supplementary table S7, Supplementary Material online, for results on a different set of representative sample sites.
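To make the meaning of r concrete, the sketch below computes the expected allele frequency in a population formed by a single admixture event, in which a fraction r of genes comes from one source and 1 − r from the other. The function name and the two source allele frequencies are hypothetical and chosen only for illustration; the value of r is the Prior 1 median reported for event A1 in table 4.

```python
def admixed_frequency(r, p_source, p_other):
    """Expected allele frequency right after admixture.

    r        : fraction of genes coming from the source named in parentheses
    p_source : allele frequency in that source
    p_other  : allele frequency in the other source (contributes 1 - r)
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1")
    return r * p_source + (1.0 - r) * p_other


# Hypothetical allele frequencies in the two sources of event A1.
expected = admixed_frequency(r=0.761, p_source=0.2, p_other=0.6)
print(round(expected, 4))
```

The closer r is to 0 or 1, the more the admixed population resembles a single source, which is why strongly unbalanced rates are easier to distinguish from a single-source introduction than rates close to 0.5 between weakly differentiated sources.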

Fraimout et al. doi:10.1093/molbev/msx050. MBE


0.286). We found more balanced admixture rates for the populations from Brazil (BR-PA) and La Reunion (FR-Reu), but information on this parameter was less accurate in these cases, as indicated by large 90% credibility intervals.

Model-Posterior Checking

Like any model-based method, ABC inference does not reveal the "true" evolutionary history; it allows choosing the best among a necessarily limited set of compared scenarios and estimating posterior distributions of parameters under this scenario. How well the inferred scenario-posterior combination matches the observed dataset remains to be evaluated using an ABC model-posterior checking analysis. When applying such an analysis to the final invasion scenario detailed in figure 1, we found that, when considering the prior set 1, only 54 of the 1,141 summary statistics used as test quantities had low posterior predictive P-values (i.e., 0.002 < ppp-values < 5%). Moreover, none of those ppp-values remained significant when correcting for multiple comparisons (Benjamini and Hochberg 1995). Similar results were obtained when assuming the prior set 2 and when considering different sets of representative sample sites in our analyses (results not shown). These findings show that the final worldwide invasion scenario (fig. 1, with associated parameter posterior distributions) matches the observed dataset well. In agreement with this, the projections of the simulated datasets on the principal component axes from the final model-posterior combination were well grouped and centered on the target point corresponding to the observed dataset (supplementary fig. S4, Supplementary Material online). Due to the modest size of the dataset (i.e., 25 microsatellite loci), however, only situations of major inadequacy of the model-posterior combination to the observed dataset are likely to be identified.
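The multiple-comparison correction cited above (Benjamini and Hochberg 1995) can be illustrated with a short sketch of the standard step-up adjustment. The function name and the four P-values are hypothetical; the actual ppp-values of the study are not reproduced here, and the authors' own implementation is not described in the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical posterior predictive p-values for four test quantities.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
print(adj, significant)
```

With many test quantities and only a small fraction of low raw values, the adjusted values typically rise above the chosen threshold, which is the pattern reported in the text.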

Discussion

In the present paper, we decipher the routes taken by D. suzukii in its invasion worldwide, evaluating evidence of bottlenecks and genetic admixture associated with different introductions, and we quantify the efficiency of ABC-RF relative to the more standard ABC-LDA method for choosing among introduction scenarios.

A Complex Worldwide Invasion History

Prior to this study and that of Adrion et al. (2014), our understanding of the worldwide introduction pathways of D. suzukii was based on historical and observational data, which were incomplete and potentially misleading. Our results indicate three distinct introductions from the native range: to Hawaii, the western side of North America, and western Europe, which accords well with the findings of Adrion et al. (2014) from a different set of markers (X-linked sequence data). Both the western North American introductions and at least some European populations show signs of admixture. Hawaii and China both appear to have contributed to the western North American introductions, which were in turn the most probable source for the introduction into eastern North America. Similarly, China and, to a lesser extent, eastern North America both appear to have contributed to the European introduction, but mostly in northern European areas with respect to the eastern North American contribution. The almost simultaneous invasion of North America and Europe from Asia could be due to increased trade between these areas facilitating transport of this species between regions. Alternatively, or in addition, adaptation to human-altered habitats (in this case, changes in agricultural practices) within the native range of the species could have promoted invasion of similarly altered habitats worldwide (i.e., anthropogenically induced adaptation to invade; Hufbauer et al. 2012). However, the two invaded continents were most likely colonized by flies from distinct Chinese geographic areas (fig. 1), requiring that adaptation to agriculture occurred concomitantly at several locations within the native range. This is possible but not evolutionarily parsimonious, and further data would be required to test this explanation. Subsequent introductions from the three primary founder locations to other locations result in a complex history of invasion. Both primary and secondary introductions show signs of demographic bottlenecks, and several invasive populations have multiple sources and thus experienced genetic admixture between differentiated native or invasive populations.

The genetic relationships between worldwide D. suzukii populations define the genetic variation they harbor and may continue to shape further spread of alleles among populations. Gene flow among populations through continuous dispersal or more punctual admixture events permits the dissemination of allelic variants and new mutations, with dramatic consequences for evolutionary trajectories (Lenormand 2002). We found evidence for asymmetric genetic admixture from North American toward (northern) European populations. If this signal of admixture reflects recurrent and ongoing dispersal events rather than a single past secondary introduction event, then variants that arise in North America would be more likely to spread to Europe than the reverse. Discriminating between ongoing and past dispersal events is tricky, however, especially for recent invasions (Benazzo et al. 2015). If the occurrence of recurrent gene flow is confirmed, then D. suzukii populations from Europe may not lag behind American ones in evolutionary terms. From an applied perspective, if resistance to control practices emerges in North America, special efforts to stop the importation of D. suzukii to Europe would help prevent the evolution of resistance in Europe and thus reduce damage to crops (Caprio and Tabashnik 1992).

Advances in Methods for Discriminating among Complex Models

To the best of our knowledge, the present study is the first to use the recently developed ABC-RF method (Pudlo et al. 2016) to test competing models characterized by different levels of complexity. To compare ABC-RF to a more standard ABC method, we carried out a subset of model choice analyses using both ABC-RF and ABC-LDA (Estoup et al. 2012). We ran ABC-LDA analyses in two ways: with standard-sized large reference tables (500,000 simulated datasets per


scenario) and with small reference tables (10,000 simulated datasets per scenario, the same size as used for ABC-RF). The performance of ABC-LDA (in terms of prior error rate) in choosing among introduction scenarios was slightly better than for ABC-RF when using large reference tables. Furthermore, for ABC-LDA with large reference tables, confidence in model choice (in terms of posterior probability) was relatively comparable to ABC-RF with small reference tables. Thus, when analyses are relatively simple or computational resources are not limiting, ABC-LDA can provide as robust inferences of invasion scenarios as ABC-RF.

In contrast, we found that ABC-RF outperformed ABC-LDA when using similarly small reference tables for both types of analyses (10,000 simulated datasets per scenario). ABC-LDA was also unstable, choosing different scenarios in different replicate analyses when using small reference tables. These empirical findings correspond well with those of Pudlo et al. (2016) using simple models for which the true posterior probabilities could be calculated. Pudlo et al. (2016) demonstrated that ABC-RF provided a more reliable posterior probability of the best (true) scenario than standard ABC methods when using the same (small) number of simulated datasets in the reference table (see supplementary fig. S3 in Pudlo et al. 2016). In our work, the difference in performance between the two methods was most pronounced for complex analyses (e.g., when comparing more than 10 scenarios using more than 100 summary statistics). Thus, when analyses are not simple or computational resources are finite, ABC-RF provides more robust inferences of invasion scenarios than ABC-LDA.

The lower computation cost of ABC-RF was also evident. For a given observed dataset, selecting the best supported scenario and computing its posterior probability was ca. 50 times faster than when processing an ABC-LDA treatment with a traditional reference table of large size. Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time, whereas it lasted several hours to several days with ABC-LDA. The consequence of low computational costs is not simply in computer time. Rather, the efficiency of ABC-RF analyses allowed us to run replicate analyses on various sample sets, even for the most complex D. suzukii invasion scenarios. This replication is critical in evaluating the robustness of our statistical inferences.

We found that, at least for some ABC-RF analyses, the posterior probability of the best scenario and the global statistical power to choose among alternative invasion scenarios (as measured by the prior error rate) were relatively low. For instance, in three of the 11 analyses (analyses 3b, 5a, and 5b), the best scenarios had relatively low probabilities (i.e., ranging from 0.500 to 0.630) and relatively high prior error rates (i.e., ranging from 0.300 to 0.400). Although replicate analyses carried out on different sample sets and prior sets pointed to the same best scenario for these three analyses, such low probability values and high prior error rates suggest a moderate level of confidence in the choice of model in these cases. Similarly, we found that the scenario choice leading to the conclusion of the presence versus absence of admixed genes from the eastern US in various European sample sites relied on probability values that were sometimes as low as 0.500. Moreover, in a minority of cases, the best scenario was different depending on the sample site considered as representative of the eastern US or the native area. The observed pattern of an uneven spatial distribution of the presence of genes of eastern US origin in Europe, following roughly a north to south gradient, should hence be taken cautiously. The analyses of additional European sample sites are needed to clarify this issue. The above inferential uncertainties also most likely find their sources in the relatively small number of markers genotyped relative to the number, complexity, and similarity of some of the competing models. Such results indicate that there is room for improving the robustness of our inferences, at least for a subset of our analyses.

Bottlenecks and Genetic Admixture

We found strong evidence that bottlenecks and admixture both occurred during the course of the invasion of D. suzukii. A first sign of bottlenecks is a reduction in neutral genetic diversity. We found that all invasive D. suzukii populations were characterized by significantly lower genetic variation than native ones, with a loss of 6.4% (US-Col) to 23.3% (US-Haw) of heterozygosity and of 27.3% (US-Wat) to 54.2% (US-Haw) of allelic diversity, respectively (supplementary fig. S5, Supplementary Material online). Adrion et al. (2014) found a similar significant loss of diversity in their dataset focused on gene sequences. Dlugosch and Parker (2008) and Uller and Leimu (2011) reviewed studies of neutral genetic diversity in a large number of species of animals, plants, and fungi, and compared nuclear molecular diversity within introduced and source populations. Overall, they found that a loss of variation was the most frequent feature in invasive populations. However, reductions in genetic variation were on average modest (e.g., average loss of 18.7% and 15.5% of heterozygosity and allelic diversity, respectively; Dlugosch and Parker 2008). The reductions observed in our study in invasive D. suzukii populations were thus in the same range for heterozygosity but higher for allelic diversity (cf. average loss of 11.1% and 34.5% of heterozygosity and allelic diversity, respectively). A larger loss of allelic diversity than heterozygosity after a bottleneck is nevertheless expected by theory (Nei et al. 1975).
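As a reading aid for the percentages above, a proportional loss of diversity can be computed as in the sketch below. It assumes the loss is expressed relative to the native-range value, which is a common convention but is not spelled out in this section; the function name and the heterozygosity values are hypothetical and are not taken from the study.

```python
def relative_loss(native_value, invasive_value):
    """Proportional loss of diversity in an invasive population.

    Assumption: the loss is expressed relative to the native-range value.
    """
    if native_value <= 0:
        raise ValueError("native_value must be positive")
    return (native_value - invasive_value) / native_value


# Hypothetical expected heterozygosities (not values from the study).
loss = relative_loss(native_value=0.80, invasive_value=0.70)
print(f"{100 * loss:.1f}% loss of heterozygosity")
```

The same function applies to allelic diversity, for instance mean number of alleles per locus, which is why the two measures can be compared on a common percentage scale.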

The type of introduction pathway explains, at least partly, the severity of the reduction in genetic diversity. Bottlenecks tended to be less severe for populations founded by individuals from the same continent than for populations founded by individuals originating from a different continent. A simple explanation for such a pattern is that, due to greater proximity, populations originating from the same continent are likely to be initially founded by a larger number of individuals and also to experience recurrent gene flow. Populations founded by individuals from both the same and different continents experienced weak or moderate reductions in diversity, suggesting that multiple introduction pathways restore diversity.

In agreement with previous studies on invasions on similarly large geographical scales (e.g., Lombaert et al. 2014 and references therein), we found that admixture events were


frequent in the worldwide invasion history of D. suzukii. We identified at least five admixture events for which sources were native and/or invasive populations (i.e., A1–A5 events in fig. 1). The genetic contributions of the sources were unbalanced, except for the populations from Brazil and La Reunion, but genetic information regarding admixture was considerably less accurate in the latter cases, which involved weakly differentiated source populations.

We have provided here the information necessary to evaluate whether bottlenecks and admixture have influenced outcomes in this invasion. Quantitative genetics studies in the lab focusing on the key source, bottlenecked, and admixed populations could now examine the fitness and adaptive potential of those groups (Lavergne and Molofsky 2007; Facon et al. 2008; Turgeon et al. 2011).

Conclusions and Perspectives

Understanding invasion pathways provides key information for understanding the role of evolutionary processes in biological invasions. For example, our research reveals sources of invasive populations for further relevant comparisons using quantitative genetics studies in the lab. Additionally, we found that the invasion of Europe by D. suzukii was distinct from that of North America, with limited and asymmetrical gene flow between these two main invaded areas. This situation provides the opportunity to evaluate evolutionary trajectories in replicate in two separate temperate climate regions, similarly to research by Gilchrist et al. (2001, 2004) on the pace of clinal evolution in the invasive fruit fly Drosophila subobscura. There are also distinct invasion pathways for areas with warmer climates (i.e., Hawaii, La Reunion, and Brazil) that will make interesting comparisons.

Retracing invasion routes and making inferences about demographic processes is only possible if there is adequate polymorphism within populations and significant genetic differentiation among them. As is evident here, microsatellite data can provide enough variation to reveal important pathways. We concur with Adrion et al. (2014), however, that genome-wide data (i.e., next generation sequencing approaches) will be tremendously powerful in further discriminating among complex invasion scenarios (see also Cristescu 2015; Estoup et al. 2016). Because of the reduced computational resources demanded by ABC-RF, this method will be particularly useful for analysis of massive single nucleotide polymorphism datasets (Pudlo et al. 2016). In addition to the selectively neutral demographic inference leading to the reconstruction of routes of invasion, population (and quantitative) next generation sequencing approaches are quite promising for studying the evolution of phenotypic traits in natural populations (Wray 2013; Gautier 2015). Specifically, they can be used to better understand the genetic architecture of traits underlying invasion success (Bock et al. 2015). Drosophila suzukii is a good species for such research given (i) its short generation time and viability in laboratory conditions, which facilitate experimental approaches to study quantitative traits of interest (Asplen et al. 2015), and (ii) the availability of annotated genome assemblies for this species (Chiu et al. 2013; Ometto et al. 2013), along with the huge amount of genomic resources available in its close relative species D. melanogaster (Groen and Whiteman 2016).

Materials and Methods

Sampling and Genotyping

Adult D. suzukii were sampled from the field at a total of 23 localities (hereafter termed sample sites) distributed throughout most of the native and invasive range of the species (supplementary table S1, Supplementary Material online, and fig. 1). Samples were collected between 2013 and 2015 using baited traps and sweep nets, and stored in ethanol. Native Asian samples consisted of a total of six sample sites, including four Chinese and two Japanese localities. Samples from the invasive range were collected in Hawaii (1 sample site), Continental US (7 sites), Europe (7 sites), Brazil (1 site), and the French island of La Reunion (1 site). For each sample site, 15–44 adult flies were genotyped at 25 of the 28 microsatellite loci described in Fraimout et al. (2015), resulting in a total of 685 genotyped individuals. Three microsatellite loci from Fraimout et al. (2015) (i.e., loci DS31, DS42, and DS45) were not included due to the presence of null alleles at frequencies > 10% in some sample sites (results not shown). DNA extraction and PCR amplification, as well as allele scoring, were performed following the methodology detailed in Fraimout et al. (2015).

General Framework for Defining Focal Population Groups and Sets of Scenarios to Be Compared Using ABC

One of the many challenges of ABC analyses is to optimally define sets of sample sites to be processed chronologically as potential sources and targets in a way that does not hamper the computational effort. As stated in Estoup and Guillemaud (2010) and Lombaert et al. (2014), the number and complexity of competing scenarios is proportional to the number of genetic groups to be accounted for in ABC analyses. Following Lombaert et al. (2014), we used the results obtained from three genetic clustering methods, along with historical and geographical information, to define genetic groups that could include several sample sites (supplementary appendix S1, Supplementary Material online). We defined seven main genetic groups from our set of 23 sample sites: Asia, Hawaii, western US, eastern US, Europe, Brazil, and La Reunion. To make the computations less intensive with respect to computer resources, ABC treatments were carried out using one representative sample site for each genetic group per replicate analysis. In other words, when a genetic group included more than two sample sites, we considered the two most differentiated sample sites (i.e., with the highest FST values in supplementary table S3, Supplementary Material online) as representative of the group: CN-Shi and JP-Tok for the group Asia, all three sample sites for the genetically heterogeneous western US group, US-NC and US-Wis for the group eastern US, and GE-Dos and IT-Tre for the group Europe. Each of the sample sites retained (i.e., the representative sample sites) was included alternatively in the sample set considered for ABC


treatments. This makes it possible to replicate analyses of scenario choice on different sample sets representing most of the genetic variation within and between genetic groups. All ABC analyses described in table 1 were thus replicated with all possible combinations of representative sample sites available for each genetic group. This allowed us to test the robustness of scenario choices with respect to the sample sets considered as representative of genetic groups.

Nested ABC Analyses Processed Sequentially

Following Lombaert et al. (2014), we used historical information (i.e., dates of first observation of each invasive population; supplementary table S1, Supplementary Material online) to define 11 sets of competing introduction scenarios that were analyzed sequentially. The first set of compared scenarios considers the second oldest invasive population as target (the oldest one necessarily originating from the native range) and determines its introduction history. Step by step, subsequent analyses use the results obtained from the previous analyses until the most recent invasive populations are considered. Each of the 11 analyses included a set of compared introduction scenarios where the target invasive population (i.e., the focal population) can either directly derive by a single split (i.e., introduction event) from one of the historically compatible source populations or be the result of an admixture event between any one of the possible pairs of sources. For instance, in a model with one target and two possible sources, three scenarios could be formalized, with the target population being derived from a single source population, from the other possible source, or from an admixture between the two sources (see supplementary fig. S6 for an illustration and supplementary table S2, Supplementary Material online, for details).
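The way a set of competing scenarios grows with the number of candidate sources can be sketched as follows: each source alone gives one single-introduction scenario, and each unordered pair of sources gives one admixture scenario. The function name and the source labels are illustrative only, and the sketch ignores any extra scenarios an analysis may add for other reasons (such as relationships among several target sites).

```python
from itertools import combinations


def enumerate_scenarios(sources):
    """List single-source and pairwise-admixture scenarios for one target."""
    single = [(s,) for s in sources]          # direct introduction from one source
    admixed = list(combinations(sources, 2))  # admixture between a pair of sources
    return single + admixed


two_sources = enumerate_scenarios(["source A", "source B"])
three_sources = enumerate_scenarios(["Asia", "Hawaii", "western US"])
print(len(two_sources), len(three_sources))
```

With two candidate sources this yields the three scenarios described in the text; with three candidate sources it yields six, since k sources give k + k(k − 1)/2 scenarios.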

The first set of scenario choice analyses (1a–1d) aimed at making inferences about the introduction pathways for the three sample sites in western US (table 1). We first evaluated the Asian, Hawaiian, or admixed Asian + Hawaiian origin of each of the three western US sample sites. We then refined our modeling by formalizing seven competing scenarios describing the possible relationships among the three western US sample sites and their extra-continental sources (i.e., Asia and Hawaii; analysis 1d). Once the invasion history was resolved for the western US group, we sought to identify the population source(s) of the eastern US and European genetic groups independently. Dates of first records in these two groups widely overlap, and the possibility that the western US was a common source for both areas could not be ruled out. We therefore compared, in analysis 2a, a set of six scenarios to assess whether the eastern US group directly derived from Asia, Hawaii, or western US, or if it resulted from an admixture event between one pair of such sources. A similar set of six scenarios was used to assess the most likely origin of European populations independently from the eastern US group (analysis 2b).

Due to the results of analyses 2a and 2b, and due to the overlap in first record dates for eastern US and Europe, we then performed two analyses to investigate the possibility of gene flow (i.e., admixture) between the eastern US and European groups, considering eastern US as target population with Europe as potential source and vice versa (analyses 3a and 3b, respectively). Analysis 4 aimed at clarifying the admixture in northern European sample sites identified in analysis 3b. We tested whether this admixture occurred between eastern US and a secondary introduction from Asia, or between individuals from eastern US and southern Europe. Finally, we investigated the origins of the most recent invasive populations in Brazil and La Reunion separately, with all older groups and pairwise admixtures as potential sources (analyses 5a and 5b). Note that, in agreement with the genetic clustering results (fig. 1; supplementary appendix S1 and figs. S1 and S2, Supplementary Material online), we did not test for the possibility of Brazil being the source of La Reunion and reciprocally.

Historical, Demographic, and Mutational Parameters

For all scenarios formalized for ABC analyses, we modeled an introduction event as a divergence without subsequent gene flow from the source population (or two source populations in case of admixture) at a time corresponding to the date of first record in the invaded area, translated into a number of generations (assuming 12 generations per year; Lin et al. 2014). The divergence event was immediately followed by a bottleneck period characterized by a lower effective population size and a straight return to a stable (large) effective population size. We modeled our uncertainty regarding the identity of the source populations in the slightly structured native Asian area by including native "ghost populations" (i.e., unsampled populations) in our scenarios. Such unsampled native populations were modeled as a single split from an ancestral Asian population (without any change in effective population size) at a time defined by a loose flat prior, which includes zero at the lower bound (supplementary table S8, Supplementary Material online). See Lombaert et al. (2014) and references therein for justifications of using unsampled populations when modeling invasion scenarios. A detailed illustration of the above modeling design is provided in supplementary fig. S6, Supplementary Material online.
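The translation of a date of first record into a number of generations can be sketched as below, using the 12 generations per year assumed in the text. The function name and the example years are hypothetical, and counting whole years between first record and sampling is a simplifying assumption; the exact convention used in the study is not given in this section.

```python
GENERATIONS_PER_YEAR = 12  # value assumed in the text (Lin et al. 2014)


def generations_since_first_record(first_record_year, sampling_year):
    """Convert the time since first record into a number of generations."""
    if sampling_year < first_record_year:
        raise ValueError("sampling cannot precede the first record")
    return (sampling_year - first_record_year) * GENERATIONS_PER_YEAR


# Hypothetical years: first record in 2008, sampling in 2014.
t_intro = generations_since_first_record(2008, 2014)
print(t_intro)
```

A divergence time expressed this way can then be used directly as a fixed value or as the center of a prior in a scenario specification.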

Prior distributions for historical, demographic, and mutational parameters were defined taking into account the historical and demographic parameter values available from empirical studies on D. suzukii (Hauser 2011; Lin et al. 2014; Asplen et al. 2015) and microsatellite mutational data on D. melanogaster (Schug et al. 1998). We considered a first set of prior distributions (hereafter termed prior set 1), which was composed of rectangular (i.e., bounded uniform) distributions (see supplementary table S8, Supplementary Material online). To evaluate the robustness of our ABC inferences to prior choice, we also considered a second set of more peaked prior distributions (hereafter termed prior set 2; supplementary table S8, Supplementary Material online).

Model Choice Using ABC-RF and ABC-LDA

For all ABC-RF and ABC-LDA analyses, we used the software DIYABC v2.1.0 (Cornuet et al. 2014) to simulate datasets constituting the reference tables. A reference table includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics.

Fraimout et al. doi:10.1093/molbev/msx050 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/34/4/980/2953204 by Universite De La Reunion user on 03 July 2018

Following Pudlo et al. (2016), ABC-RF treatments were processed on reference tables including 10,000 simulated datasets per scenario. Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014) for microsatellite markers, describing genetic variation per population (e.g., number of alleles), per pair (e.g., genetic distance), or per triplet (e.g., admixture rate) of populations, averaged over the 25 loci (see supplementary table S9, Supplementary Material online, for details about such statistics), plus the linear discriminant analysis (LDA) axes as additional summary statistics. The total number of summary statistics ranged from 39 to 424 depending on the analysis (table 2). When the number of scenarios in an analysis exceeded ten, we processed ABC-RF treatments on reference tables including 100,000 simulated datasets, to avoid computer memory issues associated with the sub-bootstrapping procedure applied to reference tables with more than 100,000 simulated datasets; see the section "Practical recommendations regarding the implementation of the algorithms" in Pudlo et al. (2016). We checked that this number was sufficient by evaluating the stability of prior error rates (i.e., the probability of choosing a wrong model when drawing model index and parameter values from the priors) and of posterior probability estimates on 80,000, 90,000, and 100,000 simulated datasets. The number of trees in the constructed random forests was fixed to n = 500; see Pudlo et al. (2016) and Breiman (2001) for justifications of considering a forest of 500 trees, and supplementary fig. S7, Supplementary Material online, for an illustration. For each ABC-RF analysis, we predicted the best scenario and estimated its posterior probability and the prior error rate over 10 replicate runs on the same reference table. We used the abcrf R package (v1.1.0; Pudlo et al. 2016) to perform all ABC-RF analyses. In supplementary appendix S2, Supplementary Material online, we provide several scripts written in the R programming language (R Development Core Team 2008) for computing random forest analyses in R with the abcrf package when starting from simulated datasets generated with DIYABC v2.1.0.
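Two of the quantities mentioned above, the scenario predicted by the forest and the prior error rate, can be sketched in a few lines of Python. This is an illustration only: the analyses themselves were run with the abcrf R package, and the helper names below (majority_vote, prior_error_rate) are hypothetical and do not reproduce the abcrf interface. The sketch assumes that each tree of the forest has already cast a vote for one scenario.

```python
from collections import Counter


def majority_vote(tree_votes):
    """Return the scenario that receives the most votes among the trees."""
    counts = Counter(tree_votes)
    # most_common(1) yields the (scenario, vote count) pair with the top count.
    return counts.most_common(1)[0][0]


def prior_error_rate(true_scenarios, predicted_scenarios):
    """Fraction of simulated datasets assigned to a wrong scenario."""
    if len(true_scenarios) != len(predicted_scenarios):
        raise ValueError("both sequences must have the same length")
    wrong = sum(1 for t, p in zip(true_scenarios, predicted_scenarios) if t != p)
    return wrong / len(true_scenarios)


# Hypothetical votes from five trees and a toy set of four test datasets.
print(majority_vote([1, 2, 2, 3, 2]))
print(prior_error_rate([1, 1, 2, 2], [1, 2, 2, 2]))
```

In this toy case, scenario 2 wins the vote and one of the four test datasets is misclassified, giving an error rate of 0.25.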

For the sake of methodological comparison, we also duplicated a subset of analyses using a more standard ABC method (Beaumont et al. 2002; Cornuet et al. 2008), improved using the regression algorithms from Estoup et al. (2012) based on linear discriminant analysis (LDA) to avoid the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step. This method is called ABC-LDA. Following previous analyses of this type (Lombaert et al. 2014), ABC-LDA treatments were processed on reference tables including 500,000 simulated datasets per scenario. As for ABC-RF, datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). Posterior probabilities associated with each scenario were estimated by polychotomous logistic regression (Cornuet et al. 2008), modified following Estoup et al. (2012), on the 1% of the simulated datasets closest to the observed dataset. We also estimated for each analysis a prior error rate over 500 simulated datasets drawn from the priors, using reference tables including 500,000 simulated datasets per scenario. To provide a fair comparison of classification error with respect to the computational effort, prior error rates were also estimated using reference tables including the same number of simulated datasets per scenario as for ABC-RF treatments (10,000 per scenario).
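The selection of the simulated datasets closest to the observed dataset can be illustrated with a simplified sketch. Note that it swaps in a plain rejection estimate (the proportion of each scenario among the retained datasets) in place of the polychotomous logistic regression on LDA axes used in the study, and that it uses a Euclidean distance on raw, unstandardized statistics. The function name and the toy reference table are hypothetical.

```python
import math
from collections import Counter


def direct_posterior(reference_table, observed_stats, fraction=0.01):
    """Rejection estimate of scenario posterior probabilities.

    reference_table: list of (scenario_label, summary_statistics) pairs.
    fraction: share of the simulated datasets to retain (0.01 = 1%).
    """
    n_keep = max(1, round(len(reference_table) * fraction))
    # Rank simulated datasets by Euclidean distance to the observed statistics.
    ranked = sorted(reference_table,
                    key=lambda row: math.dist(row[1], observed_stats))
    kept = ranked[:n_keep]
    counts = Counter(label for label, _ in kept)
    return {label: count / len(kept) for label, count in counts.items()}


# Toy reference table: two scenarios with well-separated statistics.
table = [("scenario_1", [0.0, 0.0])] * 10 + [("scenario_2", [5.0, 5.0])] * 10
print(direct_posterior(table, [0.2, -0.1], fraction=0.1))
```

With the toy table, the two retained datasets both come from scenario_1, so the estimate assigns it a probability of 1.0. A regression step, as used in the text, would refine such raw proportions.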

Refining the Origins of the Primary Introduction Events from the Native Area

We refined our worldwide invasion scenario by assessing the most probable origin of the primary introductions from the Asian native range. To this aim, we carried out additional ABC-RF treatments to choose which geographical area in the native range was the most likely origin of each of the three primary introductions from Asia identified in the final invasion scenario, i.e., in Hawaii, western US (sample site US-Wat), and Europe (fig. 1). Capitalizing on the observed pattern of genetic differentiation among all Asian sample sites, we considered four possible geographical origins: O1 = Japan (represented by the sample sites JP-Sap + JP-Tok, which were pooled as they did not show any significant differentiation), O2 = South-East China (represented by the sample site CN-Nin), O3 = North-West China (represented by the sample sites CN-Lan + CN-Lia, which were pooled as they did not show any significant differentiation), and O4 = South-West China (represented by the sample site CN-Shi). In each of these three model choice analyses (i.e., one analysis per primary introduction), we considered the previously inferred introduction histories and compared four scenarios corresponding to variations of those histories that differed only by the geographical origin of the primary Asian introduction (models O1, O2, O3, and O4). We used the ABC-RF method with 10,000 datasets per model, simulated using prior set 1 or 2, and with three replications of the RF process. For inferences regarding Europe, we carried out ABC-RF treatments on various sets of sample sites from Europe and eastern US to further evaluate the robustness of our results.

Parameter Estimation

For ABC parameter estimation, we considered a set of 12 key sample sites representative of the inferred final invasion history (fig. 1). This set of sample sites included the three native origins of primary introductions from the native range (O1, O2, and O3), the Hawaiian sample (US-Haw), the three western US samples (US-Wat, US-Sok, and US-SD), the eastern US sample US-NC, two European samples (IT-Tre and GE-Dos), the Brazilian sample (BR-PA), and the sample from La Reunion (FR-Reu). To evaluate the robustness of our parameter estimations with respect to sample sites, we carried out a second analysis replacing the sample sites US-NC, IT-Tre, and GE-Dos by US-Wis, SP-Bar, and FR-Par, respectively. We also replicated parameter estimations using prior set 1 and prior set 2.

ABC-RF was developed to tackle the inferential issue of model choice. So far, no RF-based solution exists to estimate parameter distributions under a given model (Pudlo et al. 2016). Standard ABC algorithms for parameter estimation may suffer from the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step, and hence yield poor results when the number of statistics is too large [reviewed in Blum et al. (2013)]. To avoid such potential problems, as well as for the sake of simplicity and computational efficiency (statistical techniques for choosing summary statistics have not been implemented in DIYABC; Blum et al. 2013), we used a standard ABC methodological framework (Beaumont et al. 2002; Cornuet et al. 2008) applied to a subset of "expert-chosen" statistics proposed in DIYABC (Brouat et al. 2014; Lombaert et al. 2014) to estimate posterior parameter distributions under the final invasion scenario. We hence considered 95 summary statistics in total: the mean number of alleles and the mean genetic diversity (per locus and population sample), all pairwise FST values, and some crude estimates of admixture rates based on the five population triplets for which admixture events have been inferred (i.e., the A1–A5 events described in fig. 1, corresponding to five AML statistics; Choisy et al. 2004; see the DIYABC 2.1.0 user manual, p. 16, for details about such statistics). Using parameter values drawn from prior set 1 or 2, we produced a reference table containing 10 million simulated datasets. Following Beaumont et al. (2002), we then used a local linear regression to estimate the parameter posterior distributions. We took the 10,000 (0.1%) simulated datasets closest to our observed dataset for the regression, after applying a logit transformation to parameter values. We found that considering instead the 1% closest simulated datasets and/or a larger set of summary statistics had only weak effects on the estimated parameter distributions, at least for the subset of parameters we were interested in (i.e., admixture rates and bottleneck severity) (results not shown).
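The regression step with a logit transformation can be sketched as follows. This is a deliberately reduced version of the adjustment of Beaumont et al. (2002): it uses a single summary statistic and an unweighted least-squares fit, whereas the local linear regression in the study involves many statistics and distance-based weights. The function names and the bounds passed to them are assumptions made for the example.

```python
import math


def logit(value, lower, upper):
    """Map a parameter lying strictly inside (lower, upper) to the real line."""
    p = (value - lower) / (upper - lower)
    return math.log(p / (1.0 - p))


def inv_logit(z, lower, upper):
    """Map a real number back into the interval (lower, upper)."""
    return lower + (upper - lower) / (1.0 + math.exp(-z))


def regression_adjust(params, stats, observed_stat, lower, upper):
    """Adjust accepted parameter values toward the observed statistic.

    A line is fitted to logit(parameter) against the statistic; each value is
    then shifted along that line to the observed statistic and back-transformed.
    The accepted statistics must not all be identical.
    """
    z = [logit(p, lower, upper) for p in params]
    n = len(z)
    mean_s = sum(stats) / n
    mean_z = sum(z) / n
    sxx = sum((s - mean_s) ** 2 for s in stats)
    sxz = sum((s - mean_s) * (zi - mean_z) for s, zi in zip(stats, z))
    slope = sxz / sxx
    adjusted = [zi - slope * (s - observed_stat) for zi, s in zip(z, stats)]
    return [inv_logit(a, lower, upper) for a in adjusted]


# Hypothetical accepted draws of an admixture rate bounded by 0 and 1.
print(regression_adjust([0.2, 0.4, 0.6, 0.8], [1.0, 2.0, 3.0, 4.0], 2.5, 0.0, 1.0))
```

Working on the logit scale guarantees that adjusted values remain within the prior bounds, which is the practical reason for applying the transformation before the regression.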

We focused our investigations on the parameters related to two types of demographic events that are considered to be potentially important drivers of invasion success: severity of the bottleneck associated with the foundation of new populations, and genetic admixture (Estoup et al. 2016). Bottleneck severity was defined as the ratio between the duration of the bottleneck and the number of effective individuals during this period (Pascual et al. 2007). Following the final invasion scenario detailed in figure 1, we defined three categories of introduction situations: bottleneck severity for populations founded only by individuals originating from a different continent (e.g., sample sites US-Haw, US-Wat, IT-Tre, BR-PA, and FR-Reu); bottleneck severity for populations founded only by individuals originating from the same continent (e.g., sample sites US-SD and US-NC); and bottleneck severity for populations founded by a combination of the two types of sources (e.g., sample sites US-Sok and GE-Dos). Genetic admixture is evident when a population is inferred to have more than one distinct source (A1–A5 events in fig. 1).
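The definition of bottleneck severity given above amounts to a simple ratio, sketched here in Python. The function name and the numerical values in the example are hypothetical and serve only to show the computation.

```python
def bottleneck_severity(duration_generations, effective_size):
    """Ratio of bottleneck duration (in generations) to effective size."""
    if effective_size <= 0:
        raise ValueError("effective size must be positive")
    return duration_generations / effective_size


# Hypothetical bottleneck: 10 generations at an effective size of 50.
print(bottleneck_severity(10, 50))
```

With these hypothetical values the severity is 0.2; a longer bottleneck or a smaller effective size both increase the value.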

Model-Posterior Checking

We used the ABC model-posterior checking method implemented in DIYABC (Cornuet et al. 2010) to evaluate how well the final worldwide invasion scenario, with associated parameter posterior distributions, matches the observed dataset. This method was largely inspired by statistical frameworks detailed in Gelman et al. (2003) and Cook et al. (2006). The principle is as follows: if a scenario-posterior combination fits the observed data correctly, then data simulated under this scenario with parameters drawn from the associated posterior distributions should be close to the observed data. The lack of fit of the model to the data can be measured by determining the frequency at which test quantities measured on the observed dataset are extreme with respect to the distributions of the same test quantities computed from the simulated datasets (i.e., the posterior predictive distributions). In practice, the test quantities are chosen among the large set of ABC summary statistics proposed in the program DIYABC. For each test quantity (q), a lack of fit of the observed data with respect to the posterior predictive distribution can be measured by the cumulative distribution function value, defined as Prob(q_simulated < q_observed). The tail-area probability can be easily computed for each test quantity as Prob(q_simulated < q_observed) when Prob(q_simulated < q_observed) ≤ 0.5, and as 1.0 − Prob(q_simulated < q_observed) when Prob(q_simulated < q_observed) > 0.5. Such tail-area probabilities, also named posterior predictive P-values (i.e., ppp-values; Meng 1994), represent the probabilities that the replicated data (simulated ABC summary statistics) could be more extreme than the observed data (observed ABC summary statistics). In practice, ppp-values can be interpreted as a guideline for tracking model-posterior misfit: a few ppp-values around 0.05 are not necessarily a problem, whereas the presence of ppp-values around 0.001 is rather suspect. Hence, too many observed summary statistics falling in the tails of distributions, especially if some of those ppp-values are small (<0.001), cast serious doubts on the adequacy of the model-posterior combination to the observed dataset. Finally, because ppp-values are computed for a number of (often non-independent) test statistics, a method such as that of Benjamini and Hochberg (1995) can be used to control the false discovery rate (Cornuet et al. 2010).
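The tail-area computation and the false discovery rate correction described above can be sketched as follows. The functions are illustrative and are not the DIYABC implementation; in particular, the FDR level of 0.05 is an assumption, since the text does not state the level used.

```python
def ppp_value(simulated, observed):
    """Tail-area probability of an observed test quantity.

    Uses the empirical Prob(q_simulated < q_observed) and returns it when it
    is at most 0.5, or one minus it otherwise.
    """
    cdf = sum(1 for q in simulated if q < observed) / len(simulated)
    return cdf if cdf <= 0.5 else 1.0 - cdf


def benjamini_hochberg(p_values, alpha=0.05):
    """Return, for each p-value, whether it is significant at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that the k-th smallest p-value is <= k * alpha / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            max_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Toy posterior predictive distribution and observed value.
print(ppp_value([1.0, 2.0, 3.0, 4.0], 3.5))
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

In the toy example, only the smallest p-value remains significant after correction, which illustrates how isolated moderate ppp-values need not indicate misfit once multiple testing is accounted for.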

We carried out the ABC model-posterior checking analysis on our observed microsatellite dataset as follows. From 10^7 datasets simulated under the final invasion scenario detailed in figure 1, and using the same subset of 95 statistics previously retained for parameter estimation, we obtained a posterior sample of 10^4 values from the posterior distributions of parameters, as described in the previous section (Parameter Estimation). We then simulated 10^4 datasets with parameter values drawn with replacement from this posterior sample. As underlined in many statistics textbooks (Gelman et al. 2003), it is advised against performing model checking using information that has already been used for training (i.e., model fitting; see also Cornuet et al. 2010 for illustrations on simulated datasets). Optimally, model-posterior checking should be based on test quantities that do not correspond to the summary statistics that have been used for obtaining the parameter posterior distributions. This is naturally possible with DIYABC, as the package proposes a large choice of summary statistics. Our set of test statistics therefore included the 1,141 remaining single, pairwise, and three-population summary statistics available in DIYABC that were not used for previous ABC parameter estimation.
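The resampling step of this procedure, drawing parameter values with replacement from the posterior sample before simulating new datasets, can be sketched with the Python standard library. The function name, the seed argument, and the toy posterior sample are hypothetical; the simulation of datasets itself was done in DIYABC and is not reproduced here.

```python
import random


def resample_posterior(posterior_sample, n_draws, seed=None):
    """Draw n_draws parameter values with replacement from a posterior sample."""
    rng = random.Random(seed)
    return rng.choices(posterior_sample, k=n_draws)


# Toy posterior sample of three values, resampled 10 times.
draws = resample_posterior([0.21, 0.25, 0.30], 10, seed=1)
print(len(draws))
```

Each resampled value would then be passed to the simulator to generate one posterior predictive dataset.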

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

A.F. was supported by state funding by the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program "Investissements d'avenir" (ANR-11-IDEX-0004-02). A.E. and R.A.H. were supported by the Languedoc Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE 2016), and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP "International Mobility" (CfP 2015-02). G.L. acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds project 2012-13-195. X.C., M.K., I.M., and J.Z. were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). M.D. acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). M.P. was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.

References

Adrion JR, Kousathanas A, Pascual M, Burrack HJ, Haddad NM, Bergland AO, Machado H, Sackton TB, Schlenke TA, Watada M, et al. 2014. Drosophila suzukii: the genetic footprint of a recent worldwide invasion. Mol Biol Evol 31:3148–3163.

Asplen MK, Anfora G, Biondi A, Choi DS, Chu D, Daane KM, Gibert P, Gutierrez AP, Hoelmer KA, Hutchison WD, et al. 2015. Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci 88:469–494.

Atallah J, Teixeira L, Salazar R, Zaragoza G, Kopp A. 2014. The making of a pest: the evolution of a fruit-penetrating ovipositor in Drosophila suzukii and related species. Proc R Soc B 281:20132840–20132840.

Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025–2035.

Beaumont MA. 2010. Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst 41:379–406.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B 57:289–300.

Bertorelle G, Benazzo A, Mona S. 2010. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol Ecol 19:2609–2625.

Blum M, Nunes M, Prangle D, Sisson S. 2013. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci 28:189–208.

Bock DG, Caseys C, Cousens RD, Hahn MA, Heredia SM, Hubner S, Turner KG, Whitney KD, Rieseberg LH. 2015. What we still don't know about invasion genetics. Mol Ecol 24:2277–2297.

Boissin E, Hurley B, Wingfield MJ, Vasaitis R, Stenlid J, Davis C, de Groot P, Ahumada R, Carnegie A, Goldarazena A, et al. 2012. Retracing the routes of introduction of invasive species: the case of the Sirex noctilio woodwasp. Mol Ecol 21:5728–5744.

Breiman L. 2001. Random forests. Mach Learn 45:5–32.

Benazzo A, Ghirotto S, Torres Vilaca ST, Hoban S. 2015. Using ABC and microsatellite data to detect multiple introductions of invasive species from a single source. Heredity 115:262–272.

Brouat C, Tollenaere C, Estoup A, Loiseau A, Sommer S, Soanandrasana R, Rahalison L, Rajerison M, Piry S, Goodman SM, Duplantier JM. 2014. Invasion genetics of a human commensal rodent: the black rat Rattus rattus in Madagascar. Mol Ecol 23(16):4153–4167.

Caprio M, Tabashnik BE. 1992. Gene flow accelerates local adaptation among finite populations: simulating the evolution of insecticide resistance. J Econ Entomol 85:611–620.

Calabria G, Maca J, Bächli G, Serra L, Pascual M. 2012. First records of the potential pest species Drosophila suzukii (Diptera: Drosophilidae) in Europe. J Appl Entomol 136:139–147.

Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang G, Zalom FG. 2013. Genome of Drosophila suzukii, the spotted wing drosophila. G3 3:2257–2271.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Mol Ecol 13:955–968.

Cook S, Gelman A, Rubin DB. 2006. Validation of software for Bayesian models using posterior quantiles. J Comput Graph Stat 15:675–692.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24:2713–2719.

Cornuet JM, Ravigne V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11:401.

Cornuet JM, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin JM, Estoup A. 2014. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics 30:1187–1189.

Cristescu ME. 2015. Genetic reconstructions of invasion history. Mol Ecol 24:2212–2225.

Depra M, Poppe JL, Schmitz HJ, De Toni DC, Valente VLS. 2014. The first records of the invasive pest Drosophila suzukii in the South American continent. J Pest Sci 87:379–383.

Dlugosch KM, Parker IM. 2008. Founding events in species invasions: genetic variation, adaptive evolution, and the role of multiple introductions. Mol Ecol 17:431–449.

Dlugosch KM, Anderson SR, Braasch J, Alice Cang F, Gillette H. 2015. The devil is in the details: genetic variation in introduced populations and its contributions to invasion. Mol Ecol 24:2095–2111.

Edmonds CA, Lillie AS, Cavalli-Sforza LL. 2004. Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA 101:975–979.

Ellstrand NC, Schierenbeck KA. 2000. Hybridization as a stimulus for the evolution of invasiveness in plants. Proc Natl Acad Sci USA 97:7043–7050.

Estoup A, Guillemaud T. 2010. Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol 19:4113–4130.

Estoup A, Lombaert E, Marin JM, Guillemaud T, Pudlo P, Robert CP, Cornuet JM. 2012. Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics. Mol Ecol Res 12(5):846–855.

Estoup A, Ravigne V, Hufbauer RA, Vitalis R, Gautier M, Facon B. 2016. Is there a genetic paradox of biological invasion? Annu Rev Ecol Evol Syst 47:51–72.

Facon B, Genton B, Shykoff J, Jarne P, Estoup A, David P. 2006. A general eco-evolutionary framework for understanding bioinvasions. Trends Ecol Evol 21:130–135.

Facon B, Pointier JP, Jarne P, Sarda V, David P. 2008. High genetic variance in life-history strategies within invasive populations by way of multiple introductions. Curr Biol 18:363–367.

Fraimout A, Loiseau A, Price DK, Xuereb A, Martin JF, Vitalis R, Fellous S, Debat V, Estoup A. 2015. New set of microsatellite markers for the spotted-wing Drosophila suzukii (Diptera: Drosophilidae): a promising molecular tool for inferring the invasion history of this major insect pest. Eur J Entomol 112(4):855.

Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201:1555–1579.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis. London: Chapman & Hall/CRC.

Gilchrist GW, Huey RB, Serra L. 2001. Rapid evolution of wing size clines in Drosophila subobscura. Genetica 112:273–286.

Gilchrist GW, Huey RB, Balanya J, Pascual M, Serra L, Noor M. 2004. A time series of evolution in action: a latitudinal cline in wing size in South American Drosophila subobscura. Evolution 58:768–780.

Groen SC, Whiteman NK. 2016. Using Drosophila to study the evolution of herbivory and diet specialization. Curr Opin Insect Sci 14:66–72.

Hauser M. 2011. A historic account of the invasion of Drosophila suzukii (Matsumura) (Diptera: Drosophilidae) in the continental United States, with remarks on their identification. Pest Manag Sci 67:1352–1357.

Hufbauer RA, Facon B, Ravigne V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A. 2012. Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl 5:89–101.

Kaneshiro KY. 1983. Drosophila (Sophophora) suzukii (Matsumura). Proc Hawaiian Entomol Soc 24:179.

Keller SR, Taylor DR. 2010. Genomic admixture increases fitness during a biological invasion: admixture increases fitness during invasion. J Evol Biol 23:1720–1731.

Kolbe JJ, Glor RE, Schettino LR, Lara AC, Larson A, Losos JB. 2004. Genetic variation increases during biological invasion by a Cuban lizard. Nature 431:177–181.

Lavergne S, Molofsky J. 2007. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci USA 104:3883–3888.

Lee C. 2002. Evolutionary genetics of invasive species. Trends Ecol Evol 17:386–391.

Lee JC, Bruck DJ, Curry H, Edwards D, Haviland DR, Van Steenwyk RA, Yorgey BM. 2011. The susceptibility of small fruits and cherries to the spotted-wing drosophila, Drosophila suzukii. Pest Manag Sci 67:1358–1367.

Lenormand T. 2002. Gene flow and the limits of natural selection. Trends Ecol Evol 17:183–189.

Lin QC, Zhai YF, Zhang AS, Men XY, Zhang XY, Zalom FG, Zhou CG, Yu Y. 2014. Comparative developmental times and laboratory life tables for Drosophila suzukii and Drosophila melanogaster (Diptera: Drosophilidae). Fla Entomol 97:1434–1442.

Lombaert E, Guillemaud T, Lundgren J, Koch R, Facon B, Grez A, Loomans A, Malausa T, Nedved O, Rhule E, et al. 2014. Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol 23:5979–5997.

Matsumura S. 1931. 6000 illustrated insects of Japan-empire (in Japanese). Toko shoin, Tokyo. 1497 pp.

Meng XL. 1994. Posterior predictive p-values. Ann Stat 22:1142–1160.

Nei M, Maruyama T, Chakraborty R. 1975. The bottleneck effect and genetic variability in populations. Evolution 29:1–10.

Ometto L, Cestaro A, Ramasamy S, Grassi A, Revadi S, Siozios S, Moretto M, Fontana P, Varotto C, Pisani D, et al. 2013. Linking genomics and ecology to investigate the complex evolution of an invasive drosophila pest. Genome Biol Evol 5:745–757.

Pascual M, Chapuis MP, Mestres F, Balanya J, Huey RB, Gilchrist GW, Serra L, Estoup A. 2007. Introduction history of Drosophila subobscura in the New World: a microsatellite based survey using ABC methods. Mol Ecol 16:3069–3083.

Peischl S, Excoffier L. 2015. Expansion load: recessive mutations and the role of standing genetic variation. Mol Ecol 24:2084–2094.

Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. 2016. Reliable ABC model choice via random forests. Bioinformatics 32:859–866.

R Development Core Team. 2008. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org

Robert CP, Cornuet JM, Marin JM, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA 108:15112–15117.

Rius M, Darling JA. 2014. How important is intraspecific genetic admixture to the success of colonising populations? Trends Ecol Evol 29:233–242.

Sakai AK, Allendorf FW, Holt JS, Lodge DM, Molofsky J, With KA, Baughman S, Cabin RJ, Cohen JE, Ellstrand NC, McCauley DE. 2001. The population biology of invasive species. Annu Rev Ecol Syst 305–332.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. In: Mutation and Evolution. Springer. pp. 359–367.

Simberloff D. 2013. Biological invasions: much progress plus several controversies. Contrib Sci 9:7–16.

Turgeon J, Tayeh A, Facon B, Lombaert E, De Clercq P, Berkvens N, Lundgren JG, Estoup A. 2011. Experimental evidence for the phenotypic impact of admixture between wild and biocontrol Asian ladybird (Harmonia axyridis) involved in the European invasion. J Evol Biol 24:1044–1052.

Uller T, Leimu R. 2011. Founder events predict changes in genetic diversity during human-mediated range expansions. Glob Change Biol 17:3478–3485.

Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Thery S, Froment A, Le Bomin S, Gessain A, Hombert JM, et al. 2009. Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol 19:312–318.

Wray G. 2013. Genomics and the evolution of phenotypic traits. Annu Rev Ecol Evol Syst 44:51–72.

Page 10: Deciphering the Routes of invasion of Drosophila suzukii

The posterior distributions of parameters relating to bottlenecks and admixture substantially differed from the priors, indicating that genetic data were informative for such parameters (tables 3 and 4; see supplementary tables S6 and S7, Supplementary Material online, for results using an alternative set of representative sample sites). Regarding bottleneck severity, we found that bottlenecks tended to be more severe for populations founded by individuals originating from a source located on a different continent (extra-continental origin) than for populations founded by individuals originating from a source located on the same continent (intra-continental origin; table 3 and fig. 1). The bottleneck severity for populations founded by individuals corresponding to a combination of the two types of sources (extra- and intra-continental origins) was either weak or moderate. The strongest bottleneck severity was found for the first invasive population in Hawaii, which showed the lowest level of genetic variation among all invasive populations. Bottleneck severity values were globally slightly stronger in European than in US populations, suggesting a smaller number of founding individuals and/or a slower demographic recovery in European populations.
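The weak, moderate, and strong labels used above follow the rough thresholds given in the note to table 3 (median severity below 0.12, between 0.12 and 0.22, and above 0.3, respectively). The sketch below encodes those thresholds as stated; the function name and the "unclassified" label for values falling outside the three stated ranges are assumptions added for the example.

```python
def severity_class(median_severity):
    """Classify a median bottleneck severity using the table 3 thresholds."""
    if median_severity < 0.12:
        return "weak"
    if 0.12 < median_severity < 0.22:
        return "moderate"
    if median_severity > 0.3:
        return "strong"
    # Values not covered by the three stated ranges are left unlabeled.
    return "unclassified"


# Median values reported in table 3 under prior set 1.
for site, median in [("US-Haw", 0.500), ("US-Wat", 0.138), ("US-NC", 0.067)]:
    print(site, severity_class(median))
```

Applied to the prior set 1 medians, this reproduces the pattern described in the text: US-Haw is strong, US-Wat is moderate, and US-NC is weak.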

Regarding admixture (table 4) we found that (i) for thefirst recorded invasive population in western US (ieWatsonville sample site US-Wat) the genetic contributionfrom China was substantially larger than that from Hawaii(median value of 0759) (ii) for the western US populationUS-Sok the secondary genetic contribution from Hawaii wassmall compared to that from Watsonville (median value of0237) and (iii) the sample sites from northern Europe (egGE-Dos in Germany) contained a rather low proportion ofgenes originating from the eastern US (median value of

Table 3 Bottleneck Severity in Invasive Populations of D suzukii

Prior 1 Prior 2Introduction Type Sample Site Mean Median Mode q5 q95 Mean Median Mode q5 q95

Extra continental US-Haw 0517 0500 0495 0326 0755 0490 0484 0477 0382 0615US-Wat 0181 0138 0114 0041 0448 0143 0127 0118 0059 0275IT-Tre 0268 0179 0171 0059 0837 0236 0189 0172 0101 0553BR-PA 0158 0126 0104 0020 0407 0208 0193 0172 0067 0399FR-Reu 0259 0215 0186 0051 0626 0245 0214 0190 0069 0534

Intra-continental US-SD 0135 0117 0106 0039 0283 0117 0104 0091 0053 0224US-NC 0089 0067 0061 0015 0230 0111 0098 0090 0046 0218

Extra thorn intracontinental

US-Sok 0116 0099 0088 0027 0250 0119 0106 0099 0052 0229GE-Dos 0189 0177 0155 0061 0356 0205 0189 0171 0088 0372

Prior values 0628 0199 NA 0021 1904 0517 0500 NA 0326 0754

NOTEmdash Extra-continental introductions correspond to a long distance introduction from a source located apart from the continent of the focal population intra-continentalintroduction corresponds to an introduction event from a source located on the same continent than the focal population and Extrathorn Intra continental introductioncorresponds to a combination of the two types of sources Mean median and mode estimates as well as bounds of 90 credibility intervals (q5 and q95) are indicated foreach bottleneck severity parameter We roughly classified the estimated bottleneck severity values into three classes (represented here by the three shades of gray) weak (iemedian value of bottleneck severitylt 012 in light gray) moderate (012lt bottleneck severitylt 022 in gray) and strong (ie bottleneck severitygt 03 in dark gray) The set ofsample sites used for the ABC estimations presented here include (i) for the native area Japan (JP-Tokthorn JP-Sap) South-East China (CN-Nin) and North-East China (CN-LanthornCN-Lia) and (ii) for the invaded range US-Wat US-Sok and US-SD for western US US-NC for eastern US IT-Tre for southern Europe GE-Dos for northern Europe BR-PAfor South-America (Brazil) and FR-Reu for La Reunion island Code names of the sample sites are the same as in fig 1 and supplementary table S1 Supplementary Materialonline in which bottleneck severity classes are also given for each sample site See supplementary table S6 Supplementary Material online for results on a different set ofrepresentative sample sites

Table 4 Posterior Distributions of Admixture Rates for the Five Admixture Events Inferred for the Final Worldwide Invasion Scenario Described infigure 1

Admixture Event Admixture Rate (gene fraction from pop x) Mean Median Mode q5 q95

Prior 1 A1 (US-Wat frac14 US-Haw thorn CN-Nin) rUS-Wat (China ndash CN-Nin) 0759 0761 0774 0659 0854A2 (US-Sok frac14 US-Haw thorn US-Wat) rUS-Sok (USA ndash US-Wat) 0237 0230 0227 0092 0400A3 (DE-Dos frac14 IT-Tre thorn US-NC) rDE-Dos (USA ndash US-NC) 0286 0278 0266 0112 0481A4 (BR-PA frac14 US-NC thorn US-SD) rBR-PA (USA ndash US-SD) 0467 0461 0442 0187 0754A5 (FR-Reu frac14 IT-Tre thorn GE-Dos) rFR-Reu (Europe ndash GE-Dos) 0388 0380 0426 0132 0670

Prior 2 A1 (US-Wat frac14 US-Haw thorn CN-Nin) rUS-Wat (China ndash CN-Nin) 0761 0762 0766 0684 0833A2 (US-Sok frac14 US-Haw thorn US-Wat) rUS-Sok (USA ndash US-Wat) 0224 0219 0199 0114 0350A3 (DE-Dos frac14 IT-Tre thorn US-NC) rDE-Dos (USA ndash US-NC) 0312 0306 0284 0152 0487A4 (BR-PA frac14 US-NC thorn US-SD) rBR-PA (USA ndash US-SD) 0422 0418 0429 0247 0613A5 (FR-Reu frac14 IT-Tre thorn GE-Dos) rFR-Reu (Europe ndash GE-Dos) 0370 0368 0356 0178 0569

NOTE. Admixture events are denoted as in figure 1. Each admixture rate parameter r points to the name of the admixed site and corresponds to the fraction of genes originating from the source site in parentheses (1 − r genes originate from the other source site). Mean, median, and mode estimates, as well as bounds of 90% credibility intervals (q5 and q95), are indicated for each admixture parameter. Estimations assuming the prior set 1 and the prior set 2 (supplementary table S8, Supplementary Material online) are provided. The set of representative sample sites used for the ABC estimations presented here is the same as in table 3. Code names of the population sites are the same as in fig. 1 and supplementary table S1, Supplementary Material online. See supplementary table S7, Supplementary Material online, for results on a different set of representative sample sites.
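As a reading aid for Table 4, the following minimal Python sketch checks, for each admixture event under prior set 1, whether the 90% credibility interval of r lies entirely on one side of 0.5. The criterion "interval excludes 0.5" is an illustrative assumption and not a test used in the study; the interval bounds are copied from the table, and the function name is hypothetical.

```python
# 90% credibility interval bounds (q5, q95) copied from Table 4, prior set 1.
CREDIBILITY_INTERVALS = {
    "A1": (0.659, 0.854),
    "A2": (0.092, 0.400),
    "A3": (0.112, 0.481),
    "A4": (0.187, 0.754),
    "A5": (0.132, 0.670),
}


def is_unbalanced(q5, q95, balanced=0.5):
    """Return True when the interval lies entirely above or below `balanced`.

    Illustrative criterion only (an assumption of this sketch).
    """
    return q95 < balanced or q5 > balanced


unbalanced = {
    event: is_unbalanced(q5, q95)
    for event, (q5, q95) in CREDIBILITY_INTERVALS.items()
}

for event, flag in unbalanced.items():
    label = "unbalanced" if flag else "compatible with balanced contributions"
    print(f"{event}: {label}")
```

Under this criterion, the intervals for A4 (Brazil) and A5 (La Reunion) include 0.5, whereas those for A1 to A3 do not.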


0.286). We found more balanced admixture rates for the populations from Brazil (BR-PA) and La Reunion (FR-Reu), but information on this parameter was less accurate in these cases, as indicated by large 90% credibility intervals.

Model-Posterior Checking

Like any model-based method, ABC inference does not reveal the "true" evolutionary history; it allows choosing the best among a necessarily limited set of compared scenarios and estimating posterior distributions of parameters under this scenario. How well the inferred scenario-posterior combination matches the observed dataset remains to be evaluated using an ABC model-posterior checking analysis. When applying such an analysis to the final invasion scenario detailed in figure 1, we found that, when considering the prior set 1, only 54 of the 1,141 summary statistics used as test quantities had low posterior predictive P-values (i.e., 0.002 < ppp-values < 5%). Moreover, none of those ppp-values remained significant when correcting for multiple comparisons (Benjamini and Hochberg 1995). Similar results were obtained when assuming the prior set 2 and when considering different sets of representative sample sites in our analyses (results not shown). These findings show that the final worldwide invasion scenario (fig. 1, with associated parameter posterior distributions) matches the observed dataset well. In agreement with this, the projections of the simulated datasets on the principal component axes from the final model-posterior combination were well grouped and centered on the target point corresponding to the observed dataset (supplementary fig. S4, Supplementary Material online). Due to the modest size of the dataset (i.e., 25 microsatellite loci), however, only situations of major inadequacy of the model-posterior combination to the observed dataset are likely to be identified.
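The multiple-comparison correction cited above (Benjamini and Hochberg 1995) can be sketched in a few lines of Python. This is a generic implementation of the step-up procedure, not the code used in the study; the function name and the 5% level are assumptions made for illustration, and the P-values in the example are invented.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans (same order as `p_values`) that is True
    where the corresponding null hypothesis is rejected at FDR level `alpha`.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= k * alpha / m.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Invented example: 54 moderately low p-values among 1,141 tests.
example = [0.03] * 54 + [0.5] * 1087
print(any(benjamini_hochberg(example)))
```

In the invented example, the smallest P-values (0.03) exceed the step-up threshold at their rank, so none of the tests remains significant after correction.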

Discussion

In the present paper, we decipher the routes taken by D. suzukii in its worldwide invasion, evaluate evidence of bottlenecks and genetic admixture associated with different introductions, and quantify the efficiency of ABC-RF relative to the more standard ABC-LDA method for choosing among introduction scenarios.

A Complex Worldwide Invasion History

Prior to this study and that of Adrion et al. (2014), our understanding of the worldwide introduction pathways of D. suzukii was based on historical and observational data, which were incomplete and potentially misleading. Our results indicate three distinct introductions from the native range: to Hawaii, the western side of North America, and western Europe, which accords well with the findings of Adrion et al. (2014) from a different set of markers (X-linked sequence data). Both the western North American introductions and at least some European populations show signs of admixture. Hawaii and China both appear to have contributed to the western North American introductions, which were in turn the most probable source for the introduction into eastern North America. Similarly, China and, to a lesser extent, eastern North America both appear to have contributed to the European introduction, although the eastern North American contribution is found mostly in northern European areas. The almost simultaneous invasion of North America and Europe from Asia could be due to increased trade between these areas, facilitating transport of this species between regions. Alternatively, or in addition, adaptation to human-altered habitats (in this case, changes in agricultural practices) within the native range of the species could have promoted invasion of similarly altered habitats worldwide (i.e., anthropogenically induced adaptation to invade; Hufbauer et al. 2012). However, the two invaded continents were most likely colonized by flies from distinct Chinese geographic areas (fig. 1), requiring that adaptation to agriculture occurred concomitantly at several locations within the native range. This is possible but not evolutionarily parsimonious, and further data would be required to test this explanation. Subsequent introductions from the three primary founder locations to other locations result in a complex history of invasion. Both primary and secondary introductions show signs of demographic bottlenecks, and several invasive populations have multiple sources and thus experienced genetic admixture between differentiated native or invasive populations.

The genetic relationships between worldwide D. suzukii populations define the genetic variation they harbor and may continue to shape further spread of alleles among populations. Gene flow among populations, through continuous dispersal or more discrete admixture events, permits the dissemination of allelic variants and new mutations, with dramatic consequences for evolutionary trajectories (Lenormand 2002). We found evidence for asymmetric genetic admixture from North American towards (northern) European populations. If this signal of admixture reflects recurrent and ongoing dispersal events rather than a single past secondary introduction event, then variants that arise in North America would be more likely to spread to Europe than the reverse. Discriminating between ongoing and past dispersal events is difficult, however, especially for recent invasions (Benazzo et al. 2015). If the occurrence of recurrent gene flow is confirmed, then D. suzukii populations from Europe may not lag behind American ones in evolutionary terms. From an applied perspective, if resistance to control practices emerges in North America, special efforts to stop the importation of D. suzukii to Europe would help prevent the evolution of resistance in Europe and thus reduce damage to crops (Caprio and Tabashnik 1992).

Advances in Methods for Discriminating among Complex Models

To the best of our knowledge, the present study is the first to use the recently developed ABC-RF method (Pudlo et al. 2016) to test competing models characterized by different levels of complexity. To compare ABC-RF to a more standard ABC method, we carried out a subset of model choice analyses using both ABC-RF and ABC-LDA (Estoup et al. 2012). We ran ABC-LDA analyses in two ways: with standard-sized large reference tables (500,000 simulated datasets per


scenario) and with small reference tables (10,000 simulated datasets per scenario, the same size as used for ABC-RF). The performance of ABC-LDA (in terms of prior error rate) in choosing among introduction scenarios was slightly better than that of ABC-RF when using large reference tables. Furthermore, for ABC-LDA with large reference tables, confidence in model choice (in terms of posterior probability) was roughly comparable to that of ABC-RF with small reference tables. Thus, when analyses are relatively simple or computational resources are not limiting, ABC-LDA can provide inferences of invasion scenarios as robust as those of ABC-RF.

In contrast, we found that ABC-RF outperformed ABC-LDA when using similarly small reference tables for both types of analyses (10,000 simulated datasets per scenario). ABC-LDA was also unstable, choosing different scenarios in different replicate analyses when using small reference tables. These empirical findings correspond well with those of Pudlo et al. (2016), who used simple models for which the true posterior probabilities could be calculated. Pudlo et al. (2016) demonstrated that ABC-RF provided a more reliable posterior probability of the best (true) scenario than standard ABC methods when using the same (small) number of simulated datasets in the reference table (see supplementary fig. S3 in Pudlo et al. 2016). In our work, the difference in performance between the two methods was most pronounced for complex analyses (e.g., when comparing more than 10 scenarios using more than 100 summary statistics). Thus, when analyses are not simple or computational resources are finite, ABC-RF provides more robust inferences of invasion scenarios than ABC-LDA.

The lower computational cost of ABC-RF was also evident. For a given observed dataset, selecting the best-supported scenario and computing its posterior probability was ca. 50 times faster than an ABC-LDA treatment with a traditional reference table of large size. Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time, whereas it took several hours to several days with ABC-LDA. The consequence of low computational costs is not simply a saving in computer time. Rather, the efficiency of ABC-RF analyses allowed us to run replicate analyses on various sample sets, even for the most complex D. suzukii invasion scenarios. This replication is critical in evaluating the robustness of our statistical inferences.

We found that, at least for some ABC-RF analyses, the posterior probability of the best scenario and the global statistical power to choose among alternative invasion scenarios (as measured by the prior error rate) were relatively low. For instance, in three of the 11 analyses (analyses 3b, 5a, and 5b), the best scenarios had relatively low probabilities (i.e., ranging from 0.500 to 0.630) and relatively high prior error rates (i.e., ranging from 0.300 to 0.400). Although replicate analyses carried out on different sample sets and prior sets pointed to the same best scenario for these three analyses, such low probability values and high prior error rates suggest a moderate level of confidence in the choice of model in these cases. Similarly, we found that the scenario choice leading to the conclusion of the presence versus absence of admixed genes from the eastern US in various European sample sites relied on probability values that were sometimes as low as 0.500. Moreover, in a minority of cases, the best scenario differed depending on the sample site considered as representative of the eastern US or the native area. The observed pattern of an uneven spatial distribution of genes of eastern US origin in Europe, following roughly a north-to-south gradient, should hence be interpreted cautiously. Analyses of additional European sample sites are needed to clarify this issue. The above inferential uncertainties most likely also stem from the relatively small number of markers genotyped relative to the number, complexity, and similarity of some of the competing models. Such results indicate that there is room for improving the robustness of our inferences, at least for a subset of our analyses.

Bottlenecks and Genetic Admixture

We found strong evidence that bottlenecks and admixture both occurred during the course of the invasion of D. suzukii. A first sign of bottlenecks is a reduction in neutral genetic diversity. We found that all invasive D. suzukii populations were characterized by significantly lower genetic variation than native ones, with a loss of 6.4% (US-Col) to 23.3% (US-Haw) of heterozygosity and of 27.3% (US-Wat) to 54.2% (US-Haw) of allelic diversity (supplementary fig. S5, Supplementary Material online). Adrion et al. (2014) found a similar significant loss of diversity in their dataset focused on gene sequences. Dlugosch and Parker (2008) and Uller and Leimu (2011) reviewed studies of neutral genetic diversity in a large number of species of animals, plants, and fungi and compared nuclear molecular diversity within introduced and source populations. Overall, they found that a loss of variation was the most frequent feature in invasive populations. However, reductions in genetic variation were on average modest (e.g., average loss of 18.7% and 15.5% of heterozygosity and allelic diversity, respectively; Dlugosch and Parker 2008). The reductions observed in our study in invasive D. suzukii populations were thus in the same range for heterozygosity but higher for allelic diversity (cf. average loss of 11.1% and 34.5% of heterozygosity and allelic diversity, respectively). A larger loss of allelic diversity than of heterozygosity after a bottleneck is nevertheless expected by theory (Nei et al. 1975).
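To make the two diversity measures concrete, the sketch below computes expected heterozygosity (1 minus the sum of squared allele frequencies, without small-sample correction) and the relative loss of heterozygosity and of allele number between a native and an invasive population. The allele frequencies are hypothetical values chosen for illustration and are not data from this study; function names are likewise assumptions.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity at one locus: 1 - sum(p_i^2).

    No small-sample correction is applied in this simplified sketch.
    """
    return 1.0 - sum(p * p for p in allele_freqs)


def relative_loss(native_value, invasive_value):
    """Proportion of the native value that is lost in the invasive population."""
    return 1.0 - invasive_value / native_value


# Hypothetical allele frequencies at a single locus (not data from the study).
native = [0.4, 0.3, 0.2, 0.1]   # four alleles in the native population
invasive = [0.6, 0.4]           # rare alleles lost after a bottleneck

h_loss = relative_loss(expected_heterozygosity(native),
                       expected_heterozygosity(invasive))
a_loss = relative_loss(len(native), len(invasive))

print(f"loss of heterozygosity: {h_loss:.1%}")
print(f"loss of allelic diversity: {a_loss:.1%}")
```

In this hypothetical case the loss of allelic diversity exceeds the loss of heterozygosity, because the alleles removed by the bottleneck are rare and contribute little to heterozygosity.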

The type of introduction pathway explains, at least partly, the severity of the reduction in genetic diversity. Bottlenecks tended to be less severe for populations founded by individuals from the same continent than for populations founded by individuals originating from a different continent. A simple explanation for such a pattern is that, due to greater proximity, populations originating from the same continent are likely to be initially founded by a larger number of individuals and also to experience recurrent gene flow. Populations founded by individuals from both the same and different continents experienced weak or moderate reductions in diversity, suggesting that multiple introduction pathways restore diversity.

In agreement with previous studies on invasions on similarly large geographical scales (e.g., Lombaert et al. 2014 and references therein), we found that admixture events were


frequent in the worldwide invasion history of D. suzukii. We identified at least five admixture events for which sources were native and/or invasive populations (i.e., A1–A5 events in fig. 1). The genetic contributions of the sources were unbalanced, except for the populations from Brazil and La Reunion, but genetic information regarding admixture was considerably less accurate in the latter cases, which involved weakly differentiated source populations.

We have provided here the information necessary to evaluate whether bottlenecks and admixture have influenced outcomes in this invasion. Quantitative genetics studies in the lab focusing on the key source, bottlenecked, and admixed populations could now examine the fitness and adaptive potential of those groups (Lavergne and Molofsky 2007; Facon et al. 2008; Turgeon et al. 2011).

Conclusions and Perspectives

Understanding invasion pathways provides key information for understanding the role of evolutionary processes in biological invasions. For example, our research reveals the sources of invasive populations, enabling further relevant comparisons using quantitative genetics studies in the lab. Additionally, we found that the invasion of Europe by D. suzukii was distinct from that of North America, with limited and asymmetrical gene flow between these two main invaded areas. This situation provides the opportunity to evaluate evolutionary trajectories in replicate in two separate temperate climate regions, similarly to research by Gilchrist et al. (2001, 2004) on the pace of clinal evolution in the invasive fruit fly Drosophila subobscura. There are also distinct invasion pathways for areas with warmer climates (i.e., Hawaii, La Reunion, and Brazil) that will make interesting comparisons.

Retracing invasion routes and making inferences about demographic processes is only possible if there is adequate polymorphism within populations and significant genetic differentiation among them. As is evident here, microsatellite data can provide enough variation to reveal important pathways. We concur with Adrion et al. (2014), however, that genome-wide data (i.e., next-generation sequencing approaches) will be tremendously powerful in further discriminating among complex invasion scenarios (see also Cristescu 2015; Estoup et al. 2016). Because of the reduced computational resources demanded by ABC-RF, this method will be particularly useful for the analysis of massive single nucleotide polymorphism datasets (Pudlo et al. 2016). In addition to the selectively neutral demographic inference leading to the reconstruction of routes of invasion, population (and quantitative) next-generation sequencing approaches are quite promising for studying the evolution of phenotypic traits in natural populations (Wray 2013; Gautier 2015). Specifically, they can be used to better understand the genetic architecture of traits underlying invasion success (Bock et al. 2015). Drosophila suzukii is a good species for such research given (i) its short generation time and viability in laboratory conditions, which facilitate experimental approaches to study quantitative traits of interest (Asplen et al. 2015), and (ii) the availability of annotated genome assemblies for this species (Chiu et al. 2013; Ometto et al. 2013), along with the huge amount of genomic resources available for its close relative D. melanogaster (Groen and Whiteman 2016).

Materials and Methods

Sampling and Genotyping

Adult D. suzukii were sampled from the field at a total of 23 localities (hereafter termed sample sites) distributed throughout most of the native and invasive range of the species (supplementary table S1, Supplementary Material online, and fig. 1). Samples were collected between 2013 and 2015 using baited traps and sweep nets and stored in ethanol. Native Asian samples consisted of a total of six sample sites, including four Chinese and two Japanese localities. Samples from the invasive range were collected in Hawaii (1 sample site), the continental US (7 sites), Europe (7 sites), Brazil (1 site), and the French island of La Reunion (1 site). For each sample site, 15–44 adult flies were genotyped at 25 of the 28 microsatellite loci described in Fraimout et al. (2015), resulting in a total of 685 genotyped individuals. Three microsatellite loci from Fraimout et al. (2015) (i.e., loci DS31, DS42, and DS45) were not included due to the presence of null alleles at frequencies > 10% in some sample sites (results not shown). DNA extraction and PCR amplification, as well as allele scoring, were performed following the methodology detailed in Fraimout et al. (2015).

General Framework for Defining Focal Population Groups and Sets of Scenarios to Be Compared Using ABC

One of the many challenges of ABC analyses is to optimally define sets of sample sites to be processed chronologically as potential sources and targets in a way that does not hamper the computational effort. As stated in Estoup and Guillemaud (2010) and Lombaert et al. (2014), the number and complexity of competing scenarios is proportional to the number of genetic groups to be accounted for in ABC analyses. Following Lombaert et al. (2014), we used the results obtained from three genetic clustering methods, along with historical and geographical information, to define genetic groups that could include several sample sites (supplementary appendix S1, Supplementary Material online). We defined seven main genetic groups from our set of 23 sample sites: Asia, Hawaii, western US, eastern US, Europe, Brazil, and La Reunion. To make the computations less intensive with respect to computer resources, ABC treatments were carried out using one representative sample site for each genetic group per replicate analysis. Specifically, when a genetic group included more than two sample sites, we considered the two most differentiated sample sites (i.e., with the highest FST values in supplementary table S3, Supplementary Material online) as representative of the group: CN-Shi and JP-Tok for the group Asia, all three sample sites for the genetically heterogeneous western US group, US-NC and US-Wis for the group eastern US, and GE-Dos and IT-Tre for the group Europe. Each of the sample sites retained (i.e., the representative sample sites) was included alternatively in the sample set considered for ABC


treatments. This makes it possible to replicate analyses of scenario choice on different sample sets representing most of the genetic variation within and between genetic groups. All ABC analyses described in table 1 were thus replicated with all possible combinations of representative sample sites available for each genetic group. This allowed us to test the robustness of scenario choices with respect to the sample sets considered as representative of genetic groups.
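The replication over combinations of representative sample sites can be illustrated with a short Python sketch. It enumerates every combination obtained by picking one representative site per genetic group, using the representative sites listed above for Asia, eastern US, and Europe. Restricting the example to these three groups, and the function name, are assumptions made for illustration; the sketch does not reproduce the exact sample sets analyzed in the study.

```python
from itertools import product

# Representative sample sites per genetic group, as listed in the text.
# Only three groups are included here to keep the illustration small.
REPRESENTATIVES = {
    "Asia": ["CN-Shi", "JP-Tok"],
    "eastern US": ["US-NC", "US-Wis"],
    "Europe": ["GE-Dos", "IT-Tre"],
}


def sample_sets(representatives):
    """Return every sample set made of one representative site per group."""
    groups = list(representatives)
    site_lists = (representatives[group] for group in groups)
    return [dict(zip(groups, combo)) for combo in product(*site_lists)]


all_sets = sample_sets(REPRESENTATIVES)
print(len(all_sets))
for sample_set in all_sets:
    print(sample_set)
```

With two representative sites in each of three groups, the scenario-choice analysis would be replicated on 2 × 2 × 2 = 8 sample sets.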

Nested ABC Analyses Processed Sequentially

Following Lombaert et al. (2014), we used historical information (i.e., dates of first observation of each invasive population; supplementary table S1, Supplementary Material online) to define 11 sets of competing introduction scenarios that were analyzed sequentially. The first set of compared scenarios considers the second oldest invasive population as target (the oldest one necessarily originating from the native range) and determines its introduction history. Step by step, subsequent analyses use the results obtained from the previous analyses until the most recent invasive populations are considered. Each of the 11 analyses included a set of compared introduction scenarios in which the target invasive population (i.e., the focal population) can either derive directly by a single split (i.e., introduction event) from one of the historically compatible source populations or be the result of an admixture event between any one of the possible pairs of sources. For instance, in a model with one target and two possible sources, three scenarios could be formalized, with the target population being derived from a single source population, from the other possible source, or from an admixture between the two sources (see supplementary fig. S6 for an illustration and supplementary table S2, Supplementary Material online, for details).
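The way the set of competing scenarios grows with the number of candidate sources can be sketched as follows: each candidate source yields one single-source scenario, and each unordered pair of sources yields one admixture scenario. The function name and source labels are illustrative assumptions; the sketch only counts scenarios and does not encode their demographic parameters.

```python
from itertools import combinations


def enumerate_scenarios(sources):
    """List single-source scenarios followed by pairwise admixture scenarios.

    Each scenario is a tuple of the source(s) contributing to the target.
    """
    single_source = [(source,) for source in sources]
    admixture = list(combinations(sources, 2))
    return single_source + admixture


two_sources = enumerate_scenarios(["Asia", "Hawaii"])
three_sources = enumerate_scenarios(["Asia", "Hawaii", "western US"])

print(len(two_sources))    # one target, two possible sources
print(len(three_sources))  # one target, three possible sources
for scenario in three_sources:
    print(" + ".join(scenario))
```

With two possible sources this gives the three scenarios described above; with three possible sources it gives six scenarios, that is, k + k(k − 1)/2 scenarios for k candidate sources.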

The first set of scenario choice analyses, 1a–1d, aimed at making inferences about the introduction pathways for the three sample sites in western US (table 1). We first evaluated the Asian, Hawaiian, or admixed Asian + Hawaiian origin of each of the three western US sample sites. We then refined our modeling by formalizing seven competing scenarios describing the possible relationships among the three western US sample sites and their extra-continental sources (i.e., Asia and Hawaii; analysis 1d). Once the invasion history was resolved for the western US group, we sought to identify the population source(s) of the eastern US and European genetic groups independently. Dates of first records in these two groups widely overlap, and the possibility that the western US was a common source for both areas could not be ruled out. We therefore compared, in analysis 2a, a set of six scenarios to assess whether the eastern US group derived directly from Asia, Hawaii, or western US, or whether it resulted from an admixture event between one pair of such sources. A similar set of six scenarios was used to assess the most likely origin of European populations independently from the eastern US group (analysis 2b).

Given the results of analyses 2a and 2b and the overlap in first record dates for eastern US and Europe, we then performed two analyses to investigate the possibility of gene flow (i.e., admixture) between the eastern US and European groups, considering eastern US as target population with Europe as potential source and vice versa (analyses 3a and 3b, respectively). Analysis 4 aimed at clarifying the admixture in northern European sample sites identified in analysis 3b. We tested whether this admixture occurred between eastern US and a secondary introduction from Asia or between individuals from eastern US and southern Europe. Finally, we investigated the origins of the most recent invasive populations in Brazil and La Reunion separately, with all older groups and pairwise admixtures as potential sources (analyses 5a and 5b). Note that, in agreement with the genetic clustering results (fig. 1; supplementary appendix S1 and figs. S1 and S2, Supplementary Material online), we did not test for the possibility of Brazil being the source of La Reunion and vice versa.

Historical, Demographic, and Mutational Parameters

For all scenarios formalized for ABC analyses, we modeled an introduction event as a divergence without subsequent gene flow from the source population (or two source populations in case of admixture) at a time corresponding to the date of first record in the invaded area, translated into a number of generations (assuming 12 generations per year; Lin et al. 2014). The divergence event was immediately followed by a bottleneck period characterized by a lower effective population size and then a direct return to a stable (large) effective population size. We modeled our uncertainty regarding the identity of the source populations in the slightly structured native Asian area by including native "ghost populations" (i.e., unsampled populations) in our scenarios. Such unsampled native populations were modeled as a single split from an ancestral Asian population (without any change in effective population size) at a time defined by a loose flat prior which includes zero at the lower bound (supplementary table S8, Supplementary Material online). See Lombaert et al. (2014) and references therein for justifications of using unsampled populations when modeling invasion scenarios. A detailed illustration of the above modeling design is provided in supplementary fig. S6, Supplementary Material online.
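The translation of a date of first record into a number of generations is a simple multiplication, sketched below. The rate of 12 generations per year is the assumption stated above (Lin et al. 2014); the function name and the years used in the example are hypothetical and serve only to illustrate the arithmetic.

```python
GENERATIONS_PER_YEAR = 12  # assumption stated in the text (Lin et al. 2014)


def years_to_generations(first_record_year, sampling_year,
                         generations_per_year=GENERATIONS_PER_YEAR):
    """Number of generations elapsed between first record and sampling."""
    if sampling_year < first_record_year:
        raise ValueError("sampling year precedes the date of first record")
    return (sampling_year - first_record_year) * generations_per_year


# Hypothetical example: first record in 2008, sample collected in 2014.
print(years_to_generations(2008, 2014))  # 72
```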

Prior distributions for historical, demographic, and mutational parameters were defined taking into account the historical and demographic parameter values available from empirical studies on D. suzukii (Hauser 2011; Lin et al. 2014; Asplen et al. 2015) and microsatellite mutational data on D. melanogaster (Schug et al. 1998). We considered a first set of prior distributions (hereafter termed prior set 1), which was composed of rectangular (i.e., bounded uniform) distributions (see supplementary table S8, Supplementary Material online). To evaluate the robustness of our ABC inferences to prior choice, we also considered a second set of more peaked prior distributions (hereafter termed prior set 2; supplementary table S8, Supplementary Material online).

Model Choice Using ABC-RF and ABC-LDA

For all ABC-RF and ABC-LDA analyses, we used the software DIYABC v2.1.0 (Cornuet et al. 2014) to simulate datasets constituting the reference tables. A reference table includes a given number of datasets that have been simulated for


different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics.

Following Pudlo et al. (2016), ABC-RF treatments were processed on reference tables including 10,000 simulated datasets per scenario. Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014) for microsatellite markers, describing genetic variation per population (e.g., number of alleles), per pair (e.g., genetic distance), or per triplet (e.g., admixture rate) of populations, averaged over the 25 loci (see supplementary table S9, Supplementary Material online, for details about such statistics), plus the linear discriminant analysis (LDA) axes as additional summary statistics. The total number of summary statistics ranged from 39 to 424 depending on the analysis (table 2). When the number of scenarios in an analysis exceeded ten, we processed ABC-RF treatments on reference tables including 100,000 simulated datasets to avoid computer memory issues associated with the sub-bootstrapping procedure processed for reference tables with more than 100,000 simulated datasets; see the section "Practical recommendations regarding the implementation of the algorithms" in Pudlo et al. (2016). We checked that this number was sufficient by evaluating the stability of prior error rates (i.e., the probability of choosing a wrong model when drawing model index and parameter values from priors) and posterior probability estimations on 80,000, 90,000, and 100,000 simulated datasets. The number of trees in the constructed random forests was fixed to n = 500; see Pudlo et al. (2016) and Breiman (2001) for justifications of considering a forest of 500 trees, and supplementary fig. S7, Supplementary Material online, for an illustration. For each ABC-RF analysis, we predicted the best scenario and estimated its posterior probability and prior error rate over 10 replicate runs on the same reference table. We used the abcrf R package (v1.1.0; Pudlo et al. 2016) to perform all ABC-RF analyses. In supplementary appendix S2, Supplementary Material online, we provide several scripts written in the R programming language (R Development Core Team 2008) for computing random forest analyses in R with the abcrf package when starting from simulated datasets generated with DIYABC v2.1.0.

For the sake of methodological comparison, we also duplicated a subset of analyses using a more standard ABC method (Beaumont et al. 2002; Cornuet et al. 2008), improved using the regression algorithms from Estoup et al. (2012) based on linear discriminant analysis (LDA) to avoid the curse of dimensionality and correlation among explanatory variables (i.e., multi-collinearity) during the regression step. This method is called ABC-LDA. Following previous analyses of this type (Lombaert et al. 2014), ABC-LDA treatments were processed on reference tables including 500,000 simulated datasets per scenario. As for ABC-RF, datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). Posterior probabilities associated with each scenario were estimated by polychotomous logistic regression (Cornuet et al. 2008), modified following Estoup et al. (2012), on the 1% of the simulated datasets closest to the observed dataset. We also estimated for each analysis a prior error rate over 500 simulated datasets drawn from priors, using reference tables including 500,000 simulated datasets per scenario. To provide a fair comparison of classification error with respect to the computational effort, prior error rates were also estimated using reference tables including the same number of simulated datasets per scenario as for ABC-RF treatments (10,000 per scenario).
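The selection of the simulated datasets closest to the observed dataset can be illustrated with a deliberately simplified Python sketch. It retains a fixed fraction of the reference table by Euclidean distance on the summary statistics and reports the proportion of each scenario among the retained datasets. This "direct" proportion is a swapped-in simplification: the study estimated posterior probabilities by polychotomous logistic regression on LDA-transformed statistics, which is not reproduced here. The function name, the toy reference table, and the absence of any standardization of the statistics are all assumptions of the sketch.

```python
import math
from collections import Counter


def rejection_posterior(reference_table, observed, fraction=0.01):
    """Scenario proportions among the simulated datasets closest to `observed`.

    `reference_table` is a list of (scenario_label, summary_statistics) pairs.
    Statistics are compared with a plain Euclidean distance (no scaling).
    """
    ranked = sorted(reference_table,
                    key=lambda row: math.dist(row[1], observed))
    n_keep = max(1, int(len(ranked) * fraction))
    counts = Counter(label for label, _ in ranked[:n_keep])
    return {label: count / n_keep for label, count in counts.items()}


# Toy reference table with two well-separated scenarios (invented values).
toy_table = (
    [("S1", (i * 0.01, 0.0)) for i in range(100)]
    + [("S2", (10.0 + i * 0.01, 10.0)) for i in range(100)]
)

print(rejection_posterior(toy_table, observed=(0.0, 0.0), fraction=0.1))
```

In practice, summary statistics measured on different scales would need to be standardized before computing distances, and the regression step would adjust the raw proportions.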

Refining the Origins of the Primary Introduction Events from the Native Area

We refined our worldwide invasion scenario by assessing the most probable origin of the primary introductions from the Asian native range. To this aim, we carried out additional ABC-RF treatments to choose which geographical area in the native range was the most likely origin of each of the three primary introductions from Asia identified in the final invasion scenario, i.e., in Hawaii, western US (sample site US-Wat), and Europe (fig. 1). Capitalizing on the observed pattern of genetic differentiation among all Asian sample sites, we considered four possible geographical origins: O1 = Japan (represented by the sample sites JP-Sap + JP-Tok, which were pooled as they did not show any significant differentiation), O2 = South-East China (represented by the sample site CN-Nin), O3 = North-West China (represented by the sample sites CN-Lan + CN-Lia, which were pooled as they did not show any significant differentiation), and O4 = South-West China (represented by the sample site CN-Shi). In each of these three model choice analyses (i.e., one analysis per primary introduction), we considered the previously inferred introduction histories and compared four scenarios corresponding to variations of those histories that differed only by the geographical origin of the primary Asian introduction (models O1, O2, O3, and O4). We used the ABC-RF method with 10,000 datasets per model, simulated using the prior set 1 or 2, and with three replications of the RF process. For inferences regarding Europe, we carried out ABC-RF treatments on various sets of sample sites from Europe and eastern US to further evaluate the robustness of our results.

Parameter Estimation
For ABC parameter estimation, we considered a set of 12 key sample sites representative of the inferred final invasion history (fig. 1). This set of sample sites included the three native origins of primary introductions from the native range (O1, O2 and O3), the Hawaiian sample (US-Haw), the three western US samples (US-Wat, US-Sok and US-SD), the eastern US sample US-NC, two European samples (IT-Tre and GE-Dos), the Brazilian sample (BR-PA) and the sample from La Reunion (FR-Reu). To evaluate the robustness of our parameter estimations with respect to sample sites, we carried out a second analysis replacing the sample sites US-NC, IT-Tre and GE-Dos by US-Wis, SP-Bar and FR-Par, respectively. We also replicated parameter estimations using the prior set 1 and the prior set 2.

ABC-RF was developed to tackle the inferential issue of model choice. So far, no RF-based solution exists to estimate parameter distributions under a given model (Pudlo et al. 2016). Standard ABC algorithms for parameter estimation may suffer from the curse of dimensionality and from correlation among explanatory variables (i.e., multicollinearity) during the regression step, and hence yield poor results when the number of statistics is too large [reviewed in Blum et al. (2013)]. To avoid such potential problems, as well as for the sake of simplicity and computational efficiency (statistical techniques for choosing summary statistics have not been implemented in DIYABC; Blum et al. 2013), we used a standard ABC methodological framework (Beaumont et al. 2002; Cornuet et al. 2008) applied to a subset of "expert-chosen" statistics proposed in DIYABC (Brouat et al. 2014; Lombaert et al. 2014) to estimate posterior parameter distributions under the final invasion scenario. We hence considered 95 summary statistics in total: the mean number of alleles and the mean genetic diversity (per locus and population sample), all pairwise FST's, and some crude estimates of admixture rates based on the five population triplets for which admixture events have been inferred (i.e., the A1–A5 events described in fig. 1, corresponding to five AML statistics; Choisy et al. 2004; see the DIYABC 2.1.0 user manual, p. 16, for details about such statistics). Using parameter values drawn from the prior sets 1 or 2, we produced a reference table containing 10 million simulated datasets. Following Beaumont et al. (2002), we then used a local linear regression to estimate the parameter posterior distributions. We took the 10,000 (0.1%) simulated datasets closest to our observed dataset for the regression, after applying a logit transformation to parameter values. We found that considering instead the 1% closest simulated datasets and/or a larger set of summary statistics had only weak effects on the estimated parameter distributions, at least for the subset of parameters we were interested in (i.e., admixture rates and bottleneck severity) (results not shown).
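The rejection and regression-adjustment steps described above can be sketched as follows. This is a minimal, single-statistic illustration in the spirit of Beaumont et al. (2002), not the DIYABC implementation: the function names, the Epanechnikov kernel and the noise-free toy reference table are assumptions made for the example, whereas the study used 95 statistics and 10 million simulations.

```python
import math


def logit(p, lo, hi):
    """Map a parameter bounded in (lo, hi) onto the real line."""
    u = (p - lo) / (hi - lo)
    return math.log(u / (1.0 - u))


def inv_logit(z, lo, hi):
    """Back-transform from the real line to (lo, hi)."""
    return lo + (hi - lo) / (1.0 + math.exp(-z))


def abc_local_linear(params, stats, s_obs, lo, hi, keep_fraction=0.001):
    """Keep the simulations closest to s_obs, then adjust them by a
    kernel-weighted linear regression on the logit scale.

    Returns (adjusted_params, weights) for the retained simulations.
    """
    n_keep = max(2, int(round(keep_fraction * len(params))))
    order = sorted(range(len(params)), key=lambda i: abs(stats[i] - s_obs))
    kept = order[:n_keep]
    delta = abs(stats[kept[-1]] - s_obs)  # largest retained distance
    x = [stats[i] - s_obs for i in kept]
    y = [logit(params[i], lo, hi) for i in kept]
    if delta == 0.0:
        return [params[i] for i in kept], [1.0] * n_keep
    # Epanechnikov kernel: weight decreases with distance to the observation.
    w = [max(0.0, 1.0 - (xi / delta) ** 2) for xi in x]
    sw = sum(w)
    if sw == 0.0:
        return [params[i] for i in kept], w
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    # Project each retained value onto the observed statistic (x = 0).
    adjusted = [inv_logit(yi - slope * xi, lo, hi) for xi, yi in zip(x, y)]
    return adjusted, w


# Hypothetical noise-free reference table: the statistic equals the parameter.
toy_params = [(i + 0.5) / 1000 for i in range(1000)]
toy_stats = list(toy_params)
toy_adjusted, toy_weights = abc_local_linear(
    toy_params, toy_stats, s_obs=0.3, lo=0.0, hi=1.0, keep_fraction=0.1
)
```

In this toy setting the adjusted values cluster tightly around the observed statistic (0.3). The logit transformation keeps adjusted values inside the prior bounds, which is presumably why it is applied before the regression.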

We focused our investigations on the parameters related to two types of demographic events that are considered to be potentially important drivers of invasion success: severity of the bottleneck associated with the foundation of new populations, and genetic admixture (Estoup et al. 2016). Bottleneck severity was defined as the ratio between the duration of the bottleneck and the number of effective individuals during this period (Pascual et al. 2007). Following the final invasion scenario detailed in figure 1, we defined three categories of introduction situations: bottleneck severity for populations founded only by individuals originating from a different continent (e.g., sample sites US-Haw, US-Wat, IT-Tre, BR-PA and FR-Reu), bottleneck severity for populations founded only by individuals originating from the same continent (e.g., sample sites US-SD and US-NC), and bottleneck severity for populations founded by a combination of the two types of sources (e.g., sample sites US-Sok and GE-Dos). Genetic admixture is evident when a population is inferred to have more than one distinct source (A1–A5 events in fig. 1).
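The definition of bottleneck severity given above reduces to a simple ratio, sketched here. The function name and the numerical values are hypothetical and serve only to show the direction of the measure: a longer bottleneck or a smaller effective size gives a higher severity.

```python
def bottleneck_severity(duration_generations, effective_size):
    """Bottleneck duration divided by the effective number of individuals
    during the bottleneck (definition attributed to Pascual et al. 2007)."""
    if effective_size <= 0:
        raise ValueError("effective size must be positive")
    return duration_generations / effective_size


# Hypothetical values: 10 generations at an effective size of 20 individuals.
toy_severity = bottleneck_severity(10, 20)
```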

Model-Posterior Checking
We used the ABC model-posterior checking method implemented in DIYABC (Cornuet et al. 2010) to evaluate how well the final worldwide invasion scenario, with associated parameter posterior distributions, matches the observed dataset. This method was largely inspired by statistical frameworks detailed in Gelman et al. (2003) and Cook et al. (2006). The principle is as follows: if a scenario-posterior combination fits the observed data correctly, then data simulated under this scenario with parameters drawn from the associated posterior distributions should be close to the observed data. The lack of fit of the model to the data can be measured by determining the frequency at which test quantities measured on the observed dataset are extreme with respect to the distributions of the same test quantities computed from the simulated datasets (i.e., the posterior predictive distributions). In practice, the test quantities are chosen among the large set of ABC summary statistics proposed in the program DIYABC. For each test quantity (q), a lack of fit of the observed data with respect to the posterior predictive distribution can be measured by the cumulative distribution function value, defined as Prob(q_simulated < q_observed). The tail-area probability can easily be computed for each test quantity as Prob(q_simulated < q_observed) and 1.0 − Prob(q_simulated < q_observed) for Prob(q_simulated < q_observed) ≤ 0.5 and > 0.5, respectively. Such tail-area probabilities, also named posterior predictive P-values (i.e., ppp-values; Meng 1994), represent the probabilities that the replicated data (simulated ABC summary statistics) could be more extreme than the observed data (observed ABC summary statistics). In practice, ppp-values can be interpreted as a guideline for tracking model-posterior misfit: a few ppp-values around 0.05 are not necessarily a problem, whereas the presence of ppp-values around 0.001 is rather suspect. Hence, too many observed summary statistics falling in the tails of distributions, especially if some of those ppp-values are small (< 0.001), cast serious doubts on the adequacy of the model-posterior combination to the observed dataset. Finally, because ppp-values are computed for a number of (often non-independent) test statistics, a method such as that of Benjamini and Hochberg (1995) can be used to control the false discovery rate (Cornuet et al. 2010).
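The tail-area computation and the false discovery rate control described above can be sketched as follows. This is an illustrative implementation rather than the DIYABC code: the function names and toy inputs are hypothetical, and the second function is the standard Benjamini and Hochberg (1995) step-up procedure.

```python
def ppp_value(simulated, observed):
    """Tail-area probability of one test quantity.

    Uses Prob(q_simulated < q_observed) when it is at most 0.5,
    and one minus that probability otherwise.
    """
    cdf = sum(1 for q in simulated if q < observed) / len(simulated)
    return cdf if cdf <= 0.5 else 1.0 - cdf


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return one boolean per p-value: True if it remains significant
    when controlling the false discovery rate at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            significant[i] = True
    return significant


# Hypothetical posterior predictive distribution of one test quantity.
toy_simulated = list(range(100))
toy_low = ppp_value(toy_simulated, 2)    # observed value in the lower tail
toy_high = ppp_value(toy_simulated, 97)  # observed value in the upper tail

# Hypothetical ppp-values for four test quantities.
toy_flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], alpha=0.05)
```

With these toy inputs only the first ppp-value remains significant after correction. This shows how several nominally low values can vanish once multiple testing is taken into account.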

We carried out the ABC model-posterior checking analysis on our observed microsatellite dataset as follows. From 10^7 datasets simulated under the final invasion scenario detailed in figure 1, and using the same subset of 95 statistics previously retained for parameter estimation, we obtained a posterior sample of 10^4 values from the posterior distributions of parameters, as described in the previous section (Parameter Estimation). We then simulated 10^4 datasets with parameter values drawn with replacement from this posterior sample. As underlined in many statistics textbooks (Gelman et al. 2003), it is advised against performing model checking using information that has already been used for training (i.e., model fitting; see also Cornuet et al. 2010 for illustrations on simulated datasets). Optimally, model-posterior checking should be based on test quantities that do not correspond to the summary statistics used for obtaining the parameter posterior distributions. This is naturally possible with DIYABC, as the package proposes a large choice of summary statistics. Our set of test statistics therefore included the 1,141 remaining single, pairwise and three-population summary statistics available in DIYABC that were not used for previous ABC parameter estimation.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments
AF was supported by state funding by the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program "Investissements d'avenir" (ANR-11-IDEX-0004-02). AE and RAH were supported by the Languedoc Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE 2016) and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP "International Mobility" (CfP 2015-02). GL acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds, project 2012-13-195. XC, MK, IM and JZ were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). MD acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). MP was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.

References
Adrion JR, Kousathanas A, Pascual M, Burrack HJ, Haddad NM, Bergland AO, Machado H, Sackton TB, Schlenke TA, Watada M, et al. 2014. Drosophila suzukii: the genetic footprint of a recent, worldwide invasion. Mol Biol Evol. 31:3148–3163.

Asplen MK, Anfora G, Biondi A, Choi DS, Chu D, Daane KM, Gibert P, Gutierrez AP, Hoelmer KA, Hutchison WD, et al. 2015. Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci. 88:469–494.

Atallah J, Teixeira L, Salazar R, Zaragoza G, Kopp A. 2014. The making of a pest: the evolution of a fruit-penetrating ovipositor in Drosophila suzukii and related species. Proc R Soc B. 281:20132840–20132840.

Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025–2035.

Beaumont MA. 2010. Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst. 41:379–406.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 57:289–300.

Bertorelle G, Benazzo A, Mona S. 2010. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol Ecol. 19:2609–2625.

Blum M, Nunes M, Prangle D, Sisson S. 2013. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci. 28:189–208.

Bock DG, Caseys C, Cousens RD, Hahn MA, Heredia SM, Hubner S, Turner KG, Whitney KD, Rieseberg LH. 2015. What we still don't know about invasion genetics. Mol Ecol. 24:2277–2297.

Boissin E, Hurley B, Wingfield MJ, Vasaitis R, Stenlid J, Davis C, de Groot P, Ahumada R, Carnegie A, Goldarazena A, et al. 2012. Retracing the routes of introduction of invasive species: the case of the Sirex noctilio woodwasp. Mol Ecol. 21:5728–5744.

Breiman L. 2001. Random forests. Mach Learn. 45:5–32.

Benazzo A, Ghirotto S, Torres Vilaca ST, Hoban S. 2015. Using ABC and microsatellite data to detect multiple introductions of invasive species from a single source. Heredity 115:262–272.

Brouat C, Tollenaere C, Estoup A, Loiseau A, Sommer S, Soanandrasana R, Rahalison L, Rajerison M, Piry S, Goodman SM, Duplantier JM. 2014. Invasion genetics of a human commensal rodent: the black rat Rattus rattus in Madagascar. Mol Ecol. 23(16):4153–4167.

Caprio M, Tabashnik BE. 1992. Gene flow accelerates local adaptation among finite populations: simulating the evolution of insecticide resistance. J Econ Entomol. 85:611–620.

Calabria G, Maca J, Bächli G, Serra L, Pascual M. 2012. First records of the potential pest species Drosophila suzukii (Diptera: Drosophilidae) in Europe. J Appl Entomol. 136:139–147.

Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang G, Zalom FG. 2013. Genome of Drosophila suzukii, the spotted wing drosophila. G3 3:2257–2271.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Mol Ecol. 13:955–968.

Cook S, Gelman A, Rubin DB. 2006. Validation of software for Bayesian models using posterior quantiles. J Comput Graph Stat. 15:675–692.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24:2713–2719.

Cornuet JM, Ravigne V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11:401.

Cornuet JM, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin JM, Estoup A. 2014. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics 30:1187–1189.

Cristescu ME. 2015. Genetic reconstructions of invasion history. Mol Ecol. 24:2212–2225.

Depra M, Poppe JL, Schmitz HJ, De Toni DC, Valente VLS. 2014. The first records of the invasive pest Drosophila suzukii in the South American continent. J Pest Sci. 87:379–383.

Dlugosch KM, Parker IM. 2008. Founding events in species invasions: genetic variation, adaptive evolution, and the role of multiple introductions. Mol Ecol. 17:431–449.

Dlugosch KM, Anderson SR, Braasch J, Alice Cang F, Gillette H. 2015. The devil is in the details: genetic variation in introduced populations and its contributions to invasion. Mol Ecol. 24:2095–2111.

Edmonds CA, Lillie AS, Cavalli-Sforza LL. 2004. Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA. 101:975–979.

Ellstrand NC, Schierenbeck KA. 2000. Hybridization as a stimulus for the evolution of invasiveness in plants. Proc Nat Acad Sci USA. 97:7043–7050.

Estoup A, Guillemaud T. 2010. Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol. 19:4113–4130.

Estoup A, Lombaert E, Marin JM, Guillemaud T, Pudlo P, Robert CP, Cornuet JM. 2012. Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics. Mol Ecol Res. 12(5):846–855.

Estoup A, Ravigne V, Hufbauer RA, Vitalis R, Gautier M, Facon B. 2016. Is there a genetic paradox of biological invasion? Annu Rev Ecol Evol Syst. 47:51–72.


Facon B, Genton B, Shykoff J, Jarne P, Estoup A, David P. 2006. A general eco-evolutionary framework for understanding bioinvasions. Trends Ecol Evol. 21:130–135.

Facon B, Pointier JP, Jarne P, Sarda V, David P. 2008. High genetic variance in life-history strategies within invasive populations by way of multiple introductions. Curr Biol. 18:363–367.

Fraimout A, Loiseau A, Price DK, Xuereb A, Martin JF, Vitalis R, Fellous S, Debat V, Estoup A. 2015. New set of microsatellite markers for the spotted-wing Drosophila suzukii (Diptera: Drosophilidae): a promising molecular tool for inferring the invasion history of this major insect pest. Eur J Entomol. 112(4):855.

Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201:1555–1579.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis. London: Chapman & Hall/CRC.

Gilchrist GW, Huey RB, Serra L. 2001. Rapid evolution of wing size clines in Drosophila subobscura. Genetica 112:273–286.

Gilchrist GW, Huey RB, Balanya J, Pascual M, Serra L, Noor M. 2004. A time series of evolution in action: a latitudinal cline in wing size in South American Drosophila subobscura. Evolution 58:768–780.

Groen SC, Whiteman NK. 2016. Using Drosophila to study the evolution of herbivory and diet specialization. Curr Opin Insect Sci. 14:66–72.

Hauser M. 2011. A historic account of the invasion of Drosophila suzukii (Matsumura) (Diptera: Drosophilidae) in the continental United States, with remarks on their identification. Pest Manag Sci. 67:1352–1357.

Hufbauer RA, Facon B, Ravigne V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A. 2012. Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl. 5:89–101.

Kaneshiro KY. 1983. Drosophila (Sophophora) suzukii (Matsumura). Proc Hawaiian Entomol Soc. 24:179.

Keller SR, Taylor DR. 2010. Genomic admixture increases fitness during a biological invasion: admixture increases fitness during invasion. J Evol Biol. 23:1720–1731.

Kolbe JJ, Glor RE, Schettino LR, Lara AC, Larson A, Losos JB. 2004. Genetic variation increases during biological invasion by a Cuban lizard. Nature 431:177–181.

Lavergne S, Molofsky J. 2007. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci USA. 104:3883–3888.

Lee C. 2002. Evolutionary genetics of invasive species. Trends Ecol Evol. 17:386–391.

Lee JC, Bruck DJ, Curry H, Edwards D, Haviland DR, Van Steenwyk RA, Yorgey BM. 2011. The susceptibility of small fruits and cherries to the spotted-wing drosophila, Drosophila suzukii. Pest Manag Sci. 67:1358–1367.

Lenormand T. 2002. Gene flow and the limits of natural selection. Trends Ecol Evol. 17:183–189.

Lin QC, Zhai YF, Zhang AS, Men XY, Zhang XY, Zalom FG, Zhou CG, Yu Y. 2014. Comparative Developmental Times and Laboratory Life tables for Drosophlia suzukii and Drosophila melanogaster (Diptera: Drosophilidae). Fla Entomol. 97:1434–1442.

Lombaert E, Guillemaud T, Lundgren J, Koch R, Facon B, Grez A, Loomans A, Malausa T, Nedved O, Rhule E, et al. 2014. Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol. 23:5979–5997.

Matsumura S. 1931. 6000 illustrated insects of Japan-empire (in Japanese). Toko shoin, Tokyo. 1497 pp.

Meng XL. 1994. Posterior predictive p-values. Ann Stat. 22:1142–1160.

Nei M, Maruyama T, Chakraborty R. 1975. The bottleneck effect and genetic variability in populations. Evolution 29:1–10.

Ometto L, Cestaro A, Ramasamy S, Grassi A, Revadi S, Siozios S, Moretto M, Fontana P, Varotto C, Pisani D, et al. 2013. Linking genomics and ecology to investigate the complex evolution of an invasive drosophila pest. Genome Biol Evol. 5:745–757.

Pascual M, Chapuis MP, Mestres F, Balanya J, Huey RB, Gilchrist GW, Serra L, Estoup A. 2007. Introduction history of Drosophila subobscura in the New World: a microsatellite based survey using ABC methods. Mol Ecol. 16:3069–3083.

Peischl S, Excoffier L. 2015. Expansion load: recessive mutations and the role of standing genetic variation. Mol Ecol. 24:2084–2094.

Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. 2016. Reliable ABC model choice via random forests. Bioinformatics 32:859–866.

R Development Core Team. 2008. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org

Robert CP, Cornuet JM, Marin JM, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA. 108:15112–15117.

Rius M, Darling JA. 2014. How important is intraspecific genetic admixture to the success of colonising populations? Trends Ecol Evol. 29:233–242.

Sakai AK, Allendorf FW, Holt JS, Lodge DM, Molofsky J, With KA, Baughman S, Cabin RJ, Cohen JE, Ellstrand NC, McCauley DE. 2001. The population biology of invasive species. Annu Rev Ecol Syst. 305–332.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. In: Mutation and Evolution. Springer. pp. 359–367.

Simberloff D. 2013. Biological invasions: much progress plus several controversies. Contrib Sci. 9:7–16.

Turgeon J, Tayeh A, Facon B, Lombaert E, De Clercq, Berkvens N, Lungren JG, Estoup A. 2011. Experimental evidence for the phenotypic impact of admixture between wild and biocontrol Asian ladybird (Harmonia axyridis) involved in the European invasion. J Evol Biol. 24:1044–1052.

Uller T, Leimu R. 2011. Founder events predict changes in genetic diversity during human-mediated range expansions. Glob Change Biol. 17:3478–3485.

Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Thery S, Froment A, Le Bomin S, Gessain A, Hombert JM, et al. 2009. Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol. 19:312–318.

Wray G. 2013. Genomics and the evolution of phenotypic traits. Annu Rev Ecol Evol Syst. 44:51–72.


0.286). We found more balanced admixture rates for the populations from Brazil (BR-PA) and La Reunion (FR-Reu), but information on this parameter was less accurate in these cases, as indicated by large 90% credibility intervals.

Model-Posterior Checking
Like any model-based method, ABC inference does not reveal the "true" evolutionary history, but allows choosing the best among a necessarily limited set of compared scenarios and estimating posterior distributions of parameters under this scenario. How well the inferred scenario-posterior combination matches the observed dataset remains to be evaluated using an ABC model-posterior checking analysis. When applying such an analysis to the final invasion scenario detailed in figure 1, we found that, when considering the prior set 1, only 54 of the 1,141 summary statistics used as test quantities had low posterior predictive P-values (i.e., 0.002 < ppp-values < 5%). Moreover, none of those ppp-values remained significant when correcting for multiple comparisons (Benjamini and Hochberg 1995). Similar results were obtained when assuming the prior set 2 and when considering different sets of representative sample sites in our analyses (results not shown). These findings show that the final worldwide invasion scenario (fig. 1, with associated parameter posterior distributions) matches the observed dataset well. In agreement with this, the projections of the simulated datasets on the principal component axes from the final model-posterior combination were well grouped and centered on the target point corresponding to the observed dataset (supplementary fig. S4, Supplementary Material online). Due to the modest size of the dataset (i.e., 25 microsatellite loci), however, only situations of major inadequacy of the model-posterior combination to the observed dataset are likely to be identified.

Discussion
In the present paper, we decipher the routes taken by D. suzukii in its worldwide invasion, evaluate evidence of bottlenecks and genetic admixture associated with different introductions, and quantify the efficiency of ABC-RF relative to the more standard ABC-LDA method for choosing among introduction scenarios.

A Complex Worldwide Invasion History
Prior to this study and that of Adrion et al. (2014), our understanding of the worldwide introduction pathways of D. suzukii was based on historical and observational data, which were incomplete and potentially misleading. Our results indicate three distinct introductions from the native range (to Hawaii, the western side of North America, and western Europe), which accords well with the findings of Adrion et al. (2014) from a different set of markers (X-linked sequence data). Both the western North American introductions and at least some European populations show signs of admixture. Hawaii and China both appear to have contributed to the western North American introduction, which was in turn the most probable source for the introduction into eastern North America. Similarly, China and, to a lesser extent, eastern North America both appear to have contributed to the European introduction, although the eastern North American contribution was found mostly in northern European areas. The almost simultaneous invasion of North America and Europe from Asia could be due to increased trade between these areas, facilitating transport of this species between regions. Alternatively, or in addition, adaptation to human-altered habitats (in this case, changes in agricultural practices) within the native range of the species could have promoted invasion of similarly altered habitats worldwide (i.e., anthropogenically induced adaptation to invade; Hufbauer et al. 2012). However, the two invaded continents were most likely colonized by flies from distinct Chinese geographic areas (fig. 1), requiring that adaptation to agriculture occurred concomitantly at several locations within the native range. This is possible but not evolutionarily parsimonious, and further data would be required to test this explanation. Subsequent introductions from the three primary founder locations to other locations result in a complex history of invasion. Both primary and secondary introductions show signs of demographic bottlenecks, and several invasive populations have multiple sources and thus experienced genetic admixture between differentiated native or invasive populations.

The genetic relationships between worldwide D. suzukii populations define the genetic variation they harbor and may continue to shape further spread of alleles among populations. Gene flow among populations, through continuous dispersal or more punctual admixture events, permits the dissemination of allelic variants and new mutations, with dramatic consequences for evolutionary trajectories (Lenormand 2002). We found evidence for asymmetric genetic admixture from North American towards (northern) European populations. If this signal of admixture reflects recurrent and ongoing dispersal events rather than a single past secondary introduction event, then variants that arise in North America would be more likely to spread to Europe than the reverse. Discriminating between ongoing and past dispersal events is tricky, however, especially for recent invasions (Benazzo et al. 2015). If the occurrence of recurrent gene flow is confirmed, then D. suzukii populations from Europe may not lag behind American ones in evolutionary terms. From an applied perspective, if resistance to control practices emerges in North America, special efforts to stop the importation of D. suzukii to Europe would help prevent the evolution of resistance in Europe and thus reduce damage to crops (Caprio and Tabashnik 1992).

Advances in Methods for Discriminating among Complex Models
To the best of our knowledge, the present study is the first to use the recently developed ABC-RF method (Pudlo et al. 2016) to test competing models characterized by different levels of complexity. To compare ABC-RF to a more standard ABC method, we carried out a subset of model choice analyses using both ABC-RF and ABC-LDA (Estoup et al. 2012). We ran ABC-LDA analyses in two ways: with standard-sized large reference tables (500,000 simulated datasets per scenario) and with small reference tables (10,000 simulated datasets per scenario, the same size as used for ABC-RF). The performance of ABC-LDA (in terms of prior error rate) in choosing among introduction scenarios was slightly better than that of ABC-RF when using large reference tables. Furthermore, for ABC-LDA with large reference tables, confidence in model choice (in terms of posterior probability) was relatively comparable to that of ABC-RF with small reference tables. Thus, when analyses are relatively simple or computational resources are not limiting, ABC-LDA can provide inferences of invasion scenarios as robust as those of ABC-RF.

In contrast, we found that ABC-RF outperformed ABC-LDA when using similarly small reference tables for both types of analyses (10,000 simulated datasets per scenario). ABC-LDA was also unstable, choosing different scenarios in different replicate analyses when using small reference tables. These empirical findings correspond well with those of Pudlo et al. (2016), who used simple models for which the true posterior probabilities could be calculated. Pudlo et al. (2016) demonstrated that ABC-RF provided a more reliable posterior probability of the best (true) scenario than standard ABC methods when using the same (small) number of simulated datasets in the reference table (see supplementary fig. S3 in Pudlo et al. 2016). In our work, the difference in performance between the two methods was most pronounced for complex analyses (e.g., when comparing more than 10 scenarios using more than 100 summary statistics). Thus, when analyses are not simple or computational resources are finite, ABC-RF provides more robust inferences of invasion scenarios than ABC-LDA.

The lower computational cost of ABC-RF was also evident. For a given observed dataset, selecting the best supported scenario and computing its posterior probability was ca. 50 times faster than when processing an ABC-LDA treatment with a traditional reference table of large size. Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time, whereas it lasted several hours to several days with ABC-LDA. The consequence of low computational costs is not simply a saving in computer time. Rather, the efficiency of ABC-RF analyses allowed us to run replicate analyses on various sample sets, even for the most complex D. suzukii invasion scenarios. This replication is critical in evaluating the robustness of our statistical inferences.

We found that, at least for some ABC-RF analyses, the posterior probability of the best scenario and the global statistical power to choose among alternative invasion scenarios (as measured by the prior error rate) were relatively low. For instance, in three of the 11 analyses (analyses 3b, 5a and 5b), the best scenarios had relatively low probabilities (i.e., ranging from 0.500 to 0.630) and relatively high prior error rates (i.e., ranging from 0.300 to 0.400). Although replicate analyses carried out on different sample sets and prior sets pointed to the same best scenario for these three analyses, such low probability values and high prior error rates suggest a moderate level of confidence in the choice of model in these cases. Similarly, we found that the scenario choice leading to the conclusion of the presence versus absence of admixed genes from the eastern US in various European sample sites relied on probability values that were sometimes as low as 0.500. Moreover, in a minority of cases, the best scenario differed depending on the sample site considered as representative of the eastern US or the native area. The observed pattern of an uneven spatial distribution of genes of eastern US origin in Europe, following roughly a north-to-south gradient, should hence be interpreted cautiously. Analyses of additional European sample sites are needed to clarify this issue. The above inferential uncertainties also most likely find their sources in the relatively small number of markers genotyped relative to the number, complexity and similarity of some of the competing models. Such results indicate that there is room for improving the robustness of our inferences, at least for a subset of our analyses.

Bottlenecks and Genetic Admixture
We found strong evidence that bottlenecks and admixture both occurred during the course of the invasion of D. suzukii. A first sign of bottlenecks is a reduction in neutral genetic diversity. We found that all invasive D. suzukii populations were characterized by significantly lower genetic variation than native ones, with a loss of 6.4% (US-Col) to 23.3% (US-Haw) of heterozygosity and of 27.3% (US-Wat) to 54.2% (US-Haw) of allelic diversity (supplementary fig. S5, Supplementary Material online). Adrion et al. (2014) found a similar significant loss of diversity in their dataset focused on gene sequences. Dlugosch and Parker (2008) and Uller and Leimu (2011) reviewed studies of neutral genetic diversity in a large number of species of animals, plants and fungi, and compared nuclear molecular diversity within introduced and source populations. Overall, they found that a loss of variation was the most frequent feature in invasive populations. However, reductions in genetic variation were on average modest (e.g., average loss of 18.7% and 15.5% of heterozygosity and allelic diversity, respectively; Dlugosch and Parker 2008). The reductions observed in our study in invasive D. suzukii populations were thus in the same range for heterozygosity but higher for allelic diversity (average loss of 11.1% and 34.5% of heterozygosity and allelic diversity, respectively). A larger loss of allelic diversity than of heterozygosity after a bottleneck is nevertheless expected by theory (Nei et al. 1975).

The type of introduction pathway explains, at least partly, the severity of the reduction in genetic diversity. Bottlenecks tended to be less severe for populations founded by individuals from the same continent than for populations founded by individuals originating from a different continent. A simple explanation for this pattern is that, owing to greater proximity, populations originating from the same continent are likely to be initially founded by a larger number of individuals and also to experience recurrent gene flow. Populations founded by individuals from both the same and different continents experienced weak or moderate reductions in diversity, suggesting that multiple introduction pathways restore diversity.

In agreement with previous studies on invasions on similarly large geographical scales (e.g., Lombaert et al. 2014 and references therein), we found that admixture events were frequent in the worldwide invasion history of D. suzukii. We identified at least five admixture events for which sources were native and/or invasive populations (i.e., A1–A5 events in fig. 1). The genetic contributions of the sources were unbalanced, except for the populations from Brazil and La Reunion, but genetic information regarding admixture was considerably less accurate in the latter cases, which involved weakly differentiated source populations.

We have provided here the information necessary to evaluate whether bottlenecks and admixture have influenced outcomes in this invasion. Quantitative genetics studies in the lab focusing on the key source, bottlenecked, and admixed populations could now examine the fitness and adaptive potential of those groups (Lavergne and Molofsky 2007; Facon et al. 2008; Turgeon et al. 2011).

Conclusions and Perspectives

Understanding invasion pathways provides key information for understanding the role of evolutionary processes in biological invasions. For example, our research reveals sources of invasive populations for further relevant comparisons using quantitative genetics studies in the lab. Additionally, we found that the invasion of Europe by D. suzukii was distinct from that of North America, with limited and asymmetrical gene flow between these two main invaded areas. This situation provides the opportunity to evaluate evolutionary trajectories in replicate in two separate temperate climate regions, similarly to research by Gilchrist et al. (2001, 2004) on the pace of clinal evolution in the invasive fruit fly Drosophila subobscura. There are also distinct invasion pathways for areas with warmer climates (i.e., Hawaii, La Reunion, and Brazil) that will make interesting comparisons.

Retracing invasion routes and making inferences about demographic processes is only possible if there is adequate polymorphism within populations and significant genetic differentiation among them. As is evident here, microsatellite data can provide enough variation to reveal important pathways. We concur with Adrion et al. (2014), however, that genome-wide data (i.e., next-generation sequencing approaches) will be tremendously powerful in further discriminating among complex invasion scenarios (see also Cristescu 2015; Estoup et al. 2016). Because of the reduced computational resources demanded by ABC-RF, this method will be particularly useful for analysis of massive single nucleotide polymorphism datasets (Pudlo et al. 2016). In addition to the selectively neutral demographic inference leading to the reconstruction of routes of invasion, population (and quantitative) next-generation sequencing approaches are quite promising for studying the evolution of phenotypic traits in natural populations (Wray 2013; Gautier 2015). Specifically, they can be used to better understand the genetic architecture of traits underlying invasion success (Bock et al. 2015). Drosophila suzukii is a good species for such research given (i) its short generation time and viability in laboratory conditions, which facilitate experimental approaches to study quantitative traits of interest (Asplen et al. 2015), and (ii) the availability of annotated genome assemblies for this species (Chiu et al. 2013; Ometto et al. 2013), along with the huge amount of genomic resources available for its closely related species D. melanogaster (Groen and Whiteman 2016).

Materials and Methods

Sampling and Genotyping

Adult D. suzukii were sampled from the field at a total of 23 localities (hereafter termed sample sites) distributed throughout most of the native and invasive range of the species (supplementary table S1, Supplementary Material online, and fig. 1). Samples were collected between 2013 and 2015 using baited traps and sweep nets, and stored in ethanol. Native Asian samples consisted of a total of six sample sites, including four Chinese and two Japanese localities. Samples from the invasive range were collected in Hawaii (1 sample site), continental US (7 sites), Europe (7 sites), Brazil (1 site), and the French island of La Reunion (1 site). For each sample site, 15–44 adult flies were genotyped at 25 of the 28 microsatellite loci described in Fraimout et al. (2015), resulting in a total of 685 genotyped individuals. Three microsatellite loci from Fraimout et al. (2015) (i.e., loci DS31, DS42, and DS45) were not included due to the presence of null alleles at frequencies > 10% in some sample sites (results not shown). DNA extraction and PCR amplification, as well as allele scoring, were performed following the methodology detailed in Fraimout et al. (2015).

General Framework for Defining Focal Population Groups and Sets of Scenarios to Be Compared Using ABC

One of the many challenges of ABC analyses is to optimally define sets of sample sites to be processed chronologically as potential sources and targets in a way that does not hamper the computational effort. As stated in Estoup and Guillemaud (2010) and Lombaert et al. (2014), the number and complexity of competing scenarios are proportional to the number of genetic groups to be accounted for in ABC analyses. Following Lombaert et al. (2014), we used the results obtained from three genetic clustering methods, along with historical and geographical information, to define genetic groups that could include several sample sites (supplementary appendix S1, Supplementary Material online). We defined seven main genetic groups from our set of 23 sample sites: Asia, Hawaii, western US, eastern US, Europe, Brazil, and La Reunion. To make the computations less intensive with respect to computer resources, ABC treatments were carried out using one representative sample site for each genetic group per replicate analysis. In other words, when a genetic group included more than two sample sites, we considered the two most differentiated sample sites (i.e., with the highest FST values in supplementary table S3, Supplementary Material online) as representative of the group: CN-Shi and JP-Tok for the group Asia; all three sample sites for the genetically heterogeneous western US group; US-NC and US-Wis for the group eastern US; GE-Dos and IT-Tre for the group Europe. Each of the sample sites retained (i.e., the representative sample sites) was included alternatively in the sample set considered for ABC treatments. This makes it possible to replicate analyses of scenario choice on different sample sets representing most of the genetic variation within and between genetic groups. All ABC analyses described in table 1 were thus replicated with all possible combinations of representative sample sites available for each genetic group. This allowed us to test the robustness of scenario choices with respect to the sample sets considered as representative of genetic groups.
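The selection and replication logic described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' scripts: the pairwise FST values and the site names `site_A` to `site_C` are hypothetical, and the example only combines the three groups for which the text names two representative sites.

```python
from itertools import product


def most_differentiated_pair(pairwise_fst):
    """Return the pair of sample sites with the highest pairwise FST."""
    return max(pairwise_fst, key=pairwise_fst.get)


# Hypothetical pairwise FST values within one genetic group.
fst_within_group = {
    ("site_A", "site_B"): 0.010,
    ("site_A", "site_C"): 0.035,
    ("site_B", "site_C"): 0.020,
}
pair = most_differentiated_pair(fst_within_group)

# Representative sample sites named in the text for three of the groups.
representatives = {
    "Asia": ("CN-Shi", "JP-Tok"),
    "eastern US": ("US-NC", "US-Wis"),
    "Europe": ("GE-Dos", "IT-Tre"),
}

# One replicate sample set per combination of representative sites.
sample_sets = [dict(zip(representatives, combo))
               for combo in product(*representatives.values())]

print(pair)
for sample_set in sample_sets:
    print(sample_set)
```

For an analysis involving these three groups, the sketch yields 2 x 2 x 2 = 8 replicate sample sets, each of which would be analyzed separately.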

Nested ABC Analyses Processed Sequentially

Following Lombaert et al. (2014), we used historical information (i.e., dates of first observation of each invasive population; supplementary table S1, Supplementary Material online) to define 11 sets of competing introduction scenarios that were analyzed sequentially. The first set of compared scenarios considers the second oldest invasive population as target (the oldest one necessarily originating from the native range) and determines its introduction history. Step by step, subsequent analyses use the results obtained from the previous analyses until the most recent invasive populations are considered. Each of the 11 analyses included a set of compared introduction scenarios where the target invasive population (i.e., the focal population) can either directly derive by a single split (i.e., introduction event) from one of the historically compatible source populations or be the result of an admixture event between any one of the possible pairs of sources. For instance, in a model with one target and two possible sources, three scenarios could be formalized, with the target population being derived from a single source population, from the other possible source, or from an admixture between the two sources (see supplementary fig. S6 for an illustration and supplementary table S2, Supplementary Material online, for details).

The first set of scenario choice analyses, 1a–1d, aimed at making inferences about the introduction pathways for the three sample sites in western US (table 1). We first evaluated the Asian, Hawaiian, or admixed Asian + Hawaiian origin of each of the three western US sample sites. We then refined our modeling by formalizing seven competing scenarios describing the possible relationships among the three western US sample sites and their extra-continental sources (i.e., Asia and Hawaii; analysis 1d). Once the invasion history was resolved for the western US group, we sought to identify the population source(s) of the eastern US and European genetic groups independently. Dates of first records in these two groups widely overlap, and the possibility that the western US was a common source for both areas could not be ruled out. We therefore compared, in analysis 2a, a set of six scenarios to assess whether the eastern US group directly derived from Asia, Hawaii, or western US, or if it resulted from an admixture event between one pair of such sources. A similar set of six scenarios was used to assess the most likely origin of European populations independently from the eastern US group (analysis 2b).
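The way a scenario set grows with the number of candidate sources can be made concrete with a short sketch. The function name is ours; the enumeration simply lists every single-source introduction plus every pairwise admixture, which reproduces the counts given in the text (three scenarios for two sources, six for three sources).

```python
from itertools import combinations


def enumerate_scenarios(sources):
    """List candidate origins of a target population: each single source,
    then each admixture between a pair of sources."""
    single_source = [(source,) for source in sources]
    pairwise_admixture = list(combinations(sources, 2))
    return single_source + pairwise_admixture


two_sources = enumerate_scenarios(["source 1", "source 2"])
three_sources = enumerate_scenarios(["Asia", "Hawaii", "western US"])

for scenario in three_sources:
    label = " + ".join(scenario)
    kind = "admixture" if len(scenario) == 2 else "single split"
    print(f"{label} ({kind})")
```

With n candidate sources, the set contains n + n(n - 1)/2 scenarios, which is why the number of competing scenarios increases quickly as more genetic groups become historically compatible sources.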

Due to the results of analyses 2a and 2b, and due to the overlap in first record dates for eastern US and Europe, we then performed two analyses to investigate the possibility of gene flow (i.e., admixture) between the eastern US and European groups, considering eastern US as target population with Europe as potential source, and vice versa (analyses 3a and 3b, respectively). Analysis 4 aimed at clarifying the admixture in northern Europe sample sites identified in analysis 3b. We tested whether this admixture occurred between eastern US and a secondary introduction from Asia, or between individuals from eastern US and southern Europe. Finally, we investigated the origins of the most recent invasive populations in Brazil and La Reunion separately, with all older groups and pairwise admixtures as potential sources (analyses 5a and 5b). Note that, in agreement with the genetic clustering results (fig. 1; supplementary appendix S1 and figs. S1 and S2, Supplementary Material online), we did not test for the possibility of Brazil being the source of La Reunion, and vice versa.

Historical, Demographic, and Mutational Parameters

For all scenarios formalized for ABC analyses, we modeled an introduction event as a divergence without subsequent gene flow from the source population (or two source populations in case of admixture) at a time corresponding to the date of first record in the invaded area, translated into a number of generations (assuming 12 generations per year; Lin et al. 2014). The divergence event was immediately followed by a bottleneck period characterized by a lower effective population size, and then by a direct return to a stable (large) effective population size. We modeled our uncertainty regarding the identity of the source populations in the slightly structured native Asian area by including native "ghost populations" (i.e., unsampled populations) in our scenarios. Such unsampled native populations were modeled as a single split from an ancestral Asian population (without any change in effective population size) at a time defined by a loose flat prior, which includes zero at the lower bound (supplementary table S8, Supplementary Material online). See Lombaert et al. (2014) and references therein for justifications of using unsampled populations when modeling invasion scenarios. A detailed illustration of the above modeling design is provided in supplementary fig. S6, Supplementary Material online.
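The conversion of a date of first record into a divergence time can be sketched as below. Only the rate of 12 generations per year comes from the text; the function name and the two years used in the example are hypothetical.

```python
GENERATIONS_PER_YEAR = 12  # value assumed in the text (Lin et al. 2014)


def generations_since_first_record(first_record_year, sampling_year,
                                   per_year=GENERATIONS_PER_YEAR):
    """Translate the time elapsed between the first record in an invaded
    area and the sampling date into a number of generations."""
    if sampling_year < first_record_year:
        raise ValueError("sampling cannot precede the first record")
    return (sampling_year - first_record_year) * per_year


# Hypothetical example: first recorded in 2008, sampled in 2014.
divergence_time = generations_since_first_record(2008, 2014)
print(divergence_time)
```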

Prior distributions for historical, demographical, and mutational parameters were defined taking into account the historical and demographic parameter values available from empirical studies on D. suzukii (Hauser 2011; Lin et al. 2014; Asplen et al. 2015) and microsatellite mutational data on D. melanogaster (Schug et al. 1998). We considered a first set of prior distributions (hereafter termed prior set 1), which was composed of rectangular (i.e., bounded uniform) distributions (see supplementary table S8, Supplementary Material online). To evaluate the robustness of our ABC inferences to prior choice, we also considered a second set of more peaked prior distributions (hereafter termed prior set 2; supplementary table S8, Supplementary Material online).

Model Choice Using ABC-RF and ABC-LDA

For all ABC-RF and ABC-LDA analyses, we used the software DIYABC v2.1.0 (Cornuet et al. 2014) to simulate datasets constituting the reference tables. A reference table includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics.

Following Pudlo et al. (2016), ABC-RF treatments were processed on reference tables including 10,000 simulated datasets per scenario. Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014) for microsatellite markers, describing genetic variation per population (e.g., number of alleles), per pair (e.g., genetic distance), or per triplet (e.g., admixture rate) of populations, averaged over the 25 loci (see supplementary table S9, Supplementary Material online, for details about such statistics), plus the linear discriminant analysis (LDA) axes as additional summary statistics. The total number of summary statistics ranged from 39 to 424 depending on the analysis (table 2). When the number of scenarios in an analysis exceeded ten, we processed ABC-RF treatments on reference tables including 100,000 simulated datasets, to avoid computer memory issues associated with the sub-bootstrapping procedure applied to reference tables with more than 100,000 simulated datasets (see the section Practical recommendations regarding the implementation of the algorithms in Pudlo et al. 2016). We checked that this number was sufficient by evaluating the stability of prior error rates (i.e., the probability of choosing a wrong model when drawing model index and parameter values from priors) and posterior probability estimations on 80,000, 90,000, and 100,000 simulated datasets. The number of trees in the constructed random forests was fixed to n = 500 (see Breiman 2001 and Pudlo et al. 2016 for justifications of considering a forest of 500 trees, and supplementary fig. S7, Supplementary Material online, for an illustration). For each ABC-RF analysis, we predicted the best scenario and estimated its posterior probability and the prior error rate over 10 replicate runs on the same reference table. We used the abcrf R package (v1.1.0; Pudlo et al. 2016) to perform all ABC-RF analyses. In supplementary appendix S2, Supplementary Material online, we provide several scripts written in the R programming language (R Development Core Team 2008) for computing random forest analyses in R with the abcrf package when starting from simulated datasets generated with DIYABC v2.1.0.
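To make the notions of reference table and prior error rate concrete, here is a deliberately small sketch. It is not the abcrf implementation: the simulator is a toy (two Gaussian summary statistics whose means depend on the scenario), and a 1-nearest-neighbour classifier is swapped in for the random forest to keep the example dependency-free. All function names and numerical settings are assumptions made for illustration.

```python
import random


def simulate_summary(scenario, rng):
    """Toy simulator: draw a parameter from a flat prior, then return two
    summary statistics whose means depend on the scenario."""
    theta = rng.uniform(0.0, 1.0)            # bounded uniform prior
    shift = 0.0 if scenario == 0 else 2.0    # hypothetical scenario effect
    return (rng.gauss(theta + shift, 1.0), rng.gauss(theta - shift, 1.0))


def build_reference_table(n_per_scenario, n_scenarios, rng):
    """Reference table: (scenario index, summary statistics) per dataset."""
    return [(s, simulate_summary(s, rng))
            for s in range(n_scenarios) for _ in range(n_per_scenario)]


def nearest_neighbour_label(stats, table):
    """Stand-in classifier: scenario of the closest simulated dataset."""
    closest = min(table, key=lambda row: sum((a - b) ** 2
                                             for a, b in zip(row[1], stats)))
    return closest[0]


def prior_error_rate(table, n_test, n_scenarios, rng):
    """Fraction of pseudo-observed datasets, drawn from the priors, that
    are assigned to the wrong scenario."""
    errors = 0
    for _ in range(n_test):
        truth = rng.randrange(n_scenarios)   # model index drawn from its prior
        pseudo_observed = simulate_summary(truth, rng)
        if nearest_neighbour_label(pseudo_observed, table) != truth:
            errors += 1
    return errors / n_test


rng = random.Random(1)
table = build_reference_table(500, 2, rng)
rate = prior_error_rate(table, 200, 2, rng)
print(f"estimated prior error rate: {rate:.3f}")
```

Repeating the last three lines with larger reference tables is one simple way to check, in the spirit of the text, whether the estimated error rate has stabilized.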

For the sake of methodological comparison, we also duplicated a subset of analyses using a more standard ABC method (Beaumont et al. 2002; Cornuet et al. 2008), improved using the regression algorithms from Estoup et al. (2012) based on linear discriminant analysis (LDA) to avoid the curse of dimensionality and correlation among explanatory variables (i.e., multi-collinearity) during the regression step. This method is called ABC-LDA. Following previous analyses of this type (Lombaert et al. 2014), ABC-LDA treatments were processed on reference tables including 500,000 simulated datasets per scenario. As for ABC-RF, datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). Posterior probabilities associated with each scenario were estimated by polychotomous logistic regression (Cornuet et al. 2008), modified following Estoup et al. (2012), on the 1% of the simulated datasets closest to the observed dataset. We also estimated, for each analysis, a prior error rate over 500 simulated datasets drawn from priors, using reference tables including 500,000 simulated datasets per scenario. To provide a fair comparison of classification error with respect to the computational effort, prior error rates were also estimated using reference tables including the same number of simulated datasets per scenario as for ABC-RF treatments (10,000 per scenario).
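The role of the "1% closest" simulated datasets can be illustrated with the simplest possible estimator, in which the posterior probability of a scenario is approximated by its share among the retained simulations. This direct estimate is a simplification used here for illustration; the analyses described above rely instead on a logistic regression applied to the retained datasets. The table contents and names below are hypothetical.

```python
from collections import Counter


def direct_posterior_probabilities(table, tolerance=0.01):
    """Approximate scenario probabilities by scenario frequencies among the
    fraction `tolerance` of simulations closest to the observed dataset.

    `table` is a list of (scenario label, distance to observed data)."""
    n_keep = max(1, int(round(tolerance * len(table))))
    closest = sorted(table, key=lambda row: row[1])[:n_keep]
    counts = Counter(label for label, _ in closest)
    labels = sorted({label for label, _ in table})
    return {label: counts[label] / n_keep for label in labels}


# Hypothetical distances for 500 simulations under each of two scenarios.
toy_table = ([("S1", 10 * i) for i in range(1, 501)]
             + [("S2", 10 * i + 35) for i in range(1, 501)])

probabilities = direct_posterior_probabilities(toy_table, tolerance=0.01)
print(probabilities)
```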

Refining the Origins of the Primary Introduction Events from the Native Area

We refined our worldwide invasion scenario by assessing the most probable origin of the primary introductions from the Asian native range. To this aim, we carried out additional ABC-RF treatments to choose which geographical area in the native range was the most likely origin of each of the three primary introductions from Asia identified in the final invasion scenario, i.e., in Hawaii, western US (sample site US-Wat), and Europe (fig. 1). Capitalizing on the observed pattern of genetic differentiation among all Asian sample sites, we considered four possible geographical origins: O1 = Japan (represented by the sample sites JP-Sap + JP-Tok, which were pooled as they did not show any significant differentiation), O2 = South-East China (represented by the sample site CN-Nin), O3 = North-West China (represented by the sample sites CN-Lan + CN-Lia, which were pooled as they did not show any significant differentiation), and O4 = South-West China (represented by the sample site CN-Shi). In each of these three model choice analyses (i.e., one analysis per primary introduction), we considered the previously inferred introduction histories and compared four scenarios corresponding to variations of those histories that differed only by the geographical origin of the primary Asian introduction (models O1, O2, O3, and O4). We used the ABC-RF method with 10,000 datasets per model, simulated using prior set 1 or 2, and with three replications of the RF process. For inferences regarding Europe, we carried out ABC-RF treatments on various sets of sample sites from Europe and eastern US to further evaluate the robustness of our results.

Parameter Estimation

For ABC parameter estimation, we considered a set of 12 key sample sites representative of the inferred final invasion history (fig. 1). This set of sample sites included the three native origins of primary introductions from the native range (O1, O2, and O3), the Hawaiian sample (US-Haw), the three western US samples (US-Wat, US-Sok, and US-SD), the eastern US sample US-NC, two European samples (IT-Tre and GE-Dos), the Brazilian sample (BR-PA), and the sample from La Reunion (FR-Reu). To evaluate the robustness of our parameter estimations with respect to sample sites, we carried out a second analysis replacing the sample sites US-NC, IT-Tre, and GE-Dos by US-Wis, SP-Bar, and FR-Par, respectively. We also replicated parameter estimations using prior set 1 and prior set 2.

ABC-RF was developed to tackle the inferential issue of model choice. So far, no RF-based solution exists to estimate parameter distributions under a given model (Pudlo et al. 2016). Standard ABC algorithms for parameter estimation may suffer from the curse of dimensionality and correlation among explanatory variables (i.e., multi-collinearity) during the regression step, and hence yield poor results when the number of statistics is much too large [reviewed in Blum et al. (2013)]. To avoid such potential problems, as well as for the sake of simplicity and computational efficiency (i.e., statistical techniques for choosing summary statistics have not been implemented in DIYABC; Blum et al. 2013), we used a standard ABC methodological framework (Beaumont et al. 2002; Cornuet et al. 2008) applied to a subset of "expert-chosen" statistics proposed in DIYABC (Brouat et al. 2014; Lombaert et al. 2014) to estimate posterior parameter distributions under the final invasion scenario. We hence considered 95 summary statistics in total: the mean number of alleles and the mean genetic diversity (per locus and population sample), all pairwise FST values, and some crude estimates of admixture rates based on the five population triplets for which admixture events have been inferred (i.e., A1–A5 events described in fig. 1, corresponding to five AML statistics; Choisy et al. 2004; see the DIYABC 2.1.0 user manual, p. 16, for details about such statistics). Using parameter values drawn from prior set 1 or 2, we produced a reference table containing 10 million simulated datasets. Following Beaumont et al. (2002), we then used a local linear regression to estimate the parameter posterior distributions. We took the 10,000 (0.1%) simulated datasets closest to our observed dataset for the regression, after applying a logit transformation to parameter values. We found that considering instead the 1% closest simulated datasets and/or a larger set of summary statistics had only weak effects on the estimated parameter distributions, at least for the subset of parameters we were interested in (i.e., admixture rates and bottleneck severity) (results not shown).
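The purpose of the logit transformation can be shown with a short sketch. It assumes a parameter bounded by its prior limits `lower` and `upper` (hypothetical names and values); the regression adjustment itself is not reproduced here and is only mimicked by adding an arbitrary shift on the transformed scale.

```python
import math


def logit_transform(theta, lower, upper):
    """Map a parameter bounded in (lower, upper) onto the real line."""
    return math.log((theta - lower) / (upper - theta))


def inverse_logit(z, lower, upper):
    """Map a value on the real line back into (lower, upper)."""
    return lower + (upper - lower) / (1.0 + math.exp(-z))


# Hypothetical admixture rate with prior bounds 0 and 1.
lower, upper = 0.0, 1.0
theta = 0.3
z = logit_transform(theta, lower, upper)

# Any adjustment applied on the transformed scale (here an arbitrary shift)
# still maps back to a value inside the prior bounds.
adjusted = inverse_logit(z + 10.0, lower, upper)
round_trip = inverse_logit(z, lower, upper)

# Share of the reference table retained for the regression in the text.
tolerance = 10_000 / 10_000_000

print(round_trip, adjusted, tolerance)
```

Working on the transformed scale is what keeps regression-adjusted parameter values from falling outside the support of the prior.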

We focused our investigations on the parameters related to two types of demographic events that are considered to be potentially important drivers of invasion success: severity of the bottleneck associated with the foundation of new populations, and genetic admixture (Estoup et al. 2016). Bottleneck severity was defined as the ratio between the duration of the bottleneck and the number of effective individuals during this period (Pascual et al. 2007). Following the final invasion scenario detailed in figure 1, we defined three categories of introduction situations: bottleneck severity for populations founded only by individuals originating from a different continent (e.g., sample sites US-Haw, US-Wat, IT-Tre, BR-PA, and FR-Reu), bottleneck severity for populations founded only by individuals originating from the same continent (e.g., sample sites US-SD and US-NC), and bottleneck severity for populations founded by a combination of the two types of sources (e.g., sample sites US-Sok and GE-Dos). Genetic admixture is evident when a population is inferred to have more than one distinct source (A1–A5 events in fig. 1).
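The definition of bottleneck severity can be written as a one-line function. The second function is an addition of ours, not part of the study: it uses the standard Wright-Fisher expectation that heterozygosity decays by a factor (1 - 1/(2N)) per generation, to show why a higher severity implies a larger expected loss of diversity. All numerical values are hypothetical.

```python
def bottleneck_severity(duration_generations, effective_size):
    """Severity as defined in the text: bottleneck duration divided by the
    effective number of individuals during the bottleneck."""
    return duration_generations / effective_size


def expected_heterozygosity_retained(duration_generations, effective_size):
    """Standard Wright-Fisher expectation (an assumption added here):
    proportion of heterozygosity remaining after the bottleneck."""
    return (1.0 - 1.0 / (2.0 * effective_size)) ** duration_generations


# Hypothetical bottlenecks of equal duration but different effective sizes.
severe = bottleneck_severity(10, 20)
mild = bottleneck_severity(10, 200)
retained_severe = expected_heterozygosity_retained(10, 20)
retained_mild = expected_heterozygosity_retained(10, 200)

print(f"severity {severe:.2f}: {100 * retained_severe:.1f}% heterozygosity retained")
print(f"severity {mild:.2f}: {100 * retained_mild:.1f}% heterozygosity retained")
```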

Model-Posterior Checking

We used the ABC model-posterior checking method implemented in DIYABC (Cornuet et al. 2010) to evaluate how well the final worldwide invasion scenario, with associated parameter posterior distributions, matches the observed dataset. This method was largely inspired by statistical frameworks detailed in Gelman et al. (2003) and Cook et al. (2006). The principle is as follows: if a scenario-posterior combination fits the observed data correctly, then data simulated under this scenario with parameters drawn from associated posterior distributions should be close to the observed data. The lack of fit of the model to the data can be measured by determining the frequency at which test quantities measured on the observed dataset are extreme with respect to the distributions of the same test quantities computed from the simulated datasets (i.e., the posterior predictive distributions). In practice, the test quantities are chosen among the large set of ABC summary statistics proposed in the program DIYABC. For each test quantity (q), a lack of fit of the observed data with respect to the posterior predictive distribution can be measured by the cumulative distribution function value, defined as Prob(q_simulated < q_observed). The tail-area probability can be easily computed for each test quantity as Prob(q_simulated < q_observed) and 1.0 − Prob(q_simulated < q_observed) for Prob(q_simulated < q_observed) ≤ 0.5 and > 0.5, respectively. Such tail-area probabilities, also named posterior predictive P-values (i.e., ppp-values; Meng 1994), represent the probabilities that the replicated data (simulated ABC summary statistics) could be more extreme than the observed data (observed ABC summary statistics). In practice, ppp-values can be interpreted as a guideline for tracking model-posterior misfit: a few ppp-values around 0.05 are not necessarily a problem, whereas the presence of ppp-values around 0.001 is rather suspect. Hence, too many observed summary statistics falling in the tails of distributions, especially if some of those ppp-values are small (< 0.001), cast serious doubts on the adequacy of the model-posterior combination to the observed dataset. Finally, because ppp-values are computed for a number of (often non-independent) test statistics, a method such as that of Benjamini and Hochberg (1995) can be used to control the false discovery rate (Cornuet et al. 2010).
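The tail-area computation and the false discovery rate control described above can be sketched as follows. The function names and toy values are ours; the first function follows the definition given in the text, and the second implements the standard Benjamini and Hochberg step-up procedure.

```python
def ppp_value(simulated, observed):
    """Tail-area probability of one test quantity: p if p <= 0.5, else
    1 - p, where p = Prob(q_simulated < q_observed)."""
    p = sum(1 for q in simulated if q < observed) / len(simulated)
    return p if p <= 0.5 else 1.0 - p


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag the tests declared significant when controlling the false
    discovery rate at level alpha (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * alpha / m:
            largest_rank = rank
    flagged = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            flagged[index] = True
    return flagged


# Toy posterior predictive distribution of one statistic: values 0 to 99.
simulated = list(range(100))
low_tail = ppp_value(simulated, 2.5)     # observed value in the lower tail
high_tail = ppp_value(simulated, 97.5)   # observed value in the upper tail

# Hypothetical ppp-values for four test quantities.
flags = benjamini_hochberg([0.001, 0.02, 0.03, 0.5], alpha=0.05)

print(low_tail, high_tail, flags)
```

In a real check, `simulated` would hold one summary statistic computed on each dataset simulated from the posterior, and the procedure would be repeated for every test quantity.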

We carried out the ABC model-posterior checking analysis on our observed microsatellite dataset as follows. From 10^7 datasets simulated under the final invasion scenario detailed in figure 1, and using the same subset of 95 statistics previously retained for parameter estimation, we obtained a posterior sample of 10^4 values from the posterior distributions of parameters, as described in the previous section (Parameter Estimation). We then simulated 10^4 datasets with parameter values drawn with replacement from this posterior sample. As underlined in many statistics textbooks (Gelman et al. 2003), model checking should not be performed using information that has already been used for training (i.e., model fitting; see also Cornuet et al. 2010 for illustrations on simulated datasets). Optimally, model-posterior checking should be based on test quantities that do not correspond to the summary statistics that have been used for obtaining the parameter posterior distributions. This is naturally possible with DIYABC, as the package proposes a large choice of summary statistics. Our set of test statistics therefore included the 1141 remaining single, pairwise, and three-population summary statistics available in DIYABC that were not used for previous ABC parameter estimation.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

A.F. was supported by state funding by the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program "Investissements d'avenir" (ANR-11-IDEX-0004-02). A.E. and R.A.H. were supported by the Languedoc Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE 2016), and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP "International Mobility" (CfP 2015-02). G.L. acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds, project 2012-13-195. X.C., M.K., I.M., and J.Z. were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). M.D. acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). M.P. was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.

ReferencesAdrion JR Kousathanas A Pascual M Burrack HJ Haddad NM Bergland

AO Machado H Sackton TB Schlenke TA Watada M et al 2014Drosophila suzukii the genetic footprint of a recent worldwide in-vasion Mol Biol Evol 313148ndash3163

Asplen MK Anfora G Biondi A Choi DS Chu D Daane KM Gibert PGutierrez AP Hoelmer KA Hutchison WD et al 2015 Invasionbiology of spotted wing Drosophila (Drosophila suzukii) a globalperspective and future priorities J Pest Sci 88469ndash494

Atallah J Teixeira L Salazar R Zaragoza G Kopp A 2014 The making of apest the evolution of a fruit-penetrating ovipositor in Drosophilasuzukii and related species Proc R Soc B 28120132840ndash20132840

Beaumont MA Zhang W Balding DJ 2002 Approximate Bayesian com-putation in population genetics Genetics 1622025ndash2035

Beaumont MA 2010 Approximate Bayesian computation in evolutionand ecology Annu Rev Ecol Evol Syst 41379ndash406

Benjamini Y Hochberg Y 1995 Controlling the false discovery rate apractical and powerful approach to multiple testing J Roy Stat Soc B57289ndash300

Bertorelle G Benazzo A Mona S 2010 ABC as a flexible framework toestimate demography over space and time some cons many prosMol Ecol 192609ndash2625

Blum M Nunes M Prangle D Sisson S 2013 A comparative review ofdimension reduction methods in approximate Bayesian computa-tion Stat Sci 28189ndash208

Bock DG Caseys C Cousens RD Hahn MA Heredia SM Hubner STurner KG Whitney KD Rieseberg LH 2015 What we still donrsquotknow about invasion genetics Mol Ecol 242277ndash2297

Boissin E Hurley B Wingfield MJ Vasaitis R Stenlid J Davis C de Groot PAhumada R Carnegie A Goldarazena A et al 2012 Retracing theroutes of introduction of invasive species the case of the Sirexnoctilio woodwasp Mol Ecol 215728ndash5744

Breiman L 2001 Random forests Mach Learn 455ndash32Benazzo A Ghirotto S Torres Vilaca ST Hoban S 2015 Using ABC and

microsatellite data to detect multiple introductions of invasive spe-cies from a single source Heredity 115262ndash272

Brouat C, Tollenaere C, Estoup A, Loiseau A, Sommer S, Soanandrasana R, Rahalison L, Rajerison M, Piry S, Goodman SM, Duplantier JM. 2014. Invasion genetics of a human commensal rodent: the black rat Rattus rattus in Madagascar. Mol Ecol. 23(16):4153–4167.

Caprio M, Tabashnik BE. 1992. Gene flow accelerates local adaptation among finite populations: simulating the evolution of insecticide resistance. J Econ Entomol. 85:611–620.

Calabria G, Maca J, Bächli G, Serra L, Pascual M. 2012. First records of the potential pest species Drosophila suzukii (Diptera: Drosophilidae) in Europe. J Appl Entomol. 136:139–147.

Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang G, Zalom FG. 2013. Genome of Drosophila suzukii, the spotted wing drosophila. G3 3:2257–2271.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Mol Ecol. 13:955–968.

Cook S, Gelman A, Rubin DB. 2006. Validation of software for Bayesian models using posterior quantiles. J Comput Graph Stat. 15:675–692.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24:2713–2719.

Cornuet JM, Ravigne V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11:401.

Cornuet JM, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin JM, Estoup A. 2014. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics 30:1187–1189.

Cristescu ME. 2015. Genetic reconstructions of invasion history. Mol Ecol. 24:2212–2225.

Depra M, Poppe JL, Schmitz HJ, De Toni DC, Valente VLS. 2014. The first records of the invasive pest Drosophila suzukii in the South American continent. J Pest Sci. 87:379–383.

Dlugosch KM, Parker IM. 2008. Founding events in species invasions: genetic variation, adaptive evolution, and the role of multiple introductions. Mol Ecol. 17:431–449.

Dlugosch KM, Anderson SR, Braasch J, Alice Cang F, Gillette H. 2015. The devil is in the details: genetic variation in introduced populations and its contributions to invasion. Mol Ecol. 24:2095–2111.

Edmonds CA, Lillie AS, Cavalli-Sforza LL. 2004. Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA. 101:975–979.

Ellstrand NC, Schierenbeck KA. 2000. Hybridization as a stimulus for the evolution of invasiveness in plants? Proc Natl Acad Sci USA. 97:7043–7050.

Estoup A, Guillemaud T. 2010. Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol. 19:4113–4130.

Estoup A, Lombaert E, Marin JM, Guillemaud T, Pudlo P, Robert CP, Cornuet JM. 2012. Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics. Mol Ecol Res. 12(5):846–855.

Estoup A, Ravigne V, Hufbauer RA, Vitalis R, Gautier M, Facon B. 2016. Is there a genetic paradox of biological invasion? Annu Rev Ecol Evol Syst. 47:51–72.


Facon B, Genton B, Shykoff J, Jarne P, Estoup A, David P. 2006. A general eco-evolutionary framework for understanding bioinvasions. Trends Ecol Evol. 21:130–135.

Facon B, Pointier JP, Jarne P, Sarda V, David P. 2008. High genetic variance in life-history strategies within invasive populations by way of multiple introductions. Curr Biol. 18:363–367.

Fraimout A, Loiseau A, Price DK, Xuereb A, Martin JF, Vitalis R, Fellous S, Debat V, Estoup A. 2015. New set of microsatellite markers for the spotted-wing Drosophila suzukii (Diptera: Drosophilidae): a promising molecular tool for inferring the invasion history of this major insect pest. Eur J Entomol. 112(4):855.

Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201:1555–1579.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis. London: Chapman & Hall/CRC.

Gilchrist GW, Huey RB, Serra L. 2001. Rapid evolution of wing size clines in Drosophila subobscura. Genetica 112:273–286.

Gilchrist GW, Huey RB, Balanya J, Pascual M, Serra L, Noor M. 2004. A time series of evolution in action: a latitudinal cline in wing size in South American Drosophila subobscura. Evolution 58:768–780.

Groen SC, Whiteman NK. 2016. Using Drosophila to study the evolution of herbivory and diet specialization. Curr Opin Insect Sci. 14:66–72.

Hauser M. 2011. A historic account of the invasion of Drosophila suzukii (Matsumura) (Diptera: Drosophilidae) in the continental United States, with remarks on their identification. Pest Manag Sci. 67:1352–1357.

Hufbauer RA, Facon B, Ravigne V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A. 2012. Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl. 5:89–101.

Kaneshiro KY. 1983. Drosophila (Sophophora) suzukii (Matsumura). Proc Hawaiian Entomol Soc. 24:179.

Keller SR, Taylor DR. 2010. Genomic admixture increases fitness during a biological invasion: admixture increases fitness during invasion. J Evol Biol. 23:1720–1731.

Kolbe JJ, Glor RE, Schettino LR, Lara AC, Larson A, Losos JB. 2004. Genetic variation increases during biological invasion by a Cuban lizard. Nature 431:177–181.

Lavergne S, Molofsky J. 2007. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci USA. 104:3883–3888.

Lee C. 2002. Evolutionary genetics of invasive species. Trends Ecol Evol. 17:386–391.

Lee JC, Bruck DJ, Curry H, Edwards D, Haviland DR, Van Steenwyk RA, Yorgey BM. 2011. The susceptibility of small fruits and cherries to the spotted-wing drosophila, Drosophila suzukii. Pest Manag Sci. 67:1358–1367.

Lenormand T. 2002. Gene flow and the limits of natural selection. Trends Ecol Evol. 17:183–189.

Lin QC, Zhai YF, Zhang AS, Men XY, Zhang XY, Zalom FG, Zhou CG, Yu Y. 2014. Comparative Developmental Times and Laboratory Life tables for Drosophlia suzukii and Drosophila melanogaster (Diptera: Drosophilidae). Fla Entomol. 97:1434–1442.

Lombaert E, Guillemaud T, Lundgren J, Koch R, Facon B, Grez A, Loomans A, Malausa T, Nedved O, Rhule E, et al. 2014. Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol. 23:5979–5997.

Matsumura S. 1931. 6000 illustrated insects of Japan-empire (in Japanese). Toko shoin, Tokyo. 1497 pp.

Meng XL. 1994. Posterior predictive p-values. Ann Stat. 22:1142–1160.

Nei M, Maruyama T, Chakraborty R. 1975. The bottleneck effect and genetic variability in populations. Evolution 29:1–10.

Ometto L, Cestaro A, Ramasamy S, Grassi A, Revadi S, Siozios S, Moretto M, Fontana P, Varotto C, Pisani D, et al. 2013. Linking genomics and ecology to investigate the complex evolution of an invasive drosophila pest. Genome Biol Evol. 5:745–757.

Pascual M, Chapuis MP, Mestres F, Balanya J, Huey RB, Gilchrist GW, Serra L, Estoup A. 2007. Introduction history of Drosophila subobscura in the New World: a microsatellite based survey using ABC methods. Mol Ecol. 16:3069–3083.

Peischl S, Excoffier L. 2015. Expansion load: recessive mutations and the role of standing genetic variation. Mol Ecol. 24:2084–2094.

Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. 2016. Reliable ABC model choice via random forests. Bioinformatics 32:859–866.

R Development Core Team. 2008. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org

Robert CP, Cornuet JM, Marin JM, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA. 108:15112–15117.

Rius M, Darling JA. 2014. How important is intraspecific genetic admixture to the success of colonising populations? Trends Ecol Evol. 29:233–242.

Sakai AK, Allendorf FW, Holt JS, Lodge DM, Molofsky J, With KA, Baughman S, Cabin RJ, Cohen JE, Ellstrand NC, McCauley DE. 2001. The population biology of invasive species. Annu Rev Ecol Syst. 305–332.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. In: Mutation and Evolution. Springer. pp. 359–367.

Simberloff D. 2013. Biological invasions: much progress plus several controversies. Contrib Sci. 9:7–16.

Turgeon J, Tayeh A, Facon B, Lombaert E, De Clercq, Berkvens N, Lundgren JG, Estoup A. 2011. Experimental evidence for the phenotypic impact of admixture between wild and biocontrol Asian ladybird (Harmonia axyridis) involved in the European invasion. J Evol Biol. 24:1044–1052.

Uller T, Leimu R. 2011. Founder events predict changes in genetic diversity during human-mediated range expansions. Glob Change Biol. 17:3478–3485.

Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Thery S, Froment A, Le Bomin S, Gessain A, Hombert JM, et al. 2009. Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol. 19:312–318.

Wray G. 2013. Genomics and the evolution of phenotypic traits. Annu Rev Ecol Evol Syst. 44:51–72.


scenario) and with small reference tables (10,000 simulated datasets per scenario, the same size as used for ABC-RF). The performance of ABC-LDA (in terms of prior error rate) in choosing among introduction scenarios was slightly better than for ABC-RF when using large reference tables. Furthermore, for ABC-LDA with large reference tables, confidence in model choice (in terms of posterior probability) was relatively comparable to ABC-RF with small reference tables. Thus, when analyses are relatively simple or computational resources are not limiting, ABC-LDA can provide as robust inferences of invasion scenarios as ABC-RF.

In contrast, we found that ABC-RF outperformed ABC-LDA when using similarly small reference tables for both types of analyses (10,000 simulated datasets per scenario). ABC-LDA was also unstable, choosing different scenarios in different replicate analyses when using small reference tables. These empirical findings correspond well with those of Pudlo et al. (2016), who used simple models for which the true posterior probabilities could be calculated. Pudlo et al. (2016) demonstrated that ABC-RF provided a more reliable posterior probability of the best (true) scenario than standard ABC methods when using the same (small) number of simulated datasets in the reference table (see supplementary fig. S3 in Pudlo et al. 2016). In our work, the difference in performance between the two methods was most pronounced for complex analyses (e.g., when comparing more than 10 scenarios using more than 100 summary statistics). Thus, when analyses are not simple or computational resources are finite, ABC-RF provides more robust inferences of invasion scenarios than ABC-LDA.

The lower computation cost of ABC-RF was also evident. For a given observed dataset, selecting the best supported scenario and computing its posterior probability was ca. 50 times faster than when processing an ABC-LDA treatment with a traditional reference table of large size. Moreover, estimation of prior error rates with ABC-RF took only a few additional minutes of computation time, whereas it lasted several hours to several days with ABC-LDA. The consequence of low computational costs is not simply in computer time. Rather, the efficiency of ABC-RF analyses allowed us to run replicate analyses on various sample sets, even for the most complex D. suzukii invasion scenarios. This replication is critical in evaluating the robustness of our statistical inferences.

We found that, at least for some ABC-RF analyses, the posterior probability of the best scenario and the global statistical power to choose among alternative invasion scenarios (as measured by the prior error rate) were relatively low. For instance, in three of the 11 analyses (analyses 3b, 5a, and 5b), the best scenarios had relatively low probabilities (i.e., ranging from 0.500 to 0.630) and relatively high prior error rates (i.e., ranging from 0.300 to 0.400). Although replicate analyses carried out on different sample sets and prior sets pointed to the same best scenario for these three analyses, such low probability values and high prior error rates suggest a moderate level of confidence in the choice of model in these cases. Similarly, we found that the scenario choice leading to the conclusion of the presence versus absence of admixed genes from the eastern US in various European sample sites relied on probability values that were sometimes as low as 0.500. Moreover, in a minority of cases, the best scenario differed depending on the sample site considered as representative of the eastern US or the native area. The observed pattern of an uneven spatial distribution of the presence of genes of eastern US origin in Europe, following roughly a north to south gradient, should hence be taken cautiously. The analyses of additional European sample sites are needed to clarify this issue. The above inferential uncertainties also most likely find their sources in the relatively small number of markers genotyped relative to the number, complexity, and similarity of some of the competing models. Such results indicate that there is room for improving the robustness of our inferences, at least for a subset of our analyses.

Bottlenecks and Genetic Admixture

We found strong evidence that bottlenecks and admixture both occurred during the course of the invasion of D. suzukii. A first sign of bottlenecks is a reduction in neutral genetic diversity. We found that all invasive D. suzukii populations were characterized by significantly lower genetic variation than native ones, with a loss of 6.4% (US-Col) to 23.3% (US-Haw) of heterozygosity and of 27.3% (US-Wat) to 54.2% (US-Haw) of allelic diversity, respectively (supplementary fig. S5, Supplementary Material online). Adrion et al. (2014) found a similar significant loss of diversity in their dataset focused on gene sequences. Dlugosch and Parker (2008) and Uller and Leimu (2011) reviewed studies of neutral genetic diversity in a large number of species of animals, plants, and fungi, and compared nuclear molecular diversity within introduced and source populations. Overall, they found that a loss of variation was the most frequent feature in invasive populations. However, reductions in genetic variation were on average modest (e.g., average loss of 18.7% and 15.5% of heterozygosity and allelic diversity, respectively; Dlugosch and Parker 2008). The reductions observed in our study in invasive D. suzukii populations were thus in the same range for heterozygosity but higher for allelic diversity (cf. average loss of 11.1% and 34.5% of heterozygosity and allelic diversity, respectively). A larger loss of allelic diversity than heterozygosity after a bottleneck is nevertheless expected by theory (Nei et al. 1975).
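The contrast between the two diversity measures can be illustrated with a minimal sketch. It assumes the simplest definition of expected heterozygosity (one minus the sum of squared allele frequencies at a locus) and uses hypothetical allele frequencies that are not taken from the study; the function names are ours. Losing rare alleles in a bottleneck removes a large share of allelic diversity while barely changing heterozygosity, which is the expectation attributed to Nei et al. (1975) above.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def relative_loss(native_value, invasive_value):
    """Proportion of a diversity measure lost in the invasive population."""
    return 1.0 - invasive_value / native_value


# Hypothetical allele frequencies at a single locus (illustration only).
native_freqs = [0.40, 0.30, 0.20, 0.05, 0.05]
# After a bottleneck, the two rare alleles are lost and the others shift slightly.
invasive_freqs = [0.45, 0.35, 0.20]

het_loss = relative_loss(expected_heterozygosity(native_freqs),
                         expected_heterozygosity(invasive_freqs))
allele_loss = relative_loss(len(native_freqs), len(invasive_freqs))

print(f"heterozygosity lost:    {het_loss:.1%}")
print(f"allelic diversity lost: {allele_loss:.1%}")
```

In this toy case, two of five alleles disappear (a 40% loss of allelic diversity) while heterozygosity drops by roughly a tenth, because the lost alleles were rare and contributed little to the sum of squared frequencies.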

The type of introduction pathway explains, at least partly, the severity of the reduction in genetic diversity. Bottlenecks tended to be less severe for populations founded by individuals from the same continent than for populations founded by individuals originating from a different continent. A simple explanation for such a pattern is that, due to greater proximity, populations originating from the same continent are likely to be initially founded by a larger number of individuals and also to experience recurrent gene flow. Populations founded by individuals from both the same and different continents experienced weak or moderate reductions in diversity, suggesting that multiple introduction pathways restore diversity.

In agreement with previous studies on invasions on similarly large geographical scales (e.g., Lombaert et al. 2014 and references therein), we found that admixture events were frequent in the worldwide invasion history of D. suzukii. We identified at least five admixture events for which sources were native and/or invasive populations (i.e., A1–A5 events in fig. 1). The genetic contributions of the sources were unbalanced, except for the populations from Brazil and La Reunion, but genetic information regarding admixture was considerably less accurate in the latter cases, which involved weakly differentiated source populations.

We have provided here the information necessary to evaluate whether bottlenecks and admixture have influenced outcomes in this invasion. Quantitative genetics studies in the lab focusing on the key source, bottlenecked, and admixed populations could now examine the fitness and adaptive potential of those groups (Lavergne and Molofsky 2007; Facon et al. 2008; Turgeon et al. 2011).

Conclusions and Perspectives

Understanding invasion pathways provides key information for understanding the role of evolutionary processes in biological invasions. For example, our research reveals sources of invasive populations for further relevant comparisons using quantitative genetics studies in the lab. Additionally, we found that the invasion of Europe by D. suzukii was distinct from that of North America, with limited and asymmetrical gene flow between these two main invaded areas. This situation provides the opportunity to evaluate evolutionary trajectories in replicate in two separate temperate climate regions, similarly to research by Gilchrist et al. (2001, 2004) on the pace of clinal evolution in the invasive fruit fly Drosophila subobscura. There are also distinct invasion pathways for areas with warmer climates (i.e., Hawaii, La Reunion, and Brazil) that will make interesting comparisons.

Retracing invasion routes and making inferences about demographic processes is only possible if there is adequate polymorphism within populations and significant genetic differentiation among them. As is evident here, microsatellite data can provide enough variation to reveal important pathways. We concur with Adrion et al. (2014), however, that genome-wide data (i.e., next generation sequencing approaches) will be tremendously powerful in further discriminating among complex invasion scenarios (see also Cristescu 2015; Estoup et al. 2016). Because of the reduced computational resources demanded by ABC-RF, this method will be particularly useful for analysis of massive single nucleotide polymorphism datasets (Pudlo et al. 2016). In addition to the selectively neutral demographic inference leading to the reconstruction of routes of invasion, population (and quantitative) next generation sequencing approaches are quite promising for studying the evolution of phenotypic traits in natural populations (Wray 2013; Gautier 2015). Specifically, they can be used to better understand the genetic architecture of traits underlying invasion success (Bock et al. 2015). Drosophila suzukii is a good species for such research given (i) its short generation time and viability in laboratory conditions, which facilitate experimental approaches to study quantitative traits of interest (Asplen et al. 2015), and (ii) the availability of annotated genome assemblies for this species (Chiu et al. 2013; Ometto et al. 2013), along with the huge amount of genomic resources available in its close relative species D. melanogaster (Groen and Whiteman 2016).

Materials and Methods

Sampling and Genotyping

Adult D. suzukii were sampled from the field at a total of 23 localities (hereafter termed sample sites) distributed throughout most of the native and invasive range of the species (supplementary table S1, Supplementary Material online, and fig. 1). Samples were collected between 2013 and 2015 using baited traps and sweep nets, and stored in ethanol. Native Asian samples consisted of a total of six sample sites, including four Chinese and two Japanese localities. Samples from the invasive range were collected in Hawaii (1 sample site), Continental US (7 sites), Europe (7 sites), Brazil (1 site), and in the French island of La Reunion (1 site). For each sample site, 15–44 adult flies were genotyped at 25 of the 28 microsatellite loci described in Fraimout et al. (2015), resulting in a total of 685 genotyped individuals. Three microsatellite loci from Fraimout et al. (2015) (i.e., loci DS31, DS42, and DS45) were not included due to the presence of null alleles at frequencies >10% in some sample sites (results not shown). DNA extraction and PCR amplification, as well as allele scoring, were performed following the methodology detailed in Fraimout et al. (2015).

General Framework for Defining Focal Population Groups and Sets of Scenarios to Be Compared Using ABC

One of the many challenges of ABC analyses is to optimally define sets of sample sites to be processed chronologically as potential sources and targets in a way that does not hamper the computational effort. As stated in Estoup and Guillemaud (2010) and Lombaert et al. (2014), the number and complexity of competing scenarios are proportional to the number of genetic groups to be accounted for in ABC analyses. Following Lombaert et al. (2014), we used the results obtained from three genetic clustering methods, along with historical and geographical information, to define genetic groups that could include several sample sites (supplementary appendix S1, Supplementary Material online). We defined seven main genetic groups from our set of 23 sample sites: Asia, Hawaii, western US, eastern US, Europe, Brazil, and La Reunion. To make the computations less intensive with respect to computer resources, ABC treatments were carried out using one representative sample site for each genetic group per replicate analysis. In other words, when a genetic group included more than two sample sites, we considered the two most differentiated sample sites (i.e., with the highest FST values in supplementary table S3, Supplementary Material online) as representative of the group: CN-Shi and JP-Tok for the group Asia, all three sample sites for the genetically heterogeneous western US group, US-NC and US-Wis for the group eastern US, and GE-Dos and IT-Tre for the group Europe. Each of the sample sites retained (i.e., the representative sample sites) was included alternatively in the sample set considered for ABC treatments. This makes it possible to replicate analyses of scenario choice on different sample sets representing most of the genetic variation within and between genetic groups. All ABC analyses described in table 1 were thus replicated with all possible combinations of representative sample sites available for each genetic group. This allowed us to test the robustness of scenario choices with respect to the sample sets considered as representative of genetic groups.
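The replication over representative sample sites amounts to a Cartesian product over groups. The sketch below is only an illustration of that bookkeeping: the site codes are those named in the text, but which groups actually enter a given analysis depends on the analysis (table 1), and the helper name `sample_sets` is hypothetical.

```python
from itertools import product

# Representative sample sites per genetic group, as named in the text.
representatives = {
    "Asia": ["CN-Shi", "JP-Tok"],
    "western US": ["US-Wat", "US-Sok", "US-SD"],
    "eastern US": ["US-NC", "US-Wis"],
    "Europe": ["GE-Dos", "IT-Tre"],
}


def sample_sets(groups):
    """Return every combination made of one representative site per group."""
    names = list(groups)
    return [dict(zip(names, combo))
            for combo in product(*(groups[name] for name in names))]


replicate_sets = sample_sets(representatives)
print(len(replicate_sets))   # number of replicate sample sets for these groups
print(replicate_sets[0])     # first combination, one site per group
```

For the four groups listed here, the product yields 2 x 3 x 2 x 2 = 24 sample sets, which shows how quickly the number of replicate analyses grows with the number of groups and representatives.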

Nested ABC Analyses Processed Sequentially

Following Lombaert et al. (2014), we used historical information (i.e., dates of first observation of each invasive population; supplementary table S1, Supplementary Material online) to define 11 sets of competing introduction scenarios that were analyzed sequentially. The first set of compared scenarios considers the second oldest invasive population as target (the oldest one necessarily originating from the native range) and determines its introduction history. Step by step, subsequent analyses use the results obtained from the previous analyses, until the most recent invasive populations are considered. Each of the 11 analyses included a set of compared introduction scenarios where the target invasive population (i.e., the focal population) can either directly derive by a single split (i.e., introduction event) from one of the historically compatible source populations or be the result of an admixture event between one of all possible pairwise combinations of sources. For instance, in a model with one target and two possible sources, three scenarios could be formalized, with the target population being derived from a single source population, from the other possible source, or from an admixture between the two sources (see supplementary fig. S6 for an illustration and supplementary table S2, Supplementary Material online, for details).
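The way a set of competing scenarios is built for one target population can be sketched as follows. This is a schematic enumeration under the rule stated above (one scenario per single source, plus one per pair of admixing sources); the function name is ours, and real scenario sets may include further refinements such as unsampled populations.

```python
from itertools import combinations


def introduction_scenarios(sources):
    """List candidate origins of a target: each single source, then each admixed pair."""
    single_source = [(source,) for source in sources]
    pairwise_admixture = list(combinations(sources, 2))
    return single_source + pairwise_admixture


# One target with two possible sources: three scenarios.
print(introduction_scenarios(["Asia", "Hawaii"]))

# One target with three possible sources: six scenarios.
for scenario in introduction_scenarios(["Asia", "Hawaii", "western US"]):
    print(" + ".join(scenario))
```

With n candidate sources, this rule gives n + n(n - 1)/2 scenarios, so the number of models to compare grows quadratically as older invasive groups become potential sources for more recent ones.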

The first set of scenario choice analyses, 1a–1d, aimed at making inferences about the introduction pathways for the three sample sites in western US (table 1). We first evaluated the Asian, Hawaiian, or admixed Asian + Hawaiian origin of each of the three western US sample sites. We then refined our modeling by formalizing seven competing scenarios describing the possible relationships among the three western US sample sites and their extra-continental sources (i.e., Asia and Hawaii; analysis 1d). Once the invasion history was resolved for the western US group, we sought to identify the population source(s) of the eastern US and European genetic groups independently. Dates of first records in these two groups widely overlap, and the possibility that the western US was a common source for both areas could not be ruled out. We therefore compared, in analysis 2a, a set of six scenarios to assess whether the eastern US group directly derived from Asia, Hawaii, or western US, or if it resulted from an admixture event between one pair of such sources. A similar set of six scenarios was used to assess the most likely origin of European populations independently from the eastern US group (analysis 2b).

Due to the results of analyses 2a and 2b, and due to the overlap in first record dates for eastern US and Europe, we then performed two analyses to investigate the possibility of gene flow (i.e., admixture) between the eastern US and European groups, considering eastern US as target population with Europe as potential source and vice versa (analyses 3a and 3b, respectively). Analysis 4 aimed at clarifying the admixture in northern Europe sample sites identified in analysis 3b. We tested whether this admixture occurred between eastern US and a secondary introduction from Asia, or between individuals from eastern US and southern Europe. Finally, we investigated the origins of the most recent invasive populations in Brazil and La Reunion separately, with all older groups and pairwise admixtures as potential sources (analyses 5a and 5b). Note that, in agreement with the genetic clustering results (fig. 1, supplementary appendix S1 and figs. S1 and S2, Supplementary Material online), we did not test for the possibility of Brazil being the source of La Reunion, and reciprocally.

Historical, Demographic, and Mutational Parameters

For all scenarios formalized for ABC analyses, we modeled an introduction event as a divergence without subsequent gene flow from the source population (or two source populations in case of admixture) at a time corresponding to the date of first record in the invaded area, translated into a number of generations (assuming 12 generations per year; Lin et al. 2014). The divergence event was immediately followed by a bottleneck period characterized by a lower effective population size and a straight return to a stable (large) effective population size. We modeled our uncertainty regarding the identity of the source populations in the slightly structured native Asian area by including native "ghost populations" (i.e., unsampled populations) in our scenarios. Such unsampled native populations were modeled as a single split from an ancestral Asian population (without any change in effective population size) at a time defined by a loose flat prior, which includes zero at the lower bound (supplementary table S8, Supplementary Material online). See Lombaert et al. (2014) and references therein for justifications of using unsampled populations when modeling invasion scenarios. A detailed illustration of the above modeling design is provided in supplementary fig. S6, Supplementary Material online.
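The translation of a first-record date into a divergence time in generations is simple arithmetic; a minimal sketch is given below. Only the rate of 12 generations per year comes from the text. The function name and the example years are hypothetical, and treating dates as whole calendar years is our simplifying assumption.

```python
GENERATIONS_PER_YEAR = 12  # value assumed in the text (Lin et al. 2014)


def generations_since_first_record(first_record_year, sampling_year):
    """Number of generations elapsed between first record and sampling."""
    if sampling_year < first_record_year:
        raise ValueError("sampling cannot precede the first record")
    return (sampling_year - first_record_year) * GENERATIONS_PER_YEAR


# Hypothetical example: first recorded in 2008, sampled in 2014.
print(generations_since_first_record(2008, 2014))  # 6 years x 12 = 72 generations
```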

Prior distributions for historical, demographic, and mutational parameters were defined taking into account the historical and demographic parameter values available from empirical studies on D. suzukii (Hauser 2011; Lin et al. 2014; Asplen et al. 2015) and microsatellite mutational data on D. melanogaster (Schug et al. 1998). We considered a first set of prior distributions (hereafter termed prior set 1), which was composed of rectangular (i.e., bounded uniform) distributions (see supplementary table S8, Supplementary Material online). To evaluate the robustness of our ABC inferences to prior choice, we also considered a second set of more peaked prior distributions (hereafter termed prior set 2; supplementary table S8, Supplementary Material online).

Model Choice Using ABC-RF and ABC-LDA

For all ABC-RF and ABC-LDA analyses, we used the software DIYABC v2.1.0 (Cornuet et al. 2014) to simulate datasets constituting the reference tables. A reference table includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics.

Following Pudlo et al. (2016), ABC-RF treatments were processed on reference tables including 10,000 simulated datasets per scenario. Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014) for microsatellite markers, describing genetic variation per population (e.g., number of alleles), per pair (e.g., genetic distance), or per triplet (e.g., admixture rate) of populations, averaged over the 25 loci (see supplementary table S9, Supplementary Material online, for details about such statistics), plus the linear discriminant analysis (LDA) axes as additional summary statistics. The total number of summary statistics ranged from 39 to 424 depending on the analysis (table 2). When the number of scenarios in an analysis exceeded ten, we processed ABC-RF treatments on reference tables including 100,000 simulated datasets to avoid computer memory issues associated with the sub-bootstrapping procedure processed for reference tables with more than 100,000 simulated datasets; see the section Practical recommendations regarding the implementation of the algorithms in Pudlo et al. (2016). We checked that this number was sufficient by evaluating the stability of prior error rates (i.e., the probability of choosing a wrong model when drawing model index and parameter values into priors) and posterior probability estimations on 80,000, 90,000, and 100,000 simulated datasets. The number of trees in the constructed random forests was fixed to n = 500; see Pudlo et al. (2016) and Breiman (2001) for justifications of considering a forest of 500 trees, and supplementary fig. S7, Supplementary Material online, for an illustration. For each ABC-RF analysis, we predicted the best scenario and estimated its posterior probabilities and prior error rates over 10 replicate runs of the same reference table. We used the abcrf R package (v1.1.0; Pudlo et al. 2016) to perform all ABC-RF analyses. In supplementary appendix S2, Supplementary Material online, we provide several scripts written in the R programming language (R Development Core Team 2008) for computing random forest analyses in R with the abcrf package when starting from simulated datasets generated with DIYABC v2.1.0.
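The notion of a prior error rate can be made concrete with a toy sketch. It is not the abcrf implementation: a nearest-neighbour rule stands in for the random forest, the simulator is a one-statistic toy with two hypothetical models, and the error is estimated on freshly simulated pseudo-observed datasets rather than with the out-of-bag procedure used by random forests. All names and numerical settings are assumptions made for illustration.

```python
import random


def simulate(model, rng):
    """Toy simulator: draw a parameter from its prior, return one summary statistic."""
    shift = rng.uniform(0.0, 1.0)            # hypothetical flat prior on a nuisance parameter
    centre = 0.0 if model == 0 else 3.0      # the two toy models differ in location
    return centre + shift + rng.gauss(0.0, 0.5)


def build_reference_table(n_per_model, rng):
    """Reference table: (model index, summary statistic) for each simulated dataset."""
    return [(model, simulate(model, rng)) for model in (0, 1) for _ in range(n_per_model)]


def classify(statistic, table):
    """Stand-in classifier: model index of the nearest simulated dataset."""
    return min(table, key=lambda row: abs(row[1] - statistic))[0]


def prior_error_rate(table, n_test, rng):
    """Share of pseudo-observed datasets, drawn from the priors, assigned to the wrong model."""
    errors = 0
    for _ in range(n_test):
        true_model = rng.choice((0, 1))      # model index drawn from a uniform prior
        pseudo_observed = simulate(true_model, rng)
        errors += classify(pseudo_observed, table) != true_model
    return errors / n_test


rng = random.Random(1)
reference_table = build_reference_table(500, rng)
error_rate = prior_error_rate(reference_table, 500, rng)
print(f"estimated prior error rate: {error_rate:.3f}")
```

Because the two toy models are well separated, the estimated rate is small here; with many similar competing scenarios, as in some of the analyses above, the same quantity can reach several tens of percent.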

For the sake of methodological comparisons, we also duplicated a subset of analyses using a more standard ABC method (Beaumont et al. 2002; Cornuet et al. 2008), improved using the regression algorithms from Estoup et al. (2012) based on linear discriminant analysis (LDA) to avoid the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step. This method is called ABC-LDA. Following previous analyses of this type (Lombaert et al. 2014), ABC-LDA treatments were processed on reference tables including 500,000 simulated datasets per scenario. As for ABC-RF, datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). Posterior probabilities associated with each scenario were estimated by polychotomous logistic regression (Cornuet et al. 2008), modified following Estoup et al. (2012), on the 1% of the simulated datasets closest to the observed dataset. We also estimated for each analysis a prior error rate over 500 simulated datasets drawn into priors, using reference tables including 500,000 simulated datasets per scenario. To provide a fair comparison of classification error with respect to the computational effort, prior error rates were also estimated using reference tables including the same number of simulated datasets per scenario as for ABC-RF treatments (10,000 per scenario).
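The selection of the simulated datasets closest to the observed one can be sketched as below. This is a deliberately simplified stand-in: it keeps the closest 1% by plain Euclidean distance on two toy statistics and reports the share of each scenario among them, whereas the analyses described above work on LDA-transformed statistics and estimate posterior probabilities with a polychotomous logistic regression. The toy reference table, the observed values, and the function names are all hypothetical.

```python
import math
import random


def closest_fraction(table, observed, fraction=0.01):
    """Keep the given fraction of simulated datasets closest to the observed one."""
    n_keep = max(1, round(len(table) * fraction))
    ranked = sorted(table, key=lambda row: math.dist(row[1], observed))
    return ranked[:n_keep]


def scenario_proportions(retained):
    """Share of each scenario among the retained datasets (a crude posterior estimate)."""
    counts = {}
    for scenario, _ in retained:
        counts[scenario] = counts.get(scenario, 0) + 1
    return {scenario: count / len(retained) for scenario, count in counts.items()}


# Toy reference table: three scenarios, two summary statistics per simulated dataset.
rng = random.Random(0)
toy_table = [(s, (rng.gauss(s, 1.0), rng.gauss(-s, 1.0)))
             for s in (0, 1, 2) for _ in range(1000)]
observed_stats = (1.0, -1.0)   # hypothetical observed summary statistics

retained = closest_fraction(toy_table, observed_stats)
proportions = scenario_proportions(retained)
print(len(retained), proportions)
```

The regression step used in ABC-LDA refines these raw proportions by weighting retained datasets according to their distance from the observed data, which is why it benefits from the much larger reference tables mentioned above.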

Refining the Origins of the Primary Introduction Events from the Native Area

We refined our worldwide invasion scenario by assessing the most probable origin of the primary introductions from the Asian native range. To this aim, we carried out additional ABC-RF treatments to choose which geographical area in the native range was the most likely origin of each of the three primary introductions from Asia identified in the final invasion scenario, i.e., in Hawaii, western US (sample site US-Wat), and Europe (fig. 1). Capitalizing on the observed pattern of genetic differentiation among all Asian sample sites, we considered four possible geographical origins: O1 = Japan (represented by the sample sites JP-Sap + JP-Tok, which were pooled as they did not show any significant differentiation), O2 = South-East China (represented by the sample site CN-Nin), O3 = North-West China (represented by the sample sites CN-Lan + CN-Lia, which were pooled as they did not show any significant differentiation), and O4 = South-West China (represented by the sample site CN-Shi). In each of these three model choice analyses (cf. one analysis per primary introduction), we considered the previously inferred introduction histories and compared four scenarios corresponding to variations of those histories that only differed by the geographical origin of the primary Asian introduction (models O1, O2, O3, and O4). We used the ABC-RF method with 10,000 datasets per model simulated using the prior set 1 or 2, and with three replications of the RF process. For inferences regarding Europe, we carried out ABC-RF treatments on various sets of sample sites from Europe and eastern US to further evaluate the robustness of our results.

Parameter Estimation
For ABC parameter estimation, we considered a set of 12 key sample sites representative of the inferred final invasion history (fig. 1). This set of sample sites included the three native origins of primary introductions from the native range (O1, O2, and O3), the Hawaiian sample (US-Haw), the three western US samples (US-Wat, US-Sok, and US-SD), the eastern US sample US-NC, two European samples (IT-Tre and GE-Dos), the Brazilian sample (BR-PA), and the sample from La Réunion (FR-Reu). To evaluate the robustness of our parameter estimations with respect to sample sites, we carried out a second analysis replacing the sample sites US-NC, IT-Tre, and GE-Dos by US-Wis, SP-Bar, and FR-Par, respectively. We also replicated parameter estimations using prior set 1 and prior set 2.

ABC-RF was developed to tackle the inferential issue of model choice. So far, no RF-based solution exists to estimate

Deciphering Invasion Routes Using ABC Random Forest. doi:10.1093/molbev/msx050. MBE


parameter distributions under a given model (Pudlo et al. 2016). Standard ABC algorithms for parameter estimation may suffer from the curse of dimensionality and from correlation among explanatory variables (i.e., multicollinearity) during the regression step, and hence yield poor results when the number of statistics is too large [reviewed in Blum et al. (2013)]. To avoid such potential problems, as well as for the sake of simplicity and computational efficiency (statistical techniques for choosing summary statistics have not been implemented in DIYABC; Blum et al. 2013), we used a standard ABC methodological framework (Beaumont et al. 2002; Cornuet et al. 2008) applied to a subset of "expert-chosen" statistics proposed in DIYABC (Brouat et al. 2014; Lombaert et al. 2014) to estimate posterior parameter distributions under the final invasion scenario. We hence considered 95 summary statistics in total: the mean number of alleles and the mean genetic diversity (per locus and population sample), all pairwise FST values, and some crude estimates of admixture rates based on the five population triplets for which admixture events have been inferred (i.e., the A1–A5 events described in fig. 1, corresponding to five AML statistics; Choisy et al. 2004; see the DIYABC 2.1.0 user manual, p. 16, for details about such statistics). Using parameter values drawn from prior set 1 or 2, we produced a reference table containing 10 million simulated datasets. Following Beaumont et al. (2002), we then used a local linear regression to estimate the parameter posterior distributions. We took the 10,000 (0.1%) simulated datasets closest to our observed dataset for the regression, after applying a logit transformation to parameter values. We found that considering instead the 1% closest simulated datasets and/or a larger set of summary statistics had only weak effects on the estimated parameter distributions, at least for the subset of parameters we were interested in (i.e., admixture rates and bottleneck severity; results not shown).
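The two preparatory steps mentioned above, retaining the simulated datasets closest to the observed one and applying a logit transformation to bounded parameters, can be sketched as follows. This is a minimal illustration under stated assumptions: distances are plain Euclidean distances on unscaled statistics, the table layout and function names are hypothetical, and the local linear regression itself is not shown.

```python
import math

def logit(x, lower, upper):
    """Map a parameter bounded in (lower, upper) onto the real line."""
    z = (x - lower) / (upper - lower)
    return math.log(z / (1.0 - z))

def inv_logit(y, lower, upper):
    """Back-transform a logit-scale value to the (lower, upper) interval."""
    return lower + (upper - lower) / (1.0 + math.exp(-y))

def closest_fraction(table, observed, fraction):
    """Keep the rows whose summary statistics are nearest to `observed`.

    Each row of `table` is (parameter_value, [summary statistics]).
    """
    n_keep = max(1, round(len(table) * fraction))
    return sorted(table, key=lambda row: math.dist(row[1], observed))[:n_keep]

# Hypothetical reference table with one parameter and one summary statistic.
table = [(i / 1000, [float(i)]) for i in range(1, 1000)]
kept = closest_fraction(table, [500.0], fraction=0.01)
transformed = [logit(theta, 0.0, 1.0) for theta, _ in kept]
```

With a reference table of 10 million rows and `fraction=0.001`, `closest_fraction` would retain 10,000 rows, matching the proportion quoted in the text.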

We focused our investigations on the parameters related to two types of demographic events that are considered to be potentially important drivers of invasion success: severity of the bottleneck associated with the foundation of new populations, and genetic admixture (Estoup et al. 2016). Bottleneck severity was defined as the ratio between the duration of the bottleneck and the number of effective individuals during this period (Pascual et al. 2007). Following the final invasion scenario detailed in figure 1, we defined three categories of introduction situations: bottleneck severity for populations founded only by individuals originating from a different continent (e.g., sample sites US-Haw, US-Wat, IT-Tre, BR-PA, and FR-Reu), bottleneck severity for populations founded only by individuals originating from the same continent (e.g., sample sites US-SD and US-NC), and bottleneck severity for populations founded by a combination of the two types of sources (e.g., sample sites US-Sok and GE-Dos). Genetic admixture is evident when a population is inferred to have more than one distinct source (A1–A5 events in fig. 1).
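The definition of bottleneck severity given above reduces to a simple ratio. The sketch below only restates that ratio; the function name and the numerical values are hypothetical and are not estimates from this study.

```python
def bottleneck_severity(duration_generations, effective_size):
    """Ratio of bottleneck duration (in generations) to effective population size."""
    if effective_size <= 0:
        raise ValueError("effective size must be positive")
    return duration_generations / effective_size

# Two hypothetical bottlenecks with the same severity:
# a short, narrow one and a longer, wider one.
short_narrow = bottleneck_severity(5, 25)
long_wide = bottleneck_severity(10, 50)
print(short_narrow, long_wide)
```

Under this definition a brief bottleneck with few founders and a longer bottleneck with proportionally more founders are treated as equally severe.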

Model-Posterior Checking
We used the ABC model-posterior checking method implemented in DIYABC (Cornuet et al. 2010) to evaluate how well the final worldwide invasion scenario, with associated parameter posterior distributions, matches the observed dataset. This method was largely inspired by statistical frameworks detailed in Gelman et al. (2003) and Cook et al. (2006). The principle is as follows: if a scenario-posterior combination fits the observed data correctly, then data simulated under this scenario with parameters drawn from the associated posterior distributions should be close to the observed data. The lack of fit of the model to the data can be measured by determining the frequency at which test quantities measured on the observed dataset are extreme with respect to the distributions of the same test quantities computed from the simulated datasets (i.e., the posterior predictive distributions). In practice, the test quantities are chosen among the large set of ABC summary statistics proposed in the program DIYABC. For each test quantity ($q$), a lack of fit of the observed data with respect to the posterior predictive distribution can be measured by the cumulative distribution function value, defined as $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$. The tail-area probability can be easily computed for each test quantity as $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$ and $1.0 - \mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$ for $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}}) \le 0.5$ and $> 0.5$, respectively. Such tail-area probabilities, also named posterior predictive P-values (i.e., ppp-values; Meng 1994), represent the probabilities that the replicated data (simulated ABC summary statistics) could be more extreme than the observed data (observed ABC summary statistics). In practice, ppp-values can be interpreted as a guideline for tracking model-posterior misfit: a few ppp-values around 0.05 are not necessarily a problem, whereas the presence of ppp-values around 0.001 is rather suspect. Hence, too many observed summary statistics falling in the tails of distributions, especially if some of those ppp-values are small ($< 0.001$), cast serious doubts on the adequacy of the model-posterior combination to the observed dataset. Finally, because ppp-values are computed for a number of (often non-independent) test statistics, a method such as that of Benjamini and Hochberg (1995) can be used to control the false discovery rate (Cornuet et al. 2010).
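The tail-area probability defined above, together with a Benjamini and Hochberg (1995) step-up correction, can be sketched in a few lines. This is an illustrative sketch only, not the DIYABC implementation; the function names and the example values are assumptions.

```python
def tail_area_probability(simulated, observed):
    """Posterior predictive P-value (ppp-value) for one test quantity.

    Uses the empirical Prob(q_simulated < q_observed) and folds it
    so that the returned value never exceeds 0.5.
    """
    cdf = sum(1 for q in simulated if q < observed) / len(simulated)
    return cdf if cdf <= 0.5 else 1.0 - cdf

def benjamini_hochberg(p_values, alpha=0.05):
    """Flag the p-values rejected at false discovery rate `alpha`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that the k-th smallest p-value is <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected

# Hypothetical posterior predictive sample for one test quantity.
simulated = list(range(100))
low_tail = tail_area_probability(simulated, 2)
high_tail = tail_area_probability(simulated, 97.5)
flags = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6])
```

Test quantities flagged after correction would be the ones to inspect first when judging model-posterior misfit.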

We carried out the ABC model-posterior checking analysis on our observed microsatellite dataset as follows. From $10^7$ datasets simulated under the final invasion scenario detailed in figure 1, and using the same subset of 95 statistics previously retained for parameter estimation, we obtained a posterior sample of $10^4$ values from the posterior distributions of parameters, as described in the previous section, Parameter Estimation. We then simulated $10^4$ datasets with parameter values drawn with replacement from this posterior sample. As underlined in many statistics textbooks (Gelman et al. 2003), it is inadvisable to perform model checking using information that has already been used for training (i.e., model fitting; see also Cornuet et al. 2010 for illustrations on simulated datasets). Optimally, model-posterior checking should be based on test quantities that do not correspond to the summary statistics that have been used for obtaining the parameter posterior distributions. This is naturally possible with DIYABC, as the package proposes a large choice of summary statistics. Our set of test statistics therefore

Fraimout et al. doi:10.1093/molbev/msx050. MBE


included the 1,141 remaining single-population, pairwise, and three-population summary statistics available in DIYABC that were not used for previous ABC parameter estimation.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments
AF was supported by state funding by the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program "Investissements d'avenir" (ANR-11-IDEX-0004-02). AE and RAH were supported by the Languedoc Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE 2016), and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP "International Mobility" (CfP 2015-02). GL acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds, project 2012-13-195. XC, MK, IM, and JZ were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). MD acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). MP was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.

References
Adrion JR, Kousathanas A, Pascual M, Burrack HJ, Haddad NM, Bergland AO, Machado H, Sackton TB, Schlenke TA, Watada M, et al. 2014. Drosophila suzukii: the genetic footprint of a recent, worldwide invasion. Mol Biol Evol 31:3148–3163.

Asplen MK, Anfora G, Biondi A, Choi DS, Chu D, Daane KM, Gibert P, Gutierrez AP, Hoelmer KA, Hutchison WD, et al. 2015. Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci 88:469–494.

Atallah J, Teixeira L, Salazar R, Zaragoza G, Kopp A. 2014. The making of a pest: the evolution of a fruit-penetrating ovipositor in Drosophila suzukii and related species. Proc R Soc B 281:20132840.

Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025–2035.

Beaumont MA. 2010. Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst 41:379–406.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B 57:289–300.

Bertorelle G, Benazzo A, Mona S. 2010. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol Ecol 19:2609–2625.

Blum M, Nunes M, Prangle D, Sisson S. 2013. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci 28:189–208.

Bock DG, Caseys C, Cousens RD, Hahn MA, Heredia SM, Hubner S, Turner KG, Whitney KD, Rieseberg LH. 2015. What we still don't know about invasion genetics. Mol Ecol 24:2277–2297.

Boissin E, Hurley B, Wingfield MJ, Vasaitis R, Stenlid J, Davis C, de Groot P, Ahumada R, Carnegie A, Goldarazena A, et al. 2012. Retracing the routes of introduction of invasive species: the case of the Sirex noctilio woodwasp. Mol Ecol 21:5728–5744.

Breiman L. 2001. Random forests. Mach Learn 45:5–32.

Benazzo A, Ghirotto S, Torres Vilaca ST, Hoban S. 2015. Using ABC and microsatellite data to detect multiple introductions of invasive species from a single source. Heredity 115:262–272.

Brouat C, Tollenaere C, Estoup A, Loiseau A, Sommer S, Soanandrasana R, Rahalison L, Rajerison M, Piry S, Goodman SM, Duplantier JM. 2014. Invasion genetics of a human commensal rodent: the black rat Rattus rattus in Madagascar. Mol Ecol 23(16):4153–4167.

Caprio M, Tabashnik BE. 1992. Gene flow accelerates local adaptation among finite populations: simulating the evolution of insecticide resistance. J Econ Entomol 85:611–620.

Calabria G, Maca J, Bächli G, Serra L, Pascual M. 2012. First records of the potential pest species Drosophila suzukii (Diptera: Drosophilidae) in Europe. J Appl Entomol 136:139–147.

Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang G, Zalom FG. 2013. Genome of Drosophila suzukii, the spotted wing drosophila. G3 3:2257–2271.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Mol Ecol 13:955–968.

Cook S, Gelman A, Rubin DB. 2006. Validation of software for Bayesian models using posterior quantiles. J Comput Graph Stat 15:675–692.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24:2713–2719.

Cornuet JM, Ravigne V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11:401.

Cornuet JM, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin JM, Estoup A. 2014. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics 30:1187–1189.

Cristescu ME. 2015. Genetic reconstructions of invasion history. Mol Ecol 24:2212–2225.

Depra M, Poppe JL, Schmitz HJ, De Toni DC, Valente VLS. 2014. The first records of the invasive pest Drosophila suzukii in the South American continent. J Pest Sci 87:379–383.

Dlugosch KM, Parker IM. 2008. Founding events in species invasions: genetic variation, adaptive evolution, and the role of multiple introductions. Mol Ecol 17:431–449.

Dlugosch KM, Anderson SR, Braasch J, Alice Cang F, Gillette H. 2015. The devil is in the details: genetic variation in introduced populations and its contributions to invasion. Mol Ecol 24:2095–2111.

Edmonds CA, Lillie AS, Cavalli-Sforza LL. 2004. Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA 101:975–979.

Ellstrand NC, Schierenbeck KA. 2000. Hybridization as a stimulus for the evolution of invasiveness in plants? Proc Natl Acad Sci USA 97:7043–7050.

Estoup A, Guillemaud T. 2010. Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol 19:4113–4130.

Estoup A, Lombaert E, Marin JM, Guillemaud T, Pudlo P, Robert CP, Cornuet JM. 2012. Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics. Mol Ecol Res 12(5):846–855.

Estoup A, Ravigne V, Hufbauer RA, Vitalis R, Gautier M, Facon B. 2016. Is there a genetic paradox of biological invasion? Annu Rev Ecol Evol Syst 47:51–72.


Facon B, Genton B, Shykoff J, Jarne P, Estoup A, David P. 2006. A general eco-evolutionary framework for understanding bioinvasions. Trends Ecol Evol 21:130–135.

Facon B, Pointier JP, Jarne P, Sarda V, David P. 2008. High genetic variance in life-history strategies within invasive populations by way of multiple introductions. Curr Biol 18:363–367.

Fraimout A, Loiseau A, Price DK, Xuereb A, Martin JF, Vitalis R, Fellous S, Debat V, Estoup A. 2015. New set of microsatellite markers for the spotted-wing Drosophila suzukii (Diptera: Drosophilidae): a promising molecular tool for inferring the invasion history of this major insect pest. Eur J Entomol 112(4):855.

Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201:1555–1579.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis. London: Chapman & Hall/CRC.

Gilchrist GW, Huey RB, Serra L. 2001. Rapid evolution of wing size clines in Drosophila subobscura. Genetica 112:273–286.

Gilchrist GW, Huey RB, Balanya J, Pascual M, Serra L, Noor M. 2004. A time series of evolution in action: a latitudinal cline in wing size in South American Drosophila subobscura. Evolution 58:768–780.

Groen SC, Whiteman NK. 2016. Using Drosophila to study the evolution of herbivory and diet specialization. Curr Opin Insect Sci 14:66–72.

Hauser M. 2011. A historic account of the invasion of Drosophila suzukii (Matsumura) (Diptera: Drosophilidae) in the continental United States, with remarks on their identification. Pest Manag Sci 67:1352–1357.

Hufbauer RA, Facon B, Ravigne V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A. 2012. Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl 5:89–101.

Kaneshiro KY. 1983. Drosophila (Sophophora) suzukii (Matsumura). Proc Hawaiian Entomol Soc 24:179.

Keller SR, Taylor DR. 2010. Genomic admixture increases fitness during a biological invasion: admixture increases fitness during invasion. J Evol Biol 23:1720–1731.

Kolbe JJ, Glor RE, Schettino LR, Lara AC, Larson A, Losos JB. 2004. Genetic variation increases during biological invasion by a Cuban lizard. Nature 431:177–181.

Lavergne S, Molofsky J. 2007. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci USA 104:3883–3888.

Lee C. 2002. Evolutionary genetics of invasive species. Trends Ecol Evol 17:386–391.

Lee JC, Bruck DJ, Curry H, Edwards D, Haviland DR, Van Steenwyk RA, Yorgey BM. 2011. The susceptibility of small fruits and cherries to the spotted-wing drosophila, Drosophila suzukii. Pest Manag Sci 67:1358–1367.

Lenormand T. 2002. Gene flow and the limits of natural selection. Trends Ecol Evol 17:183–189.

Lin QC, Zhai YF, Zhang AS, Men XY, Zhang XY, Zalom FG, Zhou CG, Yu Y. 2014. Comparative developmental times and laboratory life tables for Drosophila suzukii and Drosophila melanogaster (Diptera: Drosophilidae). Fla Entomol 97:1434–1442.

Lombaert E, Guillemaud T, Lundgren J, Koch R, Facon B, Grez A, Loomans A, Malausa T, Nedved O, Rhule E, et al. 2014. Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol 23:5979–5997.

Matsumura S. 1931. 6000 illustrated insects of Japan-empire (in Japanese). Tokyo: Toko shoin. 1497 pp.

Meng XL. 1994. Posterior predictive p-values. Ann Stat 22:1142–1160.

Nei M, Maruyama T, Chakraborty R. 1975. The bottleneck effect and genetic variability in populations. Evolution 29:1–10.

Ometto L, Cestaro A, Ramasamy S, Grassi A, Revadi S, Siozios S, Moretto M, Fontana P, Varotto C, Pisani D, et al. 2013. Linking genomics and ecology to investigate the complex evolution of an invasive Drosophila pest. Genome Biol Evol 5:745–757.

Pascual M, Chapuis MP, Mestres F, Balanya J, Huey RB, Gilchrist GW, Serra L, Estoup A. 2007. Introduction history of Drosophila subobscura in the New World: a microsatellite-based survey using ABC methods. Mol Ecol 16:3069–3083.

Peischl S, Excoffier L. 2015. Expansion load: recessive mutations and the role of standing genetic variation. Mol Ecol 24:2084–2094.

Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. 2016. Reliable ABC model choice via random forests. Bioinformatics 32:859–866.

R Development Core Team. 2008. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org

Robert CP, Cornuet JM, Marin JM, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA 108:15112–15117.

Rius M, Darling JA. 2014. How important is intraspecific genetic admixture to the success of colonising populations? Trends Ecol Evol 29:233–242.

Sakai AK, Allendorf FW, Holt JS, Lodge DM, Molofsky J, With KA, Baughman S, Cabin RJ, Cohen JE, Ellstrand NC, McCauley DE. 2001. The population biology of invasive species. Annu Rev Ecol Syst 305–332.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. In: Mutation and Evolution. Springer. p. 359–367.

Simberloff D. 2013. Biological invasions: much progress plus several controversies. Contrib Sci 9:7–16.

Turgeon J, Tayeh A, Facon B, Lombaert E, De Clercq P, Berkvens N, Lundgren JG, Estoup A. 2011. Experimental evidence for the phenotypic impact of admixture between wild and biocontrol Asian ladybird (Harmonia axyridis) involved in the European invasion. J Evol Biol 24:1044–1052.

Uller T, Leimu R. 2011. Founder events predict changes in genetic diversity during human-mediated range expansions. Glob Change Biol 17:3478–3485.

Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Thery S, Froment A, Le Bomin S, Gessain A, Hombert JM, et al. 2009. Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol 19:312–318.

Wray G. 2013. Genomics and the evolution of phenotypic traits. Annu Rev Ecol Evol Syst 44:51–72.


frequent in the worldwide invasion history of D. suzukii. We identified at least five admixture events for which sources were native and/or invasive populations (i.e., A1–A5 events in fig. 1). The genetic contributions of the sources were unbalanced, except for the populations from Brazil and La Réunion, but genetic information regarding admixture was considerably less accurate in the latter cases, which involved weakly differentiated source populations.

We have provided here the information necessary to evaluate whether bottlenecks and admixture have influenced outcomes in this invasion. Quantitative genetics studies in the lab focusing on the key source, bottlenecked, and admixed populations could now examine the fitness and adaptive potential of those groups (Lavergne and Molofsky 2007; Facon et al. 2008; Turgeon et al. 2011).

Conclusions and Perspectives
Understanding invasion pathways provides key information for understanding the role of evolutionary processes in biological invasions. For example, our research reveals the sources of invasive populations, which enables further relevant comparisons using quantitative genetics studies in the lab. Additionally, we found that the invasion of Europe by D. suzukii was distinct from that of North America, with limited and asymmetrical gene flow between these two main invaded areas. This situation provides the opportunity to evaluate evolutionary trajectories in replicate in two separate temperate climate regions, similarly to research by Gilchrist et al. (2001, 2004) on the pace of clinal evolution in the invasive fruit fly Drosophila subobscura. There are also distinct invasion pathways for areas with warmer climates (i.e., Hawaii, La Réunion, and Brazil) that will make interesting comparisons.

Retracing invasion routes and making inferences about demographic processes is only possible if there is adequate polymorphism within populations and significant genetic differentiation among them. As is evident here, microsatellite data can provide enough variation to reveal important pathways. We concur with Adrion et al. (2014), however, that genome-wide data (i.e., next generation sequencing approaches) will be tremendously powerful in further discriminating among complex invasion scenarios (see also Cristescu 2015; Estoup et al. 2016). Because of the reduced computational resources demanded by ABC-RF, this method will be particularly useful for analysis of massive single nucleotide polymorphism datasets (Pudlo et al. 2016). In addition to the selectively neutral demographic inference leading to the reconstruction of routes of invasion, population (and quantitative) next generation sequencing approaches are quite promising for studying the evolution of phenotypic traits in natural populations (Wray 2013; Gautier 2015). Specifically, they can be used to better understand the genetic architecture of traits underlying invasion success (Bock et al. 2015). Drosophila suzukii is a good species for such research given (i) its short generation time and viability in laboratory conditions, which facilitate experimental approaches to study quantitative traits of interest (Asplen et al. 2015), and (ii) the availability of annotated genome assemblies for this species (Chiu et al. 2013; Ometto et al. 2013), along with the huge amount of genomic resources available in its close relative species D. melanogaster (Groen and Whiteman 2016).

Materials and Methods

Sampling and Genotyping
Adult D. suzukii were sampled from the field at a total of 23 localities (hereafter termed sample sites) distributed throughout most of the native and invasive range of the species (supplementary table S1, Supplementary Material online, and fig. 1). Samples were collected between 2013 and 2015 using baited traps and sweep nets, and stored in ethanol. Native Asian samples consisted of a total of six sample sites, including four Chinese and two Japanese localities. Samples from the invasive range were collected in Hawaii (1 sample site), Continental US (7 sites), Europe (7 sites), Brazil (1 site), and in the French island of La Réunion (1 site). For each sample site, 15–44 adult flies were genotyped at 25 of the 28 microsatellite loci described in Fraimout et al. (2015), resulting in a total of 685 genotyped individuals. Three microsatellite loci from Fraimout et al. (2015) (i.e., loci DS31, DS42, and DS45) were not included due to the presence of null alleles at frequencies > 10% in some sample sites (results not shown). DNA extraction and PCR amplification, as well as allele scoring, were performed following the methodology detailed in Fraimout et al. (2015).

General Framework for Defining Focal Population Groups and Sets of Scenarios to Be Compared Using ABC
One of the many challenges of ABC analyses is to optimally define sets of sample sites to be processed chronologically as potential sources and targets in a way that does not hamper the computational effort. As stated in Estoup and Guillemaud (2010) and Lombaert et al. (2014), the number and complexity of competing scenarios is proportional to the number of genetic groups to be accounted for in ABC analyses. Following Lombaert et al. (2014), we used the results obtained from three genetic clustering methods, along with historical and geographical information, to define genetic groups that could include several sample sites (supplementary appendix S1, Supplementary Material online). We defined seven main genetic groups from our set of 23 sample sites: Asia, Hawaii, western US, eastern US, Europe, Brazil, and La Réunion. To make the computations less intensive with respect to computer resources, ABC treatments were carried out using one representative sample site for each genetic group per replicate analysis. In other words, when a genetic group included more than two sample sites, we considered the two most differentiated sample sites (i.e., with the highest FST values in supplementary table S3, Supplementary Material online) as representative of the group: CN-Shi and JP-Tok for the group Asia, all three sample sites for the genetically heterogeneous western US group, US-NC and US-Wis for the group eastern US, and GE-Dos and IT-Tre for the group Europe. Each of the sample sites retained (i.e., the representative sample sites) was included alternatively in the sample set considered for ABC


treatments. This makes it possible to replicate analyses of scenario choice on different sample sets representing most of the genetic variation within and between genetic groups. All ABC analyses described in table 1 were thus replicated with all possible combinations of representative sample sites available for each genetic group. This allowed us to test the robustness of scenario choices with respect to the sample sets considered as representative of genetic groups.
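Replicating an analysis over all combinations of representative sample sites amounts to taking a Cartesian product over the genetic groups. The sketch below illustrates this for the three groups that the text lists with two representatives each; restricting the example to these groups, and the dictionary layout itself, are assumptions made for brevity.

```python
from itertools import product

# Representative sample sites per genetic group, as listed in the text.
# Only groups with two representatives are shown (an assumption of this sketch).
representatives = {
    "Asia": ["CN-Shi", "JP-Tok"],
    "eastern US": ["US-NC", "US-Wis"],
    "Europe": ["GE-Dos", "IT-Tre"],
}

def sample_sets(representatives):
    """Enumerate every sample set with one representative site per group."""
    groups = list(representatives)
    site_lists = [representatives[group] for group in groups]
    return [dict(zip(groups, combo)) for combo in product(*site_lists)]

replicates = sample_sets(representatives)
for sample_set in replicates:
    print(sample_set)
```

Each dictionary is one replicate sample set on which a given scenario-choice analysis could be rerun.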

Nested ABC Analyses Processed Sequentially
Following Lombaert et al. (2014), we used historical information (i.e., dates of first observation of each invasive population; supplementary table S1, Supplementary Material online) to define 11 sets of competing introduction scenarios that were analyzed sequentially. The first set of compared scenarios considers the second oldest invasive population as target (the oldest one necessarily originating from the native range) and determines its introduction history. Step by step, subsequent analyses use the results obtained from the previous analyses until the most recent invasive populations are considered. Each of the 11 analyses included a set of compared introduction scenarios in which the target invasive population (i.e., the focal population) can either derive directly by a single split (i.e., introduction event) from one of the historically compatible source populations, or be the result of an admixture event between any one of the possible pairs of sources. For instance, in a model with one target and two possible sources, three scenarios could be formalized, with the target population being derived from a single source population, from the other possible source, or from an admixture between the two sources (see supplementary fig. S6 for an illustration and supplementary table S2, Supplementary Material online, for details).

The first set of scenario choice analyses, 1a–1d, aimed at making inferences about the introduction pathways for the three sample sites in western US (table 1). We first evaluated the Asian, Hawaiian, or admixed Asian + Hawaiian origin of each of the three western US sample sites. We then refined our modeling by formalizing seven competing scenarios describing the possible relationships among the three western US sample sites and their extra-continental sources (i.e., Asia and Hawaii; analysis 1d). Once the invasion history was resolved for the western US group, we sought to identify the population source(s) of the eastern US and European genetic groups independently. Dates of first records in these two groups widely overlap, and the possibility that the western US was a common source for both areas could not be ruled out. We therefore compared, in analysis 2a, a set of six scenarios to assess whether the eastern US group derived directly from Asia, Hawaii, or western US, or if it resulted from an admixture event between one pair of such sources. A similar set of six scenarios was used to assess the most likely origin of European populations independently from the eastern US group (analysis 2b).
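The way scenario sets grow with the number of candidate sources can be made explicit with a short enumeration: each target either derives from a single source or from an admixture between a pair of sources. The function below is a hypothetical helper written for illustration, not part of DIYABC.

```python
from itertools import combinations

def introduction_scenarios(sources):
    """List single-source and pairwise-admixture origins for one target population."""
    single = [(source,) for source in sources]
    admixed = list(combinations(sources, 2))
    return single + admixed

# One target with two possible sources: three scenarios.
two_sources = introduction_scenarios(["Asia", "Hawaii"])

# One target with three possible sources: six scenarios,
# as in the sets compared for eastern US and Europe.
three_sources = introduction_scenarios(["Asia", "Hawaii", "western US"])

for scenario in three_sources:
    print(" + ".join(scenario))
```

With k candidate sources this gives k + k(k - 1)/2 scenarios, so the number of competing scenarios rises quickly as more genetic groups become possible sources.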

Given the results of analyses 2a and 2b, and given the overlap in first record dates for eastern US and Europe, we then performed two analyses to investigate the possibility of gene flow (i.e., admixture) between the eastern US and European groups, considering eastern US as target population with Europe as potential source, and vice versa (analyses 3a and 3b, respectively). Analysis 4 aimed at clarifying the admixture in northern European sample sites identified in analysis 3b. We tested whether this admixture occurred between eastern US and a secondary introduction from Asia, or between individuals from eastern US and southern Europe. Finally, we investigated the origins of the most recent invasive populations in Brazil and La Réunion separately, with all older groups and pairwise admixtures as potential sources (analyses 5a and 5b). Note that, in agreement with the genetic clustering results (fig. 1; supplementary appendix S1 and figs. S1 and S2, Supplementary Material online), we did not test for the possibility of Brazil being the source of La Réunion, or vice versa.

Historical, Demographic, and Mutational Parameters
For all scenarios formalized for ABC analyses, we modeled an introduction event as a divergence without subsequent gene flow from the source population (or two source populations in case of admixture) at a time corresponding to the date of first record in the invaded area, translated into a number of generations (assuming 12 generations per year; Lin et al. 2014). The divergence event was immediately followed by a bottleneck period characterized by a lower effective population size and a straight return to a stable (large) effective population size. We modeled our uncertainty regarding the identity of the source populations in the slightly structured native Asian area by including native "ghost populations" (i.e., unsampled populations) in our scenarios. Such unsampled native populations were modeled as a single split from an ancestral Asian population (without any change in effective population size) at a time defined by a loose flat prior, which includes zero at the lower bound (supplementary table S8, Supplementary Material online). See Lombaert et al. (2014) and references therein for justifications of using unsampled populations when modeling invasion scenarios. A detailed illustration of the above modeling design is provided in supplementary fig. S6, Supplementary Material online.
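The conversion of a date of first record into a number of generations is a simple multiplication. The sketch below assumes that time is counted back from the sampling year, which the text does not state explicitly; the years used are hypothetical placeholders, not the dates of supplementary table S1.

```python
GENERATIONS_PER_YEAR = 12  # value assumed in the text (Lin et al. 2014)

def generations_since_first_record(first_record_year, sampling_year,
                                   generations_per_year=GENERATIONS_PER_YEAR):
    """Translate the time elapsed since the first record into generations."""
    if sampling_year < first_record_year:
        raise ValueError("sampling cannot precede the first record")
    return (sampling_year - first_record_year) * generations_per_year

# Hypothetical example: first record in 2008, sampling in 2013.
n_generations = generations_since_first_record(2008, 2013)
print(n_generations)
```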

Prior distributions for historical, demographic, and mutational parameters were defined taking into account the historical and demographic parameter values available from empirical studies on D. suzukii (Hauser 2011; Lin et al. 2014; Asplen et al. 2015) and microsatellite mutational data on D. melanogaster (Schug et al. 1998). We considered a first set of prior distributions (hereafter termed prior set 1), which was composed of rectangular (i.e., bounded uniform) distributions (see supplementary table S8, Supplementary Material online). To evaluate the robustness of our ABC inferences to prior choice, we also considered a second set of more peaked prior distributions (hereafter termed prior set 2; supplementary table S8, Supplementary Material online).

Model Choice Using ABC-RF and ABC-LDA

For all ABC-RF and ABC-LDA analyses, we used the software DIYABC v2.1.0 (Cornuet et al. 2014) to simulate datasets constituting the reference tables. A reference table includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics.

Following Pudlo et al. (2016), ABC-RF treatments were processed on reference tables including 10,000 simulated datasets per scenario. Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014) for microsatellite markers, describing genetic variation per population (e.g., number of alleles), per pair (e.g., genetic distance), or per triplet (e.g., admixture rate) of populations, averaged over the 25 loci (see supplementary table S9, Supplementary Material online, for details about such statistics), plus the linear discriminant analysis (LDA) axes as additional summary statistics. The total number of summary statistics ranged from 39 to 424 depending on the analysis (table 2). When the number of scenarios in an analysis exceeded ten, we processed ABC-RF treatments on reference tables including 100,000 simulated datasets, to avoid computer memory issues associated with the sub-bootstrapping procedure applied to reference tables with more than 100,000 simulated datasets (see the section "Practical recommendations regarding the implementation of the algorithms" in Pudlo et al. 2016). We checked that this number was sufficient by evaluating the stability of prior error rates (i.e., the probability of choosing a wrong model when drawing the model index and parameter values from the priors) and of posterior probability estimates on 80,000, 90,000, and 100,000 simulated datasets. The number of trees in the constructed random forests was fixed to n = 500; see Breiman (2001) and Pudlo et al. (2016) for justifications of considering a forest of 500 trees, and supplementary fig. S7, Supplementary Material online, for an illustration. For each ABC-RF analysis, we predicted the best scenario and estimated its posterior probability and the prior error rate over 10 replicate runs on the same reference table. We used the abcrf R package (v1.1.0; Pudlo et al. 2016) to perform all ABC-RF analyses. In supplementary appendix S2, Supplementary Material online, we provide several scripts written in the R programming language (R Development Core Team 2008) for computing random forest analyses in R with the abcrf package when starting from simulated datasets generated with DIYABC v2.1.0.
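The two quantities reported for each ABC-RF analysis can be illustrated with a toy sketch. It assumes that a forest selects a scenario by majority vote over its trees, as in standard random forest classification, and that the prior error rate is the fraction of datasets simulated from the priors that are assigned to a wrong scenario. This is not the abcrf implementation; the function names and the votes below are hypothetical.

```python
from collections import Counter


def majority_vote(tree_votes):
    """Return the scenario index receiving the most votes across trees."""
    return Counter(tree_votes).most_common(1)[0][0]


def prior_error_rate(true_models, predicted_models):
    """Fraction of prior-simulated datasets assigned to a wrong scenario."""
    if not true_models or len(true_models) != len(predicted_models):
        raise ValueError("need two non-empty sequences of equal length")
    wrong = sum(t != p for t, p in zip(true_models, predicted_models))
    return wrong / len(true_models)


# Hypothetical votes of five trees for one dataset, then four test datasets.
best = majority_vote([2, 1, 2, 3, 2])
error = prior_error_rate([1, 2, 3, 1], [1, 2, 1, 1])
```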

For the sake of methodological comparison, we also duplicated a subset of analyses using a more standard ABC method (Beaumont et al. 2002; Cornuet et al. 2008), improved using the regression algorithms from Estoup et al. (2012), which are based on linear discriminant analysis (LDA), to avoid the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step. This method is called ABC-LDA. Following previous analyses of this type (Lombaert et al. 2014), ABC-LDA treatments were processed on reference tables including 500,000 simulated datasets per scenario. As for ABC-RF, datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). Posterior probabilities associated with each scenario were estimated by polychotomous logistic regression (Cornuet et al. 2008), modified following Estoup et al. (2012), on the 1% of the simulated datasets closest to the observed dataset. We also estimated, for each analysis, a prior error rate over 500 simulated datasets drawn from the priors, using reference tables including 500,000 simulated datasets per scenario. To provide a fair comparison of classification error with respect to computational effort, prior error rates were also estimated using reference tables including the same number of simulated datasets per scenario as for ABC-RF treatments (10,000 per scenario).
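The selection of the simulated datasets closest to the observed dataset can be sketched as below. This toy version ranks simulations by plain Euclidean distance between vectors of summary statistics and keeps a fixed fraction; any rescaling of the statistics and the subsequent logistic regression step are deliberately omitted. The function name and the toy data are hypothetical.

```python
import math


def closest_fraction(simulated_stats, observed_stats, fraction=0.01):
    """Keep the given fraction of simulations closest to the observation.

    Each element of `simulated_stats` is a tuple of summary statistics with
    the same length as `observed_stats`. Distances are plain Euclidean
    distances (no rescaling of statistics is applied in this sketch).
    """
    n_keep = max(1, round(len(simulated_stats) * fraction))
    ranked = sorted(simulated_stats,
                    key=lambda stats: math.dist(stats, observed_stats))
    return ranked[:n_keep]


# Toy reference table: 200 simulations summarized by a single statistic.
toy_table = [(float(i),) for i in range(200)]
retained = closest_fraction(toy_table, (50.2,), fraction=0.01)
```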

Refining the Origins of the Primary Introduction Events from the Native Area

We refined our worldwide invasion scenario by assessing the most probable origin of the primary introductions from the Asian native range. To this aim, we carried out additional ABC-RF treatments to choose which geographical area in the native range was the most likely origin of each of the three primary introductions from Asia identified in the final invasion scenario, i.e., in Hawaii, western US (sample site US-Wat), and Europe (fig. 1). Capitalizing on the observed pattern of genetic differentiation among all Asian sample sites, we considered four possible geographical origins: O1 = Japan (represented by the sample sites JP-Sap + JP-Tok, which were pooled as they did not show any significant differentiation), O2 = South-East China (represented by the sample site CN-Nin), O3 = North-West China (represented by the sample sites CN-Lan + CN-Lia, which were pooled as they did not show any significant differentiation), and O4 = South-West China (represented by the sample site CN-Shi). In each of these three model choice analyses (i.e., one analysis per primary introduction), we considered the previously inferred introduction histories and compared four scenarios corresponding to variations of those histories that differed only by the geographical origin of the primary Asian introduction (models O1, O2, O3, and O4). We used the ABC-RF method with 10,000 datasets per model, simulated using the prior set 1 or 2, and with three replications of the RF process. For inferences regarding Europe, we carried out ABC-RF treatments on various sets of sample sites from Europe and eastern US to further evaluate the robustness of our results.

Parameter Estimation

For ABC parameter estimation, we considered a set of 12 key sample sites representative of the inferred final invasion history (fig. 1). This set of sample sites included the three native origins of primary introductions from the native range (O1, O2, and O3), the Hawaiian sample (US-Haw), the three western US samples (US-Wat, US-Sok, and US-SD), the eastern US sample US-NC, two European samples (IT-Tre and GE-Dos), the Brazilian sample (BR-PA), and the sample from La Réunion (FR-Reu). To evaluate the robustness of our parameter estimations with respect to sample sites, we carried out a second analysis replacing the sample sites US-NC, IT-Tre, and GE-Dos by US-Wis, SP-Bar, and FR-Par, respectively. We also replicated parameter estimations using the prior set 1 and the prior set 2.

ABC-RF was developed to tackle the inferential issue of model choice. So far, no RF-based solution exists to estimate parameter distributions under a given model (Pudlo et al. 2016). Standard ABC algorithms for parameter estimation may suffer from the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step, and hence yield poor results when the number of statistics is much too large (reviewed in Blum et al. 2013). To avoid such potential problems, as well as for the sake of simplicity and computational efficiency (i.e., statistical techniques for choosing summary statistics have not been implemented in DIYABC; Blum et al. 2013), we used a standard ABC methodological framework (Beaumont et al. 2002; Cornuet et al. 2008) applied to a subset of "expert-chosen" statistics proposed in DIYABC (Brouat et al. 2014; Lombaert et al. 2014) to estimate posterior parameter distributions under the final invasion scenario. We hence considered 95 summary statistics in total: the mean number of alleles and the mean genetic diversity (per locus and population sample), all pairwise FST values, and some crude estimates of admixture rates based on the five population triplets for which admixture events have been inferred (i.e., A1–A5 events described in fig. 1, corresponding to five AML statistics; Choisy et al. 2004; see the DIYABC v2.1.0 user manual, p. 16, for details about such statistics). Using parameter values drawn from the prior sets 1 or 2, we produced a reference table containing 10 million simulated datasets. Following Beaumont et al. (2002), we then used a local linear regression to estimate the parameter posterior distributions. We took the 10,000 (0.1%) simulated datasets closest to our observed dataset for the regression, after applying a logit transformation to parameter values. We found that considering instead the 1% closest simulated datasets and/or a larger set of summary statistics had only weak effects on the estimated parameter distributions, at least for the subset of parameters we were interested in (i.e., admixture rates and bottleneck severity; results not shown).
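A logit transformation of a bounded parameter, and its inverse, can be sketched as follows. The rescaling of the parameter to the (0, 1) interval using lower and upper bounds is an assumption made for this illustration; the text above only states that a logit transformation was applied to parameter values. The function names and bounds are hypothetical.

```python
import math


def logit_scaled(value, lower, upper):
    """Logit of a parameter rescaled to (0, 1) between two bounds."""
    if not lower < value < upper:
        raise ValueError("value must lie strictly between the bounds")
    p = (value - lower) / (upper - lower)
    return math.log(p / (1.0 - p))


def inverse_logit_scaled(z, lower, upper):
    """Map a logit-scale value back to the original bounded scale."""
    return lower + (upper - lower) / (1.0 + math.exp(-z))


# An admixture rate of 0.3 bounded in (0, 1), transformed and back-transformed.
z = logit_scaled(0.3, 0.0, 1.0)
back = inverse_logit_scaled(z, 0.0, 1.0)
```

Working on the logit scale keeps regression-adjusted values inside the chosen bounds once they are back-transformed.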

We focused our investigations on the parameters related to two types of demographic events that are considered to be potentially important drivers of invasion success: the severity of the bottleneck associated with the foundation of new populations, and genetic admixture (Estoup et al. 2016). Bottleneck severity was defined as the ratio between the duration of the bottleneck and the number of effective individuals during this period (Pascual et al. 2007). Following the final invasion scenario detailed in figure 1, we defined three categories of introduction situations: bottleneck severity for populations founded only by individuals originating from a different continent (e.g., sample sites US-Haw, US-Wat, IT-Tre, BR-PA, and FR-Reu), bottleneck severity for populations founded only by individuals originating from the same continent (e.g., sample sites US-SD and US-NC), and bottleneck severity for populations founded by a combination of the two types of sources (e.g., sample sites US-Sok and GE-Dos). Genetic admixture is evident when a population is inferred to have more than one distinct source (A1–A5 events in fig. 1).
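The definition of bottleneck severity given above translates directly into a ratio. The sketch below is a minimal illustration; the function name and the numerical values are hypothetical and are not estimates from this study.

```python
def bottleneck_severity(duration_generations, effective_size):
    """Ratio of bottleneck duration (generations) to effective size."""
    if effective_size <= 0:
        raise ValueError("effective population size must be positive")
    return duration_generations / effective_size


# Hypothetical bottleneck: 10 generations at an effective size of 50.
severity = bottleneck_severity(10, 50)
```

Under this definition, a longer bottleneck or a smaller effective size both give a larger severity.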

Model-Posterior Checking

We used the ABC model-posterior checking method implemented in DIYABC (Cornuet et al. 2010) to evaluate how well the final worldwide invasion scenario, with associated parameter posterior distributions, matches the observed dataset. This method was largely inspired by statistical frameworks detailed in Gelman et al. (2003) and Cook et al. (2006). The principle is as follows: if a scenario-posterior combination fits the observed data correctly, then data simulated under this scenario with parameters drawn from the associated posterior distributions should be close to the observed data. The lack of fit of the model to the data can be measured by determining the frequency at which test quantities measured on the observed dataset are extreme with respect to the distributions of the same test quantities computed from the simulated datasets (i.e., the posterior predictive distributions). In practice, the test quantities are chosen among the large set of ABC summary statistics proposed in the program DIYABC. For each test quantity ($q$), a lack of fit of the observed data with respect to the posterior predictive distribution can be measured by the cumulative distribution function value, defined as $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$. The tail-area probability can be easily computed for each test quantity as $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$ when $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}}) \le 0.5$, and as $1.0 - \mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}})$ when $\mathrm{Prob}(q_{\mathrm{simulated}} < q_{\mathrm{observed}}) > 0.5$. Such tail-area probabilities, also named posterior predictive P-values (i.e., ppp-values; Meng 1994), represent the probabilities that the replicated data (simulated ABC summary statistics) could be more extreme than the observed data (observed ABC summary statistics). In practice, ppp-values can be interpreted as a guideline for tracking model-posterior misfit: a few ppp-values around 0.05 are not necessarily a problem, whereas the presence of ppp-values around 0.001 is rather suspect. Hence, too many observed summary statistics falling in the tails of distributions, especially if some of those ppp-values are small (<0.001), cast serious doubts on the adequacy of the model-posterior combination to the observed dataset. Finally, because ppp-values are computed for a number of (often non-independent) test statistics, a method such as that of Benjamini and Hochberg (1995) can be used to control the false discovery rate (Cornuet et al. 2010).
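The tail-area computation and the false discovery rate control described above can be sketched as follows. This is a minimal illustration assuming an empirical estimate of the cumulative distribution function from simulated values and the standard Benjamini and Hochberg step-up rule; it is not the DIYABC implementation, and the function names and toy values are hypothetical.

```python
def ppp_value(simulated, observed):
    """Tail-area probability of an observed test quantity.

    The empirical value of Prob(q_simulated < q_observed) is returned
    when it is at most 0.5, and one minus that value otherwise.
    """
    cdf = sum(1 for q in simulated if q < observed) / len(simulated)
    return cdf if cdf <= 0.5 else 1.0 - cdf


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag the tests rejected by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Largest rank whose p-value lies under its step-up threshold.
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Toy example: one test quantity, then four ppp-values tested jointly.
p = ppp_value([1.0, 2.0, 3.0, 4.0], 3.5)
flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], alpha=0.05)
```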

We carried out the ABC model-posterior checking analysis on our observed microsatellite dataset as follows. From 10^7 datasets simulated under the final invasion scenario detailed in figure 1, and using the same subset of 95 statistics previously retained for parameter estimation, we obtained a posterior sample of 10^4 values from the posterior distributions of parameters, as described in the previous section (Parameter Estimation). We then simulated 10^4 datasets with parameter values drawn with replacement from this posterior sample. As underlined in many statistics textbooks (e.g., Gelman et al. 2003), model checking should not be performed using information that has already been used for training (i.e., model fitting; see also Cornuet et al. 2010 for illustrations on simulated datasets). Optimally, model-posterior checking should be based on test quantities that do not correspond to the summary statistics that have been used for obtaining the parameter posterior distributions. This is naturally possible with DIYABC, as the package proposes a large choice of summary statistics. Our set of test statistics therefore included the 1,141 remaining single, pairwise, and three-population summary statistics available in DIYABC that were not used for previous ABC parameter estimation.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

A.F. was supported by state funding by the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program "Investissements d'avenir" (ANR-11-IDEX-0004-02). A.E. and R.A.H. were supported by the Languedoc-Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE 2016), and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP "International Mobility" (CfP 2015-02). G.L. acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds, project 2012-13-195. X.C., M.K., I.M., and J.Z. were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). M.D. acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). M.P. was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.

References

Adrion JR, Kousathanas A, Pascual M, Burrack HJ, Haddad NM, Bergland AO, Machado H, Sackton TB, Schlenke TA, Watada M, et al. 2014. Drosophila suzukii: the genetic footprint of a recent, worldwide invasion. Mol Biol Evol. 31:3148–3163.

Asplen MK, Anfora G, Biondi A, Choi DS, Chu D, Daane KM, Gibert P, Gutierrez AP, Hoelmer KA, Hutchison WD, et al. 2015. Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci. 88:469–494.

Atallah J, Teixeira L, Salazar R, Zaragoza G, Kopp A. 2014. The making of a pest: the evolution of a fruit-penetrating ovipositor in Drosophila suzukii and related species. Proc R Soc B. 281:20132840.

Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics. 162:2025–2035.

Beaumont MA. 2010. Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst. 41:379–406.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 57:289–300.

Bertorelle G, Benazzo A, Mona S. 2010. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol Ecol. 19:2609–2625.

Blum M, Nunes M, Prangle D, Sisson S. 2013. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci. 28:189–208.

Bock DG, Caseys C, Cousens RD, Hahn MA, Heredia SM, Hübner S, Turner KG, Whitney KD, Rieseberg LH. 2015. What we still don't know about invasion genetics. Mol Ecol. 24:2277–2297.

Boissin E, Hurley B, Wingfield MJ, Vasaitis R, Stenlid J, Davis C, de Groot P, Ahumada R, Carnegie A, Goldarazena A, et al. 2012. Retracing the routes of introduction of invasive species: the case of the Sirex noctilio woodwasp. Mol Ecol. 21:5728–5744.

Breiman L. 2001. Random forests. Mach Learn. 45:5–32.

Benazzo A, Ghirotto S, Torres Vilaça ST, Hoban S. 2015. Using ABC and microsatellite data to detect multiple introductions of invasive species from a single source. Heredity. 115:262–272.

Brouat C, Tollenaere C, Estoup A, Loiseau A, Sommer S, Soanandrasana R, Rahalison L, Rajerison M, Piry S, Goodman SM, Duplantier JM. 2014. Invasion genetics of a human commensal rodent: the black rat Rattus rattus in Madagascar. Mol Ecol. 23(16):4153–4167.

Caprio M, Tabashnik BE. 1992. Gene flow accelerates local adaptation among finite populations: simulating the evolution of insecticide resistance. J Econ Entomol. 85:611–620.

Calabria G, Maca J, Bächli G, Serra L, Pascual M. 2012. First records of the potential pest species Drosophila suzukii (Diptera: Drosophilidae) in Europe. J Appl Entomol. 136:139–147.

Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang G, Zalom FG. 2013. Genome of Drosophila suzukii, the spotted wing drosophila. G3. 3:2257–2271.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Mol Ecol. 13:955–968.

Cook S, Gelman A, Rubin DB. 2006. Validation of software for Bayesian models using posterior quantiles. J Comput Graph Stat. 15:675–692.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics. 24:2713–2719.

Cornuet JM, Ravigné V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics. 11:401.

Cornuet JM, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin JM, Estoup A. 2014. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics. 30:1187–1189.

Cristescu ME. 2015. Genetic reconstructions of invasion history. Mol Ecol. 24:2212–2225.

Deprá M, Poppe JL, Schmitz HJ, De Toni DC, Valente VLS. 2014. The first records of the invasive pest Drosophila suzukii in the South American continent. J Pest Sci. 87:379–383.

Dlugosch KM, Parker IM. 2008. Founding events in species invasions: genetic variation, adaptive evolution, and the role of multiple introductions. Mol Ecol. 17:431–449.

Dlugosch KM, Anderson SR, Braasch J, Alice Cang F, Gillette H. 2015. The devil is in the details: genetic variation in introduced populations and its contributions to invasion. Mol Ecol. 24:2095–2111.

Edmonds CA, Lillie AS, Cavalli-Sforza LL. 2004. Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA. 101:975–979.

Ellstrand NC, Schierenbeck KA. 2000. Hybridization as a stimulus for the evolution of invasiveness in plants? Proc Natl Acad Sci USA. 97:7043–7050.

Estoup A, Guillemaud T. 2010. Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol. 19:4113–4130.

Estoup A, Lombaert E, Marin JM, Guillemaud T, Pudlo P, Robert CP, Cornuet JM. 2012. Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics. Mol Ecol Res. 12(5):846–855.

Estoup A, Ravigné V, Hufbauer RA, Vitalis R, Gautier M, Facon B. 2016. Is there a genetic paradox of biological invasion? Annu Rev Ecol Evol Syst. 47:51–72.


Facon B, Genton B, Shykoff J, Jarne P, Estoup A, David P. 2006. A general eco-evolutionary framework for understanding bioinvasions. Trends Ecol Evol. 21:130–135.

Facon B, Pointier JP, Jarne P, Sarda V, David P. 2008. High genetic variance in life-history strategies within invasive populations by way of multiple introductions. Curr Biol. 18:363–367.

Fraimout A, Loiseau A, Price DK, Xuereb A, Martin JF, Vitalis R, Fellous S, Debat V, Estoup A. 2015. New set of microsatellite markers for the spotted-wing Drosophila suzukii (Diptera: Drosophilidae): a promising molecular tool for inferring the invasion history of this major insect pest. Eur J Entomol. 112(4):855.

Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics. 201:1555–1579.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis. London: Chapman & Hall/CRC.

Gilchrist GW, Huey RB, Serra L. 2001. Rapid evolution of wing size clines in Drosophila subobscura. Genetica. 112:273–286.

Gilchrist GW, Huey RB, Balanyà J, Pascual M, Serra L, Noor M. 2004. A time series of evolution in action: a latitudinal cline in wing size in South American Drosophila subobscura. Evolution. 58:768–780.

Groen SC, Whiteman NK. 2016. Using Drosophila to study the evolution of herbivory and diet specialization. Curr Opin Insect Sci. 14:66–72.

Hauser M. 2011. A historic account of the invasion of Drosophila suzukii (Matsumura) (Diptera: Drosophilidae) in the continental United States, with remarks on their identification. Pest Manag Sci. 67:1352–1357.

Hufbauer RA, Facon B, Ravigné V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A. 2012. Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl. 5:89–101.

Kaneshiro KY. 1983. Drosophila (Sophophora) suzukii (Matsumura). Proc Hawaiian Entomol Soc. 24:179.

Keller SR, Taylor DR. 2010. Genomic admixture increases fitness during a biological invasion: admixture increases fitness during invasion. J Evol Biol. 23:1720–1731.

Kolbe JJ, Glor RE, Schettino LR, Lara AC, Larson A, Losos JB. 2004. Genetic variation increases during biological invasion by a Cuban lizard. Nature. 431:177–181.

Lavergne S, Molofsky J. 2007. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci USA. 104:3883–3888.

Lee C. 2002. Evolutionary genetics of invasive species. Trends Ecol Evol. 17:386–391.

Lee JC, Bruck DJ, Curry H, Edwards D, Haviland DR, Van Steenwyk RA, Yorgey BM. 2011. The susceptibility of small fruits and cherries to the spotted-wing drosophila, Drosophila suzukii. Pest Manag Sci. 67:1358–1367.

Lenormand T. 2002. Gene flow and the limits of natural selection. Trends Ecol Evol. 17:183–189.

Lin QC, Zhai YF, Zhang AS, Men XY, Zhang XY, Zalom FG, Zhou CG, Yu Y. 2014. Comparative developmental times and laboratory life tables for Drosophlia suzukii and Drosophila melanogaster (Diptera: Drosophilidae). Fla Entomol. 97:1434–1442.

Lombaert E, Guillemaud T, Lundgren J, Koch R, Facon B, Grez A, Loomans A, Malausa T, Nedvěd O, Rhule E, et al. 2014. Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol. 23:5979–5997.

Matsumura S. 1931. 6000 illustrated insects of Japan-empire (in Japanese). Tokyo: Toko shoin. 1497 pp.

Meng XL. 1994. Posterior predictive p-values. Ann Stat. 22:1142–1160.

Nei M, Maruyama T, Chakraborty R. 1975. The bottleneck effect and genetic variability in populations. Evolution. 29:1–10.

Ometto L, Cestaro A, Ramasamy S, Grassi A, Revadi S, Siozios S, Moretto M, Fontana P, Varotto C, Pisani D, et al. 2013. Linking genomics and ecology to investigate the complex evolution of an invasive drosophila pest. Genome Biol Evol. 5:745–757.

Pascual M, Chapuis MP, Mestres F, Balanyà J, Huey RB, Gilchrist GW, Serra L, Estoup A. 2007. Introduction history of Drosophila subobscura in the New World: a microsatellite based survey using ABC methods. Mol Ecol. 16:3069–3083.

Peischl S, Excoffier L. 2015. Expansion load: recessive mutations and the role of standing genetic variation. Mol Ecol. 24:2084–2094.

Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. 2016. Reliable ABC model choice via random forests. Bioinformatics. 32:859–866.

R Development Core Team. 2008. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org

Robert CP, Cornuet JM, Marin JM, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA. 108:15112–15117.

Rius M, Darling JA. 2014. How important is intraspecific genetic admixture to the success of colonising populations? Trends Ecol Evol. 29:233–242.

Sakai AK, Allendorf FW, Holt JS, Lodge DM, Molofsky J, With KA, Baughman S, Cabin RJ, Cohen JE, Ellstrand NC, McCauley DE. 2001. The population biology of invasive species. Annu Rev Ecol Syst. 305–332.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. In: Mutation and Evolution. Springer. p. 359–367.

Simberloff D. 2013. Biological invasions: much progress plus several controversies. Contrib Sci. 9:7–16.

Turgeon J, Tayeh A, Facon B, Lombaert E, De Clercq P, Berkvens N, Lundgren JG, Estoup A. 2011. Experimental evidence for the phenotypic impact of admixture between wild and biocontrol Asian ladybird (Harmonia axyridis) involved in the European invasion. J Evol Biol. 24:1044–1052.

Uller T, Leimu R. 2011. Founder events predict changes in genetic diversity during human-mediated range expansions. Glob Change Biol. 17:3478–3485.

Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Théry S, Froment A, Le Bomin S, Gessain A, Hombert JM, et al. 2009. Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol. 19:312–318.

Wray G. 2013. Genomics and the evolution of phenotypic traits. Annu Rev Ecol Evol Syst. 44:51–72.


treatments This makes it possible to replicate analyses ofscenario choice on different sample sets representing mostof the genetic variation within and between genetic groupsAll ABC analyses described in table 1 were thus replicatedwith all possible combinations of representative sample sitesavailable for each genetic group This allowed us to test therobustness of scenario choices with respect to the sample setsconsidered as representative of genetic groups

Nested ABC Analyses Processed SequentiallyFollowing Lombaert et al (2014) we used historical informa-tion (ie dates of first observation of each invasive popula-tions supplementary table S1 Supplementary Materialonline) to define 11 sets of competing introduction scenariosthat were analyzed sequentially The first set of comparedscenarios considers the second oldest invasive populationas target (the oldest one necessarily originating from the na-tive range) and determines its introduction history Step bystep subsequent analyses use the results obtained from theprevious analyses until the most recent invasive populationsare considered Each of the 11 analyses included a set ofcompared introduction scenarios where the target invasivepopulation (ie the focal population) can either directly de-rive by a single split (ie introduction event) from one of thehistorically compatible source populations or be the result ofan admixture event between one of all possible combinationsof pairwise sources For instance in a model with one targetand two possible sources three scenarios could be formalizedwith the target population being derived from a single sourcepopulation from the other possible source or from an ad-mixture between the two sources (see supplementary fig S6for an illustration and supplementary table S2Supplementary Material online for details)

The first set of scenario choice analyses 1andash1d aimed atmaking inferences about the introduction pathways for thethree sample sites in western US (table 1) We first evaluatedthe Asian or Hawaiian or admixed AsianthornHawaiian origin ofeach three western US sample sites We then refined ourmodeling by formalizing seven competing scenarios describ-ing the possible relationships among the three western USsample sites and their extra-continental sources (ie Asia andHawaii analysis 1d) Once the invasion history was resolvedfor the western US group we sought to identify the popula-tion source(s) of the eastern US and European genetic groupsindependently Dates of first records in these two groupswidely overlap and the possibility that the western US wasa common source for both areas could not be ruled out Wetherefore compared in analysis 2a a set of six scenarios toassess whether the eastern US group directly derived fromAsia Hawaii or western US or if it resulted from an admixtureevent between one pair of such sources A similar set of sixscenarios was used to assess the most likely origin ofEuropean populations independently from the eastern USgroup (analysis 2b)

Due to the results of analyses 2a and 2b and due to theoverlap in first record dates for eastern US and Europe wethen performed two analyses to investigate the possibility ofgene flows (ie admixture) between the eastern US and

European groups considering eastern US as target populationwith Europe as potential source and vice versa (analyses 3aand 3b reciprocally) Analysis 4 aimed at clarifying the admix-ture in northern Europe sample sites identified in analysis 3bWe tested whether this admixture occurred between easternUS and a secondary introduction from Asia or between in-dividuals from eastern US and southern Europe Finally weinvestigated the origins of the most recent invasive popula-tions in Brazil and La Reunion separately with all older groupsand pairwise admixtures as potential sources (analyses 5aand 5b) Note that in agreement with the genetic clusteringresults (fig 1 supplementary appendix S1 and figs S1 and S2Supplementary Material online) we did not test for the pos-sibility of Brazil being the source of La Reunion andreciprocally

Historical Demographic and Mutational ParametersFor all scenarios formalized for ABC analyses we modeled anintroduction event as a divergence without subsequent gene-flow from the source population (or two source populationsin case of admixture) at a time corresponding to the date offirst record in the invaded area translated into a number ofgenerations (assuming 12 generation per year Lin et al 2014)The divergence event was immediately followed by a bottle-neck period characterized by a lower effective population sizeand a straight return to a stable (large) effective populationsize We modeled our uncertainty regarding the identity ofthe source populations in the slightly structured native Asianarea by including native ldquoghost populationsrdquo (ie unsampledpopulations) in our scenarios Such unsampled native popu-lations were modeled as a single split from an ancestral Asianpopulation (without any change in effective population size)at a time defined by a loose flat prior which includes zero atthe lower bound (supplementary table S8 SupplementaryMaterial online) See Lombaert et al (2014) and referencestherein for justifications of using unsampled populationswhen modeling invasion scenarios A detailed illustration ofthe above modeling design is provided in supplementary figS6 Supplementary Material online

Prior distributions for historical, demographic, and mutational parameters were defined taking into account the historical and demographic parameter values available from empirical studies on D. suzukii (Hauser 2011; Lin et al. 2014; Asplen et al. 2015) and microsatellite mutational data on D. melanogaster (Schug et al. 1998). We considered a first set of prior distributions (hereafter termed prior set 1), which was composed of rectangular (i.e., bounded uniform) distributions (see supplementary table S8, Supplementary Material online). To evaluate the robustness of our ABC inferences to prior choice, we also considered a second set of more peaked prior distributions (hereafter termed prior set 2; supplementary table S8, Supplementary Material online).

Model Choice Using ABC-RF and ABC-LDA

For all ABC-RF and ABC-LDA analyses, we used the software DIYABC v2.1.0 (Cornuet et al. 2014) to simulate datasets constituting the reference tables. A reference table includes a given number of datasets that have been simulated for different scenarios using parameter values drawn from prior distributions, each dataset being summarized with a pool of statistics.

Following Pudlo et al. (2016), ABC-RF treatments were processed on reference tables including 10,000 simulated datasets per scenario. Datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014) for microsatellite markers, describing genetic variation per population (e.g., number of alleles), per pair (e.g., genetic distance), or per triplet (e.g., admixture rate) of populations, averaged over the 25 loci (see supplementary table S9, Supplementary Material online, for details about such statistics), plus the linear discriminant analysis (LDA) axes as additional summary statistics. The total number of summary statistics ranged from 39 to 424 depending on the analysis (table 2). When the number of scenarios in an analysis exceeded ten, we processed ABC-RF treatments on reference tables including 100,000 simulated datasets, to avoid the computer memory issues associated with the sub-bootstrapping procedure applied to reference tables with more than 100,000 simulated datasets (see the section “Practical recommendations regarding the implementation of the algorithms” in Pudlo et al. 2016). We checked that this number was sufficient by evaluating the stability of prior error rates (i.e., the probability of choosing a wrong model when drawing the model index and parameter values from the priors) and of posterior probability estimates on 80,000, 90,000, and 100,000 simulated datasets. The number of trees in the constructed random forests was fixed to n = 500; see Breiman (2001) and Pudlo et al. (2016) for justifications of considering a forest of 500 trees, and supplementary fig. S7, Supplementary Material online, for an illustration. For each ABC-RF analysis, we predicted the best scenario and estimated its posterior probability and the prior error rate over 10 replicate runs on the same reference table. We used the abcrf R package (v1.1.0; Pudlo et al. 2016) to perform all ABC-RF analyses. In supplementary appendix S2, Supplementary Material online, we provide several scripts written in the R programming language (R Development Core Team 2008) for computing random forest analyses in R with the abcrf package when starting from simulated datasets generated with DIYABC v2.1.0.
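The prior error rate defined above can be illustrated with a toy sketch in Python. This is neither the R scripts of supplementary appendix S2 nor a call to the abcrf package. The two Gaussian “scenarios”, the single summary statistic, and the threshold classifier standing in for a random forest are all hypothetical choices, made only to show how the rate is counted. The toy scenarios have no free parameters, so only the model index is drawn from its prior.

```python
import random


def simulate_summary(model_index, rng, n_obs=20):
    """Toy simulator: model 1 has mean 0, model 2 has mean 1 (hypothetical)."""
    mean = 0.0 if model_index == 1 else 1.0
    sample = [rng.gauss(mean, 1.0) for _ in range(n_obs)]
    return sum(sample) / n_obs  # single summary statistic: the sample mean


def classify(summary):
    """Stand-in classifier (not a random forest): threshold on the summary."""
    return 1 if summary < 0.5 else 2


def prior_error_rate(n_test, seed=0):
    """Fraction of datasets drawn from the prior assigned to the wrong model."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_test):
        true_model = rng.choice([1, 2])  # model index drawn from a uniform prior
        summary = simulate_summary(true_model, rng)
        if classify(summary) != true_model:
            errors += 1
    return errors / n_test


rate = prior_error_rate(n_test=500)
```

In an actual ABC-RF analysis the classifier would be the trained forest, and the rate is typically obtained from out-of-bag predictions rather than from freshly simulated test datasets.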

For the sake of methodological comparison, we also duplicated a subset of analyses using a more standard ABC method (Beaumont et al. 2002; Cornuet et al. 2008), improved using the regression algorithms from Estoup et al. (2012), which are based on linear discriminant analysis (LDA), to avoid the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step. This method is called ABC-LDA. Following previous analyses of this type (Lombaert et al. 2014), ABC-LDA treatments were processed on reference tables including 500,000 simulated datasets per scenario. As for ABC-RF, datasets were summarized using the whole set of summary statistics proposed by DIYABC (Cornuet et al. 2014). Posterior probabilities associated with each scenario were estimated by polychotomous logistic regression (Cornuet et al. 2008), modified following Estoup et al. (2012), on the 1% of the simulated datasets closest to the observed dataset. We also estimated, for each analysis, a prior error rate over 500 simulated datasets drawn from the priors, using reference tables including 500,000 simulated datasets per scenario. To provide a fair comparison of classification error with respect to computational effort, prior error rates were also estimated using reference tables including the same number of simulated datasets per scenario as for ABC-RF treatments (10,000 per scenario).
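The selection of the simulated datasets closest to the observed one can be sketched as follows in Python. This sketch uses the simple “direct” estimate (the proportion of each scenario among the retained simulations) rather than the polychotomous logistic regression used in the text. The Euclidean distance on raw statistics, the function name, and the toy reference table are assumptions made for illustration.

```python
import math
from collections import Counter


def direct_posterior_probabilities(reference_table, observed, fraction=0.01):
    """Estimate scenario probabilities from the closest simulated datasets.

    reference_table: list of (scenario_label, summary_statistics) pairs.
    observed: summary statistics of the observed dataset.
    """
    n_keep = max(1, int(len(reference_table) * fraction))
    # Rank simulations by Euclidean distance to the observed statistics.
    ranked = sorted(reference_table,
                    key=lambda row: math.dist(row[1], observed))
    closest = ranked[:n_keep]
    counts = Counter(label for label, _ in closest)
    return {label: counts[label] / n_keep for label in counts}


# Toy reference table (hypothetical): scenario "A" yields statistics near 0,
# scenario "B" near 1, with 100 deterministic "simulations" each.
toy_table = [("A", (i / 1000.0,)) for i in range(100)]
toy_table += [("B", (1.0 + i / 1000.0,)) for i in range(100)]
probabilities = direct_posterior_probabilities(toy_table, observed=(0.02,),
                                               fraction=0.05)
```

A regression step, as in the text, would additionally correct for the residual distance between the retained simulations and the observed dataset.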

Refining the Origins of the Primary Introduction Events from the Native Area

We refined our worldwide invasion scenario by assessing the most probable origin of the primary introductions from the Asian native range. To this aim, we carried out additional ABC-RF treatments to choose which geographical area in the native range was the most likely origin of each of the three primary introductions from Asia identified in the final invasion scenario, i.e., in Hawaii, western US (sample site US-Wat), and Europe (fig. 1). Capitalizing on the observed pattern of genetic differentiation among all Asian sample sites, we considered four possible geographical origins: O1 = Japan (represented by the sample sites JP-Sap + JP-Tok, which were pooled as they did not show any significant differentiation); O2 = South-East China (represented by the sample site CN-Nin); O3 = North-West China (represented by the sample sites CN-Lan + CN-Lia, which were pooled as they did not show any significant differentiation); and O4 = South-West China (represented by the sample site CN-Shi). In each of these three model choice analyses (i.e., one analysis per primary introduction), we considered the previously inferred introduction histories and compared four scenarios corresponding to variations of those histories that differed only by the geographical origin of the primary Asian introduction (models O1, O2, O3, and O4). We used the ABC-RF method with 10,000 datasets per model, simulated using prior set 1 or 2, and with three replications of the RF process. For inferences regarding Europe, we carried out ABC-RF treatments on various sets of sample sites from Europe and eastern US to further evaluate the robustness of our results.

Parameter Estimation

For ABC parameter estimation, we considered a set of 12 key sample sites representative of the inferred final invasion history (fig. 1). This set of sample sites included the three native origins of primary introductions from the native range (O1, O2, and O3); the Hawaiian sample (US-Haw); the three western US samples (US-Wat, US-Sok, and US-SD); the eastern US sample US-NC; two European samples (IT-Tre and GE-Dos); the Brazilian sample (BR-PA); and the sample from La Réunion (FR-Reu). To evaluate the robustness of our parameter estimations with respect to sample sites, we carried out a second analysis replacing the sample sites US-NC, IT-Tre, and GE-Dos by US-Wis, SP-Bar, and FR-Par, respectively. We also replicated parameter estimations using prior set 1 and prior set 2.

ABC-RF was developed to tackle the inferential issue of model choice. So far, no RF-based solution exists to estimate parameter distributions under a given model (Pudlo et al. 2016). Standard ABC algorithms for parameter estimation may suffer from the curse of dimensionality and correlation among explanatory variables (i.e., multicollinearity) during the regression step, and hence yield poor results when the number of statistics is too large (reviewed in Blum et al. 2013). To avoid such potential problems, as well as for the sake of simplicity and computational efficiency (statistical techniques for choosing summary statistics have not been implemented in DIYABC; Blum et al. 2013), we used a standard ABC methodological framework (Beaumont et al. 2002; Cornuet et al. 2008) applied to a subset of “expert-chosen” statistics proposed in DIYABC (Brouat et al. 2014; Lombaert et al. 2014) to estimate posterior parameter distributions under the final invasion scenario. We hence considered 95 summary statistics in total: the mean number of alleles and the mean genetic diversity (per locus and population sample), all pairwise FST values, and some crude estimates of admixture rates based on the five population triplets for which admixture events have been inferred (i.e., the A1–A5 events described in fig. 1, corresponding to five AML statistics; Choisy et al. 2004; see the DIYABC 2.1.0 user manual, p. 16, for details about such statistics). Using parameter values drawn from prior set 1 or 2, we produced a reference table containing 10 million simulated datasets. Following Beaumont et al. (2002), we then used a local linear regression to estimate the parameter posterior distributions. We took the 10,000 (i.e., 0.1%) simulated datasets closest to our observed dataset for the regression, after applying a logit transformation to parameter values. We found that considering instead the 1% closest simulated datasets and/or a larger set of summary statistics had only weak effects on the estimated parameter distributions, at least for the subset of parameters we were interested in (i.e., admixture rates and bottleneck severity; results not shown).
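The regression adjustment on logit-transformed parameters can be sketched in Python as follows, in the spirit of Beaumont et al. (2002). This is a minimal sketch, assuming a single parameter and a single summary statistic, whereas the analysis in the text relies on 95 statistics within DIYABC. The function names, the Epanechnikov kernel weights, and the toy values are assumptions, not the DIYABC implementation.

```python
import math


def logit(theta, lower, upper):
    """Map a parameter bounded in (lower, upper) onto the real line."""
    return math.log((theta - lower) / (upper - theta))


def inverse_logit(phi, lower, upper):
    """Map a real value back into (lower, upper)."""
    return lower + (upper - lower) / (1.0 + math.exp(-phi))


def local_linear_adjust(thetas, stats, s_obs, lower, upper):
    """Regression-adjust accepted parameter values (one statistic only).

    thetas, stats: parameter values and summary statistics of the accepted
    simulations; s_obs: the observed summary statistic.
    """
    phis = [logit(t, lower, upper) for t in thetas]
    dists = [abs(s - s_obs) for s in stats]
    delta = max(dists)  # assumed > 0: not all simulations match exactly
    # Epanechnikov kernel weights: closer simulations weigh more.
    weights = [1.0 - (d / delta) ** 2 for d in dists]
    w_sum = sum(weights)
    s_mean = sum(w * s for w, s in zip(weights, stats)) / w_sum
    p_mean = sum(w * p for w, p in zip(weights, phis)) / w_sum
    s_var = sum(w * (s - s_mean) ** 2 for w, s in zip(weights, stats))
    cov = sum(w * (s - s_mean) * (p - p_mean)
              for w, s, p in zip(weights, stats, phis))
    slope = cov / s_var  # weighted least-squares slope
    # Shift each value to what it would be at the observed statistic.
    adjusted = [p - slope * (s - s_obs) for p, s in zip(phis, stats)]
    return [inverse_logit(p, lower, upper) for p in adjusted]


# Toy usage (hypothetical values): an admixture rate bounded in (0, 1).
toy_stats = [-0.2, -0.1, 0.0, 0.1, 0.3]
toy_thetas = [inverse_logit(2.0 * s, 0.0, 1.0) for s in toy_stats]
toy_adjusted = local_linear_adjust(toy_thetas, toy_stats, 0.0, 0.0, 1.0)
```

Working on the logit scale keeps the adjusted values inside the prior bounds after back-transformation, which a regression on the raw parameter values would not guarantee.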

We focused our investigations on the parameters related to two types of demographic events that are considered to be potentially important drivers of invasion success: the severity of the bottleneck associated with the foundation of new populations, and genetic admixture (Estoup et al. 2016). Bottleneck severity was defined as the ratio between the duration of the bottleneck and the effective number of individuals during this period (Pascual et al. 2007). Following the final invasion scenario detailed in figure 1, we defined three categories of introduction situations: bottleneck severity for populations founded only by individuals originating from a different continent (e.g., sample sites US-Haw, US-Wat, IT-Tre, BR-PA, and FR-Reu); bottleneck severity for populations founded only by individuals originating from the same continent (e.g., sample sites US-SD and US-NC); and bottleneck severity for populations founded by a combination of the two types of sources (e.g., sample sites US-Sok and GE-Dos). Genetic admixture is evident when a population is inferred to have more than one distinct source (A1–A5 events in fig. 1).
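The bottleneck severity ratio defined above amounts to the following short Python sketch. The function name and the numerical values are hypothetical, and the duration is assumed to be expressed in generations.

```python
def bottleneck_severity(duration_generations, effective_size):
    """Ratio of bottleneck duration (in generations) to effective size."""
    if effective_size <= 0:
        raise ValueError("effective size must be positive")
    return duration_generations / effective_size


# Hypothetical values, for illustration only.
mild = bottleneck_severity(duration_generations=5, effective_size=500)
severe = bottleneck_severity(duration_generations=20, effective_size=50)
```

With this definition, a longer bottleneck or a smaller effective size both give a larger severity value.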

Model-Posterior Checking

We used the ABC model-posterior checking method implemented in DIYABC (Cornuet et al. 2010) to evaluate how well the final worldwide invasion scenario, with its associated parameter posterior distributions, matches the observed dataset. This method was largely inspired by the statistical frameworks detailed in Gelman et al. (2003) and Cook et al. (2006). The principle is as follows: if a scenario-posterior combination fits the observed data correctly, then data simulated under this scenario with parameters drawn from the associated posterior distributions should be close to the observed data. The lack of fit of the model to the data can be measured by determining the frequency at which test quantities measured on the observed dataset are extreme with respect to the distributions of the same test quantities computed from the simulated datasets (i.e., the posterior predictive distributions). In practice, the test quantities are chosen among the large set of ABC summary statistics proposed in the program DIYABC. For each test quantity (q), a lack of fit of the observed data with respect to the posterior predictive distribution can be measured by the cumulative distribution function value, defined as Prob(q_simulated < q_observed). The tail-area probability can be easily computed for each test quantity as Prob(q_simulated < q_observed) and 1.0 − Prob(q_simulated < q_observed) for Prob(q_simulated < q_observed) ≤ 0.5 and > 0.5, respectively. Such tail-area probabilities, also named posterior predictive P-values (i.e., ppp-values; Meng 1994), represent the probabilities that the replicated data (simulated ABC summary statistics) could be more extreme than the observed data (observed ABC summary statistics). In practice, ppp-values can be interpreted as a guideline for tracking model-posterior misfit: a few ppp-values around 0.05 are not necessarily a problem, whereas the presence of ppp-values around 0.001 is rather suspect. Hence, too many observed summary statistics falling in the tails of distributions, especially if some of those ppp-values are small (<0.001), cast serious doubts on the adequacy of the model-posterior combination to the observed dataset. Finally, because ppp-values are computed for a number of (often non-independent) test statistics, a method such as that of Benjamini and Hochberg (1995) can be used to control the false discovery rate (Cornuet et al. 2010).
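The tail-area computation and the false discovery rate control described above can be sketched in Python as follows. This is an illustrative sketch, not the DIYABC implementation: the function names, the default level alpha = 0.05, and the toy values are assumptions.

```python
def ppp_value(simulated, observed):
    """Tail-area probability of an observed test quantity."""
    # Empirical Prob(q_simulated < q_observed).
    cdf = sum(1 for q in simulated if q < observed) / len(simulated)
    return cdf if cdf <= 0.5 else 1.0 - cdf


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of tests rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    n_reject = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            n_reject = rank  # largest rank passing the step-up criterion
    return sorted(order[:n_reject])


# Toy posterior predictive distribution (hypothetical): the values 0..999.
toy_simulated = list(range(1000))
low_tail = ppp_value(toy_simulated, observed=2)       # observed in lower tail
central = ppp_value(toy_simulated, observed=500)      # observed near the middle
high_tail = ppp_value(toy_simulated, observed=998.5)  # observed in upper tail
flagged = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
```

Here the rejected indices correspond to the test quantities whose misfit remains significant once the number of tests is accounted for.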

We carried out the ABC model-posterior checking analysis on our observed microsatellite dataset as follows. From 10^7 datasets simulated under the final invasion scenario detailed in figure 1, and using the same subset of 95 statistics previously retained for parameter estimation, we obtained a posterior sample of 10^4 values from the posterior distributions of parameters, as described in the previous section (Parameter Estimation). We then simulated 10^4 datasets with parameter values drawn with replacement from this posterior sample. As underlined in many statistics textbooks (e.g., Gelman et al. 2003), it is advised against performing model checking using information that has already been used for training (i.e., model fitting; see also Cornuet et al. 2010 for illustrations on simulated datasets). Optimally, model-posterior checking should be based on test quantities that do not correspond to the summary statistics used for obtaining the parameter posterior distributions. This is naturally possible with DIYABC, as the package proposes a large choice of summary statistics. Our set of test statistics therefore included the 1,141 remaining single, pairwise, and three-population summary statistics available in DIYABC that were not used for previous ABC parameter estimation.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

A.F. was supported by state funding from the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program “Investissements d'avenir” (ANR-11-IDEX-0004-02). A.E. and R.A.H. were supported by the Languedoc Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE 2016), and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP “International Mobility” (CfP 2015-02). G.L. acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds, project 2012-13-195. X.C., M.K., I.M., and J.Z. were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). M.D. acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). M.P. was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.

References

Adrion JR, Kousathanas A, Pascual M, Burrack HJ, Haddad NM, Bergland AO, Machado H, Sackton TB, Schlenke TA, Watada M, et al. 2014. Drosophila suzukii: the genetic footprint of a recent, worldwide invasion. Mol Biol Evol. 31:3148–3163.

Asplen MK, Anfora G, Biondi A, Choi DS, Chu D, Daane KM, Gibert P, Gutierrez AP, Hoelmer KA, Hutchison WD, et al. 2015. Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci. 88:469–494.

Atallah J, Teixeira L, Salazar R, Zaragoza G, Kopp A. 2014. The making of a pest: the evolution of a fruit-penetrating ovipositor in Drosophila suzukii and related species. Proc R Soc B. 281:20132840.

Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025–2035.

Beaumont MA. 2010. Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst. 41:379–406.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 57:289–300.

Bertorelle G, Benazzo A, Mona S. 2010. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol Ecol. 19:2609–2625.

Blum M, Nunes M, Prangle D, Sisson S. 2013. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci. 28:189–208.

Bock DG, Caseys C, Cousens RD, Hahn MA, Heredia SM, Hubner S, Turner KG, Whitney KD, Rieseberg LH. 2015. What we still don't know about invasion genetics. Mol Ecol. 24:2277–2297.

Boissin E, Hurley B, Wingfield MJ, Vasaitis R, Stenlid J, Davis C, de Groot P, Ahumada R, Carnegie A, Goldarazena A, et al. 2012. Retracing the routes of introduction of invasive species: the case of the Sirex noctilio woodwasp. Mol Ecol. 21:5728–5744.

Breiman L. 2001. Random forests. Mach Learn. 45:5–32.

Benazzo A, Ghirotto S, Torres Vilaca ST, Hoban S. 2015. Using ABC and microsatellite data to detect multiple introductions of invasive species from a single source. Heredity 115:262–272.

Brouat C, Tollenaere C, Estoup A, Loiseau A, Sommer S, Soanandrasana R, Rahalison L, Rajerison M, Piry S, Goodman SM, Duplantier JM. 2014. Invasion genetics of a human commensal rodent: the black rat Rattus rattus in Madagascar. Mol Ecol. 23(16):4153–4167.

Caprio M, Tabashnik BE. 1992. Gene flow accelerates local adaptation among finite populations: simulating the evolution of insecticide resistance. J Econ Entomol. 85:611–620.

Calabria G, Maca J, Bächli G, Serra L, Pascual M. 2012. First records of the potential pest species Drosophila suzukii (Diptera: Drosophilidae) in Europe. J Appl Entomol. 136:139–147.

Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang G, Zalom FG. 2013. Genome of Drosophila suzukii, the spotted wing drosophila. G3 3:2257–2271.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Mol Ecol. 13:955–968.

Cook S, Gelman A, Rubin DB. 2006. Validation of software for Bayesian models using posterior quantiles. J Comput Graph Stat. 15:675–692.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24:2713–2719.

Cornuet JM, Ravigne V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11:401.

Cornuet JM, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin JM, Estoup A. 2014. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics 30:1187–1189.

Cristescu ME. 2015. Genetic reconstructions of invasion history. Mol Ecol. 24:2212–2225.

Depra M, Poppe JL, Schmitz HJ, De Toni DC, Valente VLS. 2014. The first records of the invasive pest Drosophila suzukii in the South American continent. J Pest Sci. 87:379–383.

Dlugosch KM, Parker IM. 2008. Founding events in species invasions: genetic variation, adaptive evolution, and the role of multiple introductions. Mol Ecol. 17:431–449.

Dlugosch KM, Anderson SR, Braasch J, Alice Cang F, Gillette H. 2015. The devil is in the details: genetic variation in introduced populations and its contributions to invasion. Mol Ecol. 24:2095–2111.

Edmonds CA, Lillie AS, Cavalli-Sforza LL. 2004. Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA. 101:975–979.

Ellstrand NC, Schierenbeck KA. 2000. Hybridization as a stimulus for the evolution of invasiveness in plants? Proc Natl Acad Sci USA. 97:7043–7050.

Estoup A, Guillemaud T. 2010. Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol. 19:4113–4130.

Estoup A, Lombaert E, Marin JM, Guillemaud T, Pudlo P, Robert CP, Cornuet JM. 2012. Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics. Mol Ecol Res. 12(5):846–855.

Estoup A, Ravigne V, Hufbauer RA, Vitalis R, Gautier M, Facon B. 2016. Is there a genetic paradox of biological invasion? Annu Rev Ecol Evol Syst. 47:51–72.


Facon B, Genton B, Shykoff J, Jarne P, Estoup A, David P. 2006. A general eco-evolutionary framework for understanding bioinvasions. Trends Ecol Evol. 21:130–135.

Facon B, Pointier JP, Jarne P, Sarda V, David P. 2008. High genetic variance in life-history strategies within invasive populations by way of multiple introductions. Curr Biol. 18:363–367.

Fraimout A, Loiseau A, Price DK, Xuereb A, Martin JF, Vitalis R, Fellous S, Debat V, Estoup A. 2015. New set of microsatellite markers for the spotted-wing Drosophila suzukii (Diptera: Drosophilidae): a promising molecular tool for inferring the invasion history of this major insect pest. Eur J Entomol. 112(4):855.

Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201:1555–1579.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis. London: Chapman & Hall/CRC.

Gilchrist GW, Huey RB, Serra L. 2001. Rapid evolution of wing size clines in Drosophila subobscura. Genetica 112:273–286.

Gilchrist GW, Huey RB, Balanya J, Pascual M, Serra L, Noor M. 2004. A time series of evolution in action: a latitudinal cline in wing size in South American Drosophila subobscura. Evolution 58:768–780.

Groen SC, Whiteman NK. 2016. Using Drosophila to study the evolution of herbivory and diet specialization. Curr Opin Insect Sci. 14:66–72.

Hauser M. 2011. A historic account of the invasion of Drosophila suzukii (Matsumura) (Diptera: Drosophilidae) in the continental United States, with remarks on their identification. Pest Manag Sci. 67:1352–1357.

Hufbauer RA, Facon B, Ravigne V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A. 2012. Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl. 5:89–101.

Kaneshiro KY. 1983. Drosophila (Sophophora) suzukii (Matsumura). Proc Hawaiian Entomol Soc. 24:179.

Keller SR, Taylor DR. 2010. Genomic admixture increases fitness during a biological invasion: admixture increases fitness during invasion. J Evol Biol. 23:1720–1731.

Kolbe JJ, Glor RE, Schettino LR, Lara AC, Larson A, Losos JB. 2004. Genetic variation increases during biological invasion by a Cuban lizard. Nature 431:177–181.

Lavergne S, Molofsky J. 2007. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci USA. 104:3883–3888.

Lee C. 2002. Evolutionary genetics of invasive species. Trends Ecol Evol. 17:386–391.

Lee JC, Bruck DJ, Curry H, Edwards D, Haviland DR, Van Steenwyk RA, Yorgey BM. 2011. The susceptibility of small fruits and cherries to the spotted-wing drosophila, Drosophila suzukii. Pest Manag Sci. 67:1358–1367.

Lenormand T. 2002. Gene flow and the limits of natural selection. Trends Ecol Evol. 17:183–189.

Lin QC, Zhai YF, Zhang AS, Men XY, Zhang XY, Zalom FG, Zhou CG, Yu Y. 2014. Comparative developmental times and laboratory life tables for Drosophlia suzukii and Drosophila melanogaster (Diptera: Drosophilidae). Fla Entomol. 97:1434–1442.

Lombaert E, Guillemaud T, Lundgren J, Koch R, Facon B, Grez A, Loomans A, Malausa T, Nedved O, Rhule E, et al. 2014. Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol. 23:5979–5997.

Matsumura S. 1931. 6000 illustrated insects of Japan-empire (in Japanese). Toko shoin, Tokyo. 1497 pp.

Meng XL. 1994. Posterior predictive p-values. Ann Stat. 22:1142–1160.

Nei M, Maruyama T, Chakraborty R. 1975. The bottleneck effect and genetic variability in populations. Evolution 29:1–10.

Ometto L, Cestaro A, Ramasamy S, Grassi A, Revadi S, Siozios S, Moretto M, Fontana P, Varotto C, Pisani D, et al. 2013. Linking genomics and ecology to investigate the complex evolution of an invasive drosophila pest. Genome Biol Evol. 5:745–757.

Pascual M, Chapuis MP, Mestres F, Balanya J, Huey RB, Gilchrist GW, Serra L, Estoup A. 2007. Introduction history of Drosophila subobscura in the New World: a microsatellite based survey using ABC methods. Mol Ecol. 16:3069–3083.

Peischl S, Excoffier L. 2015. Expansion load: recessive mutations and the role of standing genetic variation. Mol Ecol. 24:2084–2094.

Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. 2016. Reliable ABC model choice via random forests. Bioinformatics 32:859–866.

R Development Core Team. 2008. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org

Robert CP, Cornuet JM, Marin JM, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA. 108:15112–15117.

Rius M, Darling JA. 2014. How important is intraspecific genetic admixture to the success of colonising populations? Trends Ecol Evol. 29:233–242.

Sakai AK, Allendorf FW, Holt JS, Lodge DM, Molofsky J, With KA, Baughman S, Cabin RJ, Cohen JE, Ellstrand NC, McCauley DE. 2001. The population biology of invasive species. Annu Rev Ecol Syst. 305–332.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. In: Mutation and evolution. Springer. p. 359–367.

Simberloff D. 2013. Biological invasions: much progress plus several controversies. Contrib Sci. 9:7–16.

Turgeon J, Tayeh A, Facon B, Lombaert E, De Clercq Berkvens N, Lungren JG, Estoup A. 2011. Experimental evidence for the phenotypic impact of admixture between wild and biocontrol Asian ladybird (Harmonia axyridis) involved in the European invasion. J Evol Biol. 24:1044–1052.

Uller T, Leimu R. 2011. Founder events predict changes in genetic diversity during human-mediated range expansions. Glob Change Biol. 17:3478–3485.

Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Thery S, Froment A, Le Bomin S, Gessain A, Hombert JM, et al. 2009. Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol. 19:312–318.

Wray G. 2013. Genomics and the evolution of phenotypic traits. Annu Rev Ecol Evol Syst. 44:51–72.

Page 15: Deciphering the Routes of invasion of Drosophila suzukii

different scenarios using parameter values drawn from priordistributions each dataset being summarized with a pool ofstatistics

Following Pudlo et al (2016) ABC-RF treatments wereprocessed on reference tables including 10000 simulateddatasets per scenario Datasets were summarized using thewhole set of summary statistics proposed by DIYABC(Cornuet et al 2014) for microsatellite markers describinggenetic variation per population (eg number of alleles)per pair (eg genetic distance) or per triplet (eg admixturerate) of populations averaged over the 25 loci (see the supplementary table S9 Supplementary Material online for de-tails about such statistics) plus the linear discriminant anal-ysis (LDA) axes as additional summary statistics The totalnumber of summary statistics ranged from 39 to 424 depend-ing on the analysis (table 2) When the number of scenarios inan analysis exceeded ten we processed ABC-RF treatmentson reference tables including 100000 simulated datasets toavoid computer memory issues associated to the sub-bootstrapping procedure processed for reference tableswith more than 100000 simulated datasets see the sectionPractical recommendations regarding the implementation ofthe algorithms in Pudlo et al (2016) We checked that thisnumber was sufficient by evaluating the stability of prior errorrates (ie the probability to choose a wrong model whendrawing model index and parameter values into priors) andposterior probabilities estimations on 80000 90000 and100000 simulated datasets The number of trees in the con-structed random forests was fixed to nfrac14 500 see Pudlo et al2016 Breiman 2001 for justifications of considering a forest of500 trees and supplementary fig S7 Supplementary Materialonline for an illustration For each ABC-RF analysis we pre-dicted the best scenario estimated its posterior probabilitiesand prior error rates over 10 replicate runs of the same ref-erence table We used the abcrf R package (v110 Pudlo et al2016) to perform all ABC-RF analyses In supplementary appendix S2 Supplementary Material online we provide severalscripts written in R programming language (R 
DevelopmentCore Team 2008) for computing random forest analyses in Rwith the abcrf package when starting from simulated data-sets generated with DIYABC v210

For sake of methodological comparisons we also dupli-cated a subset of analyses using a more standard ABC method(Beaumont et al 2002 Cornuet et al 2008) improved usingthe regression algorithms from Estoup et al (2012) based onlinear discriminant analysis (LDA) to avoid the curse of di-mensionality and correlation among explanatory variables(ie multi-co-linearity) during the regression step Thismethod is called ABC-LDA Following previous analyses ofthis type (Lombaert et al 2014) ABC-LDA treatments wereprocessed on reference tables including 500000 simulateddatasets per scenario As for ABC-RF datasets were summa-rized using the whole set of summary statistics proposed byDIYABC (Cornuet et al 2014) Posterior probabilities associ-ated with each scenario were estimated by polychotomouslogistic regression (Cornuet et al 2008) modified followingEstoup et al (2012) on the 1 of the simulated datasetsclosest to the observed dataset We also estimated for each

analysis a prior error rate over 500 simulated datasets drawninto priors using reference tables including 500000 simulateddatasets per scenario To provide a fair comparison of classi-fication error with respect to the computational effort priorerror rates were also estimated using reference tables includ-ing the same number of simulated datasets per scenario as forABC-RF treatments (10000 per scenario)

Refining the Origins of the Primary IntroductionEvents from the Native AreaWe refined our worldwide invasion scenario by assessingthe most probable origin of the primary introductionsfrom the Asian native range To this aim we carried outadditional ABC-RF treatments to choose which geograph-ical area in the native range was the most likely origin ofeach of the three primary introductions from Asia iden-tified in the final invasion scenario ie in Hawaii westernUS (sample site US-Wat) and Europe (fig 1) Capitalizingon the observed pattern of genetic differentiation amongall Asian sample sites we considered four possible geo-graphical origins O1frac14 Japan (represented by the samplesites JP-Sapthorn Jp-Tok that were pooled as they did notshow any significant differentiation) O2frac14 South-EastChina (represented by the sample site CN-Nin)O3frac14North-West China (represented by the sample sitesCN-LanthornCN-Lia that were pooled as they did not showany significant differentiation) and O4frac14 South-WestChina (represented by the sample site CN-Shi) In eachof these three model choice analyses (cf one analysis perprimary introduction) we considered the previously in-ferred introduction histories and compared four scenar-ios corresponding to variation of those histories that onlydiffered by the geographical origin of the primary Asianintroduction (models O1 O2 O3 and O4) We used theABC-RF method with 10000 datasets per model simu-lated using the prior set 1 or 2 and with three replicationsof the RF process For inferences regarding Europe wecarried out ABC-RF treatments on various sets of samplesites from Europe and eastern US to further evaluate therobustness of our results

Parameter Estimation

For ABC parameter estimation, we considered a set of 12 key sample sites representative of the inferred final invasion history (fig. 1). This set of sample sites included the three origins of primary introductions from the native range (O1, O2, and O3), the Hawaiian sample (US-Haw), the three western US samples (US-Wat, US-Sok, and US-SD), the eastern US sample US-NC, two European samples (IT-Tre and GE-Dos), the Brazilian sample (BR-PA), and the sample from La Reunion (FR-Reu). To evaluate the robustness of our parameter estimations with respect to sample sites, we carried out a second analysis replacing the sample sites US-NC, IT-Tre, and GE-Dos by US-Wis, SP-Bar, and FR-Par, respectively. We also replicated parameter estimations using the prior set 1 and the prior set 2.

ABC-RF was developed to tackle the inferential issue of model choice. So far, no RF-based solution exists to estimate parameter distributions under a given model (Pudlo et al. 2016). Standard ABC algorithms for parameter estimation may suffer from the curse of dimensionality and from correlation among explanatory variables (i.e., multicollinearity) during the regression step, and hence yield poor results when the number of statistics is too large [reviewed in Blum et al. (2013)]. To avoid such potential problems, as well as for the sake of simplicity and computational efficiency (i.e., statistical techniques for choosing summary statistics have not been implemented in DIYABC; Blum et al. 2013), we used a standard ABC methodological framework (Beaumont et al. 2002; Cornuet et al. 2008) applied to a subset of "expert-chosen" statistics proposed in DIYABC (Brouat et al. 2014; Lombaert et al. 2014) to estimate posterior parameter distributions under the final invasion scenario. We hence considered 95 summary statistics in total: the mean number of alleles and the mean genetic diversity (per locus and population sample), all pairwise FST values, and some crude estimates of admixture rates based on the five population triplets for which admixture events have been inferred (i.e., A1–A5 events described in fig. 1, corresponding to five AML statistics; Choisy et al. 2004; see the DIYABC 2.1.0 user manual, p. 16, for details about such statistics). Using parameter values drawn from the prior sets 1 or 2, we produced a reference table containing 10 million simulated datasets. Following Beaumont et al. (2002), we then used a local linear regression to estimate the parameter posterior distributions. We took the 10,000 (0.1%) simulated datasets closest to our observed dataset for the regression, after applying a logit transformation to parameter values. We found that considering instead the 1% closest simulated datasets and/or a larger set of summary statistics had only weak effects on the estimated parameter distributions, at least for the subset of parameters we were interested in (i.e., admixture rates and bottleneck severity) (results not shown).
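For readers unfamiliar with the regression step, the following Python sketch illustrates the general idea of a rejection step followed by a local linear regression adjustment on logit-transformed parameters. It is a minimal illustration under stated assumptions, not the DIYABC implementation: it uses a single summary statistic (the analysis described above used 95), an Epanechnikov kernel for the weights, parameter bounds `lower` and `upper` standing in for prior bounds, and a hypothetical toy simulator (`toy_reference_table`) in place of a coalescent simulator.

```python
import math
import random


def logit(x, lower, upper):
    """Map a parameter from the open interval (lower, upper) to the real line."""
    p = (x - lower) / (upper - lower)
    return math.log(p / (1.0 - p))


def inv_logit(z, lower, upper):
    """Inverse of logit(): map a real value back into (lower, upper)."""
    return lower + (upper - lower) / (1.0 + math.exp(-z))


def abc_local_linear(params, stats, s_obs, tolerance, lower, upper):
    """Rejection step plus weighted linear-regression adjustment.

    params, stats: simulated parameter values and their summary statistic
    (one statistic only, so the weighted regression has a closed form).
    tolerance: fraction of simulations retained (e.g. 0.001 for 0.1%).
    Returns (adjusted_params, weights) for the retained simulations.
    """
    n_keep = max(2, round(tolerance * len(params)))
    # Rejection: keep the simulations closest to the observed statistic.
    order = sorted(range(len(params)), key=lambda i: abs(stats[i] - s_obs))
    kept = order[:n_keep]
    ds = [stats[i] - s_obs for i in kept]
    z = [logit(params[i], lower, upper) for i in kept]
    delta = abs(ds[-1])  # distance of the farthest retained simulation

    # Epanechnikov kernel: closer simulations receive larger weights.
    weights = [1.0 - (d / delta) ** 2 if delta > 0 else 1.0 for d in ds]
    if sum(weights) <= 0:
        weights = [1.0] * len(ds)

    # Weighted least squares of z on (s - s_obs), one covariate.
    w_sum = sum(weights)
    ds_mean = sum(w * d for w, d in zip(weights, ds)) / w_sum
    z_mean = sum(w * v for w, v in zip(weights, z)) / w_sum
    s_xx = sum(w * (d - ds_mean) ** 2 for w, d in zip(weights, ds))
    s_xz = sum(w * (d - ds_mean) * (v - z_mean)
               for w, d, v in zip(weights, ds, z))
    slope = s_xz / s_xx if s_xx > 0 else 0.0

    # Shift each retained value to s = s_obs, then back-transform.
    adjusted = [inv_logit(v - slope * d, lower, upper) for v, d in zip(z, ds)]
    return adjusted, weights


def toy_reference_table(n, seed=0):
    """Hypothetical simulator: theta ~ U(0.05, 0.95), s = theta + noise."""
    rng = random.Random(seed)
    params = [rng.uniform(0.05, 0.95) for _ in range(n)]
    stats = [t + rng.gauss(0.0, 0.05) for t in params]
    return params, stats
```

The logit transformation keeps every adjusted value inside its bounds, which a regression on the raw parameter scale would not guarantee. With many summary statistics, the closed-form slope would be replaced by a multivariate weighted least-squares fit.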

We focused our investigations on the parameters related to two types of demographic events that are considered to be potentially important drivers of invasion success: severity of the bottleneck associated with the foundation of new populations, and genetic admixture (Estoup et al. 2016). Bottleneck severity was defined as the ratio between the duration of the bottleneck and the number of effective individuals during this period (Pascual et al. 2007). Following the final invasion scenario detailed in figure 1, we defined three categories of introduction situations: bottleneck severity for populations founded only by individuals originating from a different continent (e.g., sample sites US-Haw, US-Wat, IT-Tre, BR-PA, and FR-Reu), bottleneck severity for populations founded only by individuals originating from the same continent (e.g., sample sites US-SD and US-NC), and bottleneck severity for populations founded by a combination of the two types of sources (e.g., sample sites US-Sok and GE-Dos). Genetic admixture is evident when a population is inferred to have more than one distinct source (A1–A5 events in fig. 1).
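The bottleneck severity ratio defined above can be written out as a short Python sketch. The function name, the choice of generations as the time unit, and the numerical values are assumptions made for illustration; they are not estimates from this study.

```python
def bottleneck_severity(duration_generations, effective_size):
    """Ratio of bottleneck duration to effective number of individuals.

    duration_generations: length of the bottleneck (assumed in generations).
    effective_size: effective number of individuals during the bottleneck.
    """
    if duration_generations < 0 or effective_size <= 0:
        raise ValueError("duration must be >= 0 and effective size > 0")
    return duration_generations / effective_size


# Hypothetical values: the same duration is ten times more severe
# when the effective size is ten times smaller.
severe = bottleneck_severity(5, 20)
mild = bottleneck_severity(5, 200)
```

Under this definition, a long bottleneck with many founders and a short bottleneck with few founders can have the same severity, which is why the ratio rather than either component is compared across introduction categories.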

Model-Posterior Checking

We used the ABC model-posterior checking method implemented in DIYABC (Cornuet et al. 2010) to evaluate how well the final worldwide invasion scenario, with associated parameter posterior distributions, matches the observed dataset. This method was largely inspired by statistical frameworks detailed in Gelman et al. (2003) and Cook et al. (2006). The principle is as follows: if a scenario-posterior combination fits the observed data correctly, then data simulated under this scenario with parameters drawn from the associated posterior distributions should be close to the observed data. The lack of fit of the model to the data can be measured by determining the frequency at which test quantities measured on the observed dataset are extreme with respect to the distributions of the same test quantities computed from the simulated datasets (i.e., the posterior predictive distributions). In practice, the test quantities are chosen among the large set of ABC summary statistics proposed in the program DIYABC. For each test quantity (q), a lack of fit of the observed data with respect to the posterior predictive distribution can be measured by the cumulative distribution function value, defined as Prob(q_simulated < q_observed). The tail-area probability can be easily computed for each test quantity as Prob(q_simulated < q_observed) and 1.0 - Prob(q_simulated < q_observed) for Prob(q_simulated < q_observed) ≤ 0.5 and > 0.5, respectively. Such tail-area probabilities, also named posterior predictive P-values (i.e., ppp-values; Meng 1994), represent the probabilities that the replicated data (simulated ABC summary statistics) could be more extreme than the observed data (observed ABC summary statistics). In practice, ppp-values can be interpreted as a guideline for tracking model-posterior misfit: a few ppp-values around 0.05 are not necessarily a problem, whereas the presence of ppp-values around 0.001 is rather suspect. Hence, too many observed summary statistics falling in the tails of distributions, especially if some of those ppp-values are small (< 0.001), cast serious doubts on the adequacy of the model-posterior combination to the observed dataset. Finally, because ppp-values are computed for a number of (often non-independent) test statistics, a method such as that of Benjamini and Hochberg (1995) can be used to control the false discovery rate (Cornuet et al. 2010).
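The tail-area computation and the false discovery rate control described above can be sketched in Python as follows. This is a minimal illustration rather than the DIYABC code: the function names are hypothetical, the empirical probability is estimated as a simple proportion of simulated values, and the Benjamini–Hochberg procedure is the standard step-up version at an assumed FDR level of 0.05.

```python
def tail_area_probability(simulated, observed):
    """Posterior predictive tail area for one test quantity.

    P = Prob(q_simulated < q_observed) is estimated as a proportion;
    the ppp-value is P if P <= 0.5 and 1 - P otherwise.
    """
    p_below = sum(1 for q in simulated if q < observed) / len(simulated)
    return p_below if p_below <= 0.5 else 1.0 - p_below


def benjamini_hochberg(p_values, fdr=0.05):
    """Step-up Benjamini-Hochberg procedure.

    Returns a list of booleans (same order as p_values) that is True
    where the test quantity is flagged as significant at the given FDR.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= k * fdr / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * fdr / m:
            k_max = rank
    flagged = [False] * m
    for i in order[:k_max]:
        flagged[i] = True
    return flagged
```

Each test quantity yields one ppp-value from its own posterior predictive distribution; the list of ppp-values is then passed to `benjamini_hochberg` to see which ones remain flagged after correction.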

We carried out the ABC model-posterior checking analysis on our observed microsatellite dataset as follows. From 10^7 datasets simulated under the final invasion scenario detailed in figure 1, and using the same subset of 95 statistics previously retained for parameter estimation, we obtained a posterior sample of 10^4 values from the posterior distributions of parameters, as described in the previous section (Parameter Estimation). We then simulated 10^4 datasets with parameter values drawn with replacement from this posterior sample. As underlined in many statistics textbooks (Gelman et al. 2003), it is inadvisable to perform model checking using information that has already been used for training (i.e., model fitting; see also Cornuet et al. 2010 for illustrations on simulated datasets). Optimally, model-posterior checking should be based on test quantities that do not correspond to the summary statistics used to obtain the parameter posterior distributions. This is naturally possible with DIYABC, as the package proposes a large choice of summary statistics. Our set of test statistics therefore included the 1,141 remaining single-population, pairwise, and three-population summary statistics available in DIYABC that were not used for previous ABC parameter estimation.
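The two bookkeeping steps of this procedure, drawing parameter values with replacement from the posterior sample and restricting the test quantities to statistics not used for estimation, can be sketched in Python as follows. The function names and the statistic labels in the usage note are hypothetical placeholders, not DIYABC identifiers.

```python
import random


def resample_posterior(posterior_sample, n_draws, seed=0):
    """Draw n_draws parameter values with replacement from a posterior sample."""
    rng = random.Random(seed)
    return rng.choices(posterior_sample, k=n_draws)


def held_out_statistics(available, used_for_estimation):
    """Keep only the statistics that were not used to fit the posterior."""
    used = set(used_for_estimation)
    return [name for name in available if name not in used]
```

For example, `held_out_statistics(["stat_a", "stat_b", "stat_c"], ["stat_b"])` keeps `stat_a` and `stat_c` as test quantities, and each value returned by `resample_posterior` would parameterize one simulated dataset of the posterior predictive distribution.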

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

A.F. was supported by state funding by the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program "Investissements d'avenir" (ANR-11-IDEX-0004-02). A.E. and R.A.H. were supported by the Languedoc Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE 2016), and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP "International Mobility" (CfP 2015-02). G.L. acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds, project 2012-13-195. X.C., M.K., I.M., and J.Z. were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). M.D. acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). M.P. was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.

References

Adrion JR, Kousathanas A, Pascual M, Burrack HJ, Haddad NM, Bergland AO, Machado H, Sackton TB, Schlenke TA, Watada M, et al. 2014. Drosophila suzukii: the genetic footprint of a recent, worldwide invasion. Mol Biol Evol. 31:3148–3163.

Asplen MK, Anfora G, Biondi A, Choi DS, Chu D, Daane KM, Gibert P, Gutierrez AP, Hoelmer KA, Hutchison WD, et al. 2015. Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci. 88:469–494.

Atallah J, Teixeira L, Salazar R, Zaragoza G, Kopp A. 2014. The making of a pest: the evolution of a fruit-penetrating ovipositor in Drosophila suzukii and related species. Proc R Soc B. 281:20132840.

Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025–2035.

Beaumont MA. 2010. Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst. 41:379–406.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 57:289–300.

Bertorelle G, Benazzo A, Mona S. 2010. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol Ecol. 19:2609–2625.

Blum M, Nunes M, Prangle D, Sisson S. 2013. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci. 28:189–208.

Bock DG, Caseys C, Cousens RD, Hahn MA, Heredia SM, Hubner S, Turner KG, Whitney KD, Rieseberg LH. 2015. What we still don't know about invasion genetics. Mol Ecol. 24:2277–2297.

Boissin E, Hurley B, Wingfield MJ, Vasaitis R, Stenlid J, Davis C, de Groot P, Ahumada R, Carnegie A, Goldarazena A, et al. 2012. Retracing the routes of introduction of invasive species: the case of the Sirex noctilio woodwasp. Mol Ecol. 21:5728–5744.

Breiman L. 2001. Random forests. Mach Learn. 45:5–32.

Benazzo A, Ghirotto S, Torres Vilaca ST, Hoban S. 2015. Using ABC and microsatellite data to detect multiple introductions of invasive species from a single source. Heredity 115:262–272.

Brouat C, Tollenaere C, Estoup A, Loiseau A, Sommer S, Soanandrasana R, Rahalison L, Rajerison M, Piry S, Goodman SM, Duplantier JM. 2014. Invasion genetics of a human commensal rodent: the black rat Rattus rattus in Madagascar. Mol Ecol. 23(16):4153–4167.

Caprio M, Tabashnik BE. 1992. Gene flow accelerates local adaptation among finite populations: simulating the evolution of insecticide resistance. J Econ Entomol. 85:611–620.

Calabria G, Maca J, Bächli G, Serra L, Pascual M. 2012. First records of the potential pest species Drosophila suzukii (Diptera: Drosophilidae) in Europe. J Appl Entomol. 136:139–147.

Chiu JC, Jiang XT, Zhao L, Hamm CA, Cridland JM, Saelao P, Hamby KA, Lee EK, Kwok RS, Zhang G, Zalom FG. 2013. Genome of Drosophila suzukii, the spotted wing drosophila. G3 3:2257–2271.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Mol Ecol. 13:955–968.

Cook S, Gelman A, Rubin DB. 2006. Validation of software for Bayesian models using posterior quantiles. J Comput Graph Stat. 15:675–692.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24:2713–2719.

Cornuet JM, Ravigne V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11:401.

Cornuet JM, Pudlo P, Veyssier J, Dehne-Garcia A, Gautier M, Leblois R, Marin JM, Estoup A. 2014. DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics 30:1187–1189.

Cristescu ME. 2015. Genetic reconstructions of invasion history. Mol Ecol. 24:2212–2225.

Depra M, Poppe JL, Schmitz HJ, De Toni DC, Valente VLS. 2014. The first records of the invasive pest Drosophila suzukii in the South American continent. J Pest Sci. 87:379–383.

Dlugosch KM, Parker IM. 2008. Founding events in species invasions: genetic variation, adaptive evolution, and the role of multiple introductions. Mol Ecol. 17:431–449.

Dlugosch KM, Anderson SR, Braasch J, Alice Cang F, Gillette H. 2015. The devil is in the details: genetic variation in introduced populations and its contributions to invasion. Mol Ecol. 24:2095–2111.

Edmonds CA, Lillie AS, Cavalli-Sforza LL. 2004. Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA. 101:975–979.

Ellstrand NC, Schierenbeck KA. 2000. Hybridization as a stimulus for the evolution of invasiveness in plants? Proc Natl Acad Sci USA. 97:7043–7050.

Estoup A, Guillemaud T. 2010. Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol. 19:4113–4130.

Estoup A, Lombaert E, Marin JM, Guillemaud T, Pudlo P, Robert CP, Cornuet JM. 2012. Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics. Mol Ecol Res. 12(5):846–855.

Estoup A, Ravigne V, Hufbauer RA, Vitalis R, Gautier M, Facon B. 2016. Is there a genetic paradox of biological invasion? Annu Rev Ecol Evol Syst. 47:51–72.

Facon B, Genton B, Shykoff J, Jarne P, Estoup A, David P. 2006. A general eco-evolutionary framework for understanding bioinvasions. Trends Ecol Evol. 21:130–135.

Facon B, Pointier JP, Jarne P, Sarda V, David P. 2008. High genetic variance in life-history strategies within invasive populations by way of multiple introductions. Curr Biol. 18:363–367.

Fraimout A, Loiseau A, Price DK, Xuereb A, Martin JF, Vitalis R, Fellous S, Debat V, Estoup A. 2015. New set of microsatellite markers for the spotted-wing Drosophila suzukii (Diptera: Drosophilidae): a promising molecular tool for inferring the invasion history of this major insect pest. Eur J Entomol. 112(4):855.

Gautier M. 2015. Genome-wide scan for adaptive divergence and association with population-specific covariates. Genetics 201:1555–1579.

Gelman A, Carlin JB, Stern HS, Rubin DB. 2003. Bayesian data analysis. London: Chapman & Hall/CRC.

Gilchrist GW, Huey RB, Serra L. 2001. Rapid evolution of wing size clines in Drosophila subobscura. Genetica 112:273–286.

Gilchrist GW, Huey RB, Balanya J, Pascual M, Serra L, Noor M. 2004. A time series of evolution in action: a latitudinal cline in wing size in South American Drosophila subobscura. Evolution 58:768–780.

Groen SC, Whiteman NK. 2016. Using Drosophila to study the evolution of herbivory and diet specialization. Curr Opin Insect Sci. 14:66–72.

Hauser M. 2011. A historic account of the invasion of Drosophila suzukii (Matsumura) (Diptera: Drosophilidae) in the continental United States, with remarks on their identification. Pest Manag Sci. 67:1352–1357.

Hufbauer RA, Facon B, Ravigne V, Turgeon J, Foucaud J, Lee CE, Rey O, Estoup A. 2012. Anthropogenically induced adaptation to invade (AIAI): contemporary adaptation to human-altered habitats within the native range can promote invasions. Evol Appl. 5:89–101.

Kaneshiro KY. 1983. Drosophila (Sophophora) suzukii (Matsumura). Proc Hawaiian Entomol Soc. 24:179.

Keller SR, Taylor DR. 2010. Genomic admixture increases fitness during a biological invasion: admixture increases fitness during invasion. J Evol Biol. 23:1720–1731.

Kolbe JJ, Glor RE, Schettino LR, Lara AC, Larson A, Losos JB. 2004. Genetic variation increases during biological invasion by a Cuban lizard. Nature 431:177–181.

Lavergne S, Molofsky J. 2007. Increased genetic variation and evolutionary potential drive the success of an invasive grass. Proc Natl Acad Sci USA. 104:3883–3888.

Lee C. 2002. Evolutionary genetics of invasive species. Trends Ecol Evol. 17:386–391.

Lee JC, Bruck DJ, Curry H, Edwards D, Haviland DR, Van Steenwyk RA, Yorgey BM. 2011. The susceptibility of small fruits and cherries to the spotted-wing drosophila, Drosophila suzukii. Pest Manag Sci. 67:1358–1367.

Lenormand T. 2002. Gene flow and the limits of natural selection. Trends Ecol Evol. 17:183–189.

Lin QC, Zhai YF, Zhang AS, Men XY, Zhang XY, Zalom FG, Zhou CG, Yu Y. 2014. Comparative Developmental Times and Laboratory Life tables for Drosophlia suzukii and Drosophila melanogaster (Diptera: Drosophilidae). Fla Entomol. 97:1434–1442.

Lombaert E, Guillemaud T, Lundgren J, Koch R, Facon B, Grez A, Loomans A, Malausa T, Nedved O, Rhule E, et al. 2014. Complementarity of statistical treatments to reconstruct worldwide routes of invasion: the case of the Asian ladybird Harmonia axyridis. Mol Ecol. 23:5979–5997.

Matsumura S. 1931. 6000 illustrated insects of Japan-empire (in Japanese). Toko shoin, Tokyo. 1497 pp.

Meng XL. 1994. Posterior predictive p-values. Ann Stat. 22:1142–1160.

Nei M, Maruyama T, Chakraborty R. 1975. The bottleneck effect and genetic variability in populations. Evolution 29:1–10.

Ometto L, Cestaro A, Ramasamy S, Grassi A, Revadi S, Siozios S, Moretto M, Fontana P, Varotto C, Pisani D, et al. 2013. Linking genomics and ecology to investigate the complex evolution of an invasive drosophila pest. Genome Biol Evol. 5:745–757.

Pascual M, Chapuis MP, Mestres F, Balanya J, Huey RB, Gilchrist GW, Serra L, Estoup A. 2007. Introduction history of Drosophila subobscura in the New World: a microsatellite based survey using ABC methods. Mol Ecol. 16:3069–3083.

Peischl S, Excoffier L. 2015. Expansion load: recessive mutations and the role of standing genetic variation. Mol Ecol. 24:2084–2094.

Pudlo P, Marin JM, Estoup A, Cornuet JM, Gautier M, Robert CP. 2016. Reliable ABC model choice via random forests. Bioinformatics 32:859–866.

R Development Core Team. 2008. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org

Robert CP, Cornuet JM, Marin JM, Pillai NS. 2011. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA. 108:15112–15117.

Rius M, Darling JA. 2014. How important is intraspecific genetic admixture to the success of colonising populations? Trends Ecol Evol. 29:233–242.

Sakai AK, Allendorf FW, Holt JS, Lodge DM, Molofsky J, With KA, Baughman S, Cabin RJ, Cohen JE, Ellstrand NC, McCauley DE. 2001. The population biology of invasive species. Annu Rev Ecol Syst. 305–332.

Schug MD, Hutter CM, Noor MA, Aquadro CF. 1998. Mutation and evolution of microsatellites in Drosophila melanogaster. In: Mutation and Evolution. Springer. pp. 359–367.

Simberloff D. 2013. Biological invasions: much progress plus several controversies. Contrib Sci. 9:7–16.

Turgeon J, Tayeh A, Facon B, Lombaert E, De Clercq Berkvens N, Lungren JG, Estoup A. 2011. Experimental evidence for the phenotypic impact of admixture between wild and biocontrol Asian ladybird (Harmonia axyridis) involved in the European invasion. J Evol Biol. 24:1044–1052.

Uller T, Leimu R. 2011. Founder events predict changes in genetic diversity during human-mediated range expansions. Glob Change Biol. 17:3478–3485.

Verdu P, Austerlitz F, Estoup A, Vitalis R, Georges M, Thery S, Froment A, Le Bomin S, Gessain A, Hombert JM, et al. 2009. Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr Biol. 19:312–318.

Wray G. 2013. Genomics and the evolution of phenotypic Traits. Annu Rev Ecol Evol Syst. 44:51–72.


Estoup A Lombaert E Marin JM Guillemaud T Pudlo P Robert CPCornuet JM 2012 Estimation of demo-genetic model probabilitieswith Approximate Bayesian Computation using linear discriminantanalysis on summary statistics Mol Ecol Res 12(5)846ndash855

Estoup A Ravigne V Hufbauer RA Vitalis R Gautier M Facon B 2016 Isthere a genetic paradox of biological invasion Annu Rev Ecol EvolSyst 4751ndash72

Deciphering Invasion Routes Using ABC Random Forest doi101093molbevmsx050 MBE

995Downloaded from httpsacademicoupcommbearticle-abstract3449802953204by Universite De La Reunion useron 03 July 2018

Facon B Genton B Shykoff J Jarne P Estoup A David P 2006 A generaleco-evolutionary framework for understanding bioinvasions TrendsEcol Evol 21130ndash135

Facon B Pointier JP Jarne P Sarda V David P 2008 High genetic variancein life-history strategies within invasive populations by way of mul-tiple introductions Curr Biol 18363ndash367

Fraimout A Loiseau A Price DK Xuereb A Martin JF Vitalis R Fellous SDebat V Estoup A 2015 New set of microsatellite markers for thespotted-wing Drosophila suzukii (Diptera Drosophilidae) a promis-ing molecular tool for inferring the invasion history of this majorinsect pest Eur J Entomol 112(4)855

Gautier M 2015 Genome-wide scan for adaptive divergence and asso-ciation with population-specific covariates Genetics 2011555ndash1579

Gelman A Carlin JB Stern HS Rubin DB 2003 Bayesian data analysisLondon Chapman amp HallCRC

Gilchrist GW Huey RB Serra L 2001 Rapid evolution of wing size clinesin Drosophila subobscura Genetica 112273ndash286

Gilchrist GW Huey RB Balanya J Pascual M Serra L Noor M 2004 Atime series of evolution in action a latitudinal cline in wing size inSouth American Drosophila subobscura Evolution 58768ndash780

Groen SC Whiteman NK 2016 Using Drosophila to study the evolutionof herbivory and diet specialization Curr Opin Insect Sci 1466ndash72

Hauser M 2011 A historic account of the invasion of Drosophila suzukii(Matsumura) (Diptera Drosophilidae) in the continental UnitedStates with remarks on their identification Pest Manag Sci671352ndash1357

Hufbauer RA Facon B Ravigne V Turgeon J Foucaud J Lee CE Rey OEstoup A 2012 Anthropogenically induced adaptation to invade(AIAI) contemporary adaptation to human-altered habitats withinthe native range can promote invasions Evol Appl 589ndash101

Kaneshiro KY 1983 Drosophila (Sophophora) suzukii (Matsumura) ProcHawaiian Entomol Soc 24179

Keller SR Taylor DR 2010 Genomic admixture increases fitness during abiological invasion admixture increases fitness during invasion J EvolBiol 231720ndash1731

Kolbe JJ Glor RE Schettino LR Lara AC Larson A Losos JB 2004 Geneticvariation increases during biological invasion by a Cuban lizardNature 431177ndash181

Lavergne S Molofsky J 2007 Increased genetic variation and evolution-ary potential drive the success of an invasive grass Proc Natl Acad SciUSA 1043883ndash3888

Lee C 2002 Evolutionary genetics of invasive species Trends Ecol Evol17386ndash391

Lee JC Bruck DJ Curry H Edwards D Haviland DR Van Steenwyk RAYorgey BM 2011 The susceptibility of small fruits and cherries to thespotted-wing drosophila Drosophila suzukii Pest Manag Sci671358ndash1367

Lenormand T 2002 Gene flow and the limits of natural selection TrendsEcol Evol 17183ndash189

Lin QC Zhai YF Zhang AS Men XY Zhang XY Zalom FG Zhou CG YuY 2014 Comparative Developmental Times and Laboratory Lifetables for Drosophlia suzukii and Drosophila melanogaster(Diptera Drosophilidae) Fla Entomol 971434ndash1442

Lombaert E Guillemaud T Lundgren J Koch R Facon B Grez ALoomans A Malausa T Nedved O Rhule E et al 2014

Complementarity of statistical treatments to reconstruct worldwideroutes of invasion the case of the Asian ladybird Harmonia axyridisMol Ecol 235979ndash5997

Matsumura S 1931 6000 illustrated insects of Japan-empire (inJapanese) Toko shoin Tokyo 1497 pp

Meng XL 1994 Posterior predictive p-values Ann Stat 221142ndash1160Nei M Maruyama T Chakraborty R 1975 The bottleneck effect and

genetic variability in populations Evolution 291ndash10Ometto L Cestaro A Ramasamy S Grassi A Revadi S Siozios S Moretto

M Fontana P Varotto C Pisani D et al 2013 Linking genomics andecology to investigate the complex evolution of an invasive drosoph-ila pest Genome Biol Evol 5745ndash757

Pascual M Chapuis MP Mestres F Balanya J Huey RB Gilchrist GWSerra L Estoup A 2007 Introduction history of Drosophila subobs-cura in the New World a microsatellite based survey using ABCmethods Mol Ecol 163069ndash3083

Peischl S Excoffier L 2015 Expansion load recessive mutations and therole of standing genetic variation Mol Ecol 242084ndash2094

Pudlo P Marin JM Estoup A Cornuet JM Gautier M Robert CP 2016Reliable ABC model choice via random forests Bioinformatics32859ndash866

R Development Core Team 2008 R A language and environment forstatistical computing R Foundation for Statistical ComputingVienna Austria httpwwwR-projectorg

Robert CP Cornuet JM Marin JM Pillai NS 2011 Lack of confidence inapproximate Bayesian computation model choice Proc Natl AcadSci USA 10815112ndash15117

Rius M Darling JA 2014 How important is intraspecific genetic admix-ture to the success of colonising populations Trends Ecol Evol29233ndash242

Sakai AK Allendorf FW Holt JS Lodge DM Molofsky J With KABaughman S Cabin RJ Cohen JE Ellstrand NC McCauley DE2001 The population biology of invasive species Annu Rev EcolSyst 305ndash332

Schug MD Hutter CM Noor MA Aquadro CF 1998 Mutation andevolution of microsatellites in Drosophila melanogaster in Mutationand Evolution Springer pp 359ndash367

Simberloff D 2013 Biological invasions much progress plus several con-troversies Contrib Sci 97ndash16

Turgeon J Tayeh A Facon B Lombaert E De Clercq Berkvens N LungrenJG Estoup A 2011 Experimental evidence for the phenotypic im-pact of admixture between wild and biocontrol Asian ladybird(Harmonia axyridis) involved in the European invasion J Evol Biol241044ndash1052

Uller T Leimu R 2011 Founder events predict changes in genetic diver-sity during human-mediated range expansions Glob Change Biol173478ndash3485

Verdu P Austerlitz F Estoup A Vitalis R Georges M Thery S Froment ALe Bomin S Gessain A Hombert JM et al 2009 Origins and geneticdiversity of pygmy hunter-gatherers from Western Central AfricaCurr Biol 19312ndash318

Wray G 2013 Genomics and the evolution of phenotypic Traits AnnuRev Ecol Evol Syst 4451ndash72

Fraimout et al doi101093molbevmsx050 MBE

996Downloaded from httpsacademicoupcommbearticle-abstract3449802953204by Universite De La Reunion useron 03 July 2018

  • msx050-TF1
  • msx050-TF2
  • msx050-TF3
  • msx050-TF4

included the 1,141 remaining single, pairwise, and three-population summary statistics available in DIYABC that were not used for previous ABC parameter estimation.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

AF was supported by state funding by the Agence Nationale de la Recherche through the LabEx ANR-10-LABX-0003-BCDiv of the program "Investissements d'avenir" (ANR-11-IDEX-0004-02). AE and RAH were supported by the Languedoc-Roussillon region (France) through the European Union program FEFER FSE IEJ 2014-2020 (project CEPADROL), the INRA scientific department SPE (AAP-SPE 2016), and the French Agropolis Fondation (Labex Agro-Montpellier) through the AAP "International Mobility" (CfP 2015-02). GL acknowledges support from Cornell University's New York State Agricultural Experiment Station federal formula funds project 2012-13-195. XC, MK, IM, and JZ were funded by the European Union's 7th Framework Programme under grant agreement No. 613678 (DROPSA). MD acknowledges support from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). MP was supported by project CTM2013-48163. The authors acknowledge Julio Pedraza and the UMS 2700 OMSI for computational support. Molecular data used in this work were produced through the genotyping and sequencing facilities of ISEM (Institut des Sciences de l'Evolution-Montpellier) and Labex CEMEB (Centre Méditerranéen Environnement Biodiversité). The authors thank Masanori Toda for his help in D. suzukii sampling.
