

Spatial population genetics

Geographical Genetics by Bryan K. Epperson. Princeton University Press, 2003. US$39.95 £26.95 pbk (376 pages) ISBN 0 691 08669 9

Ian Wilson

Department of Mathematical Sciences, University of Aberdeen, Aberdeen, UK, AB24 3UE

The spatial element in population genetics is crucial to many population genetic processes. Even with samples from a single population, spatial processes are likely to have left signatures. However, analyses are difficult. Before we consider location, analyzing patterns of genetic variation within a population is challenging. Genetic data ‘consist of a high dimensional, but partial, snapshot, taken at a single point in time from the evolution of a complicated stochastic process’ [1]. The highly structured nature of this data makes modeling processes that influence genetic variation, such as drift, population expansions, bottlenecks, and the various selection and mutation regimes, difficult. Adding to these modeling difficulties is a statistical problem. Increasing the sample size adds little information because new data points are likely to share recent ancestry with those already in the sample [1].
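The diminishing return from larger samples can be made concrete with a small sketch. It assumes the standard neutral coalescent for a single panmictic population, which is a textbook simplification and not a model specified in this review; the function names are hypothetical. Under that assumption the expected total branch length of the genealogy of n sequences is twice the harmonic sum 1 + 1/2 + ... + 1/(n-1) in coalescent time units, so it grows only logarithmically with n.

```python
def harmonic_sum(n: int) -> float:
    """Return a_n = sum of 1/i for i = 1 .. n-1, for n >= 2 sampled sequences."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return sum(1.0 / i for i in range(1, n))


def expected_total_branch_length(n: int) -> float:
    """Expected total genealogy length in coalescent time units.

    Assumption: standard neutral coalescent, where the time with k lineages
    has mean 2 / (k * (k - 1)) and contributes k branches.
    """
    return 2.0 * harmonic_sum(n)


def marginal_gain(n: int) -> float:
    """Extra expected branch length from adding one sequence to a sample of n."""
    return expected_total_branch_length(n + 1) - expected_total_branch_length(n)


if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        print(n, round(expected_total_branch_length(n), 3), round(marginal_gain(n), 5))
```

In this simplified setting the gain from adding one sequence to a sample of n is 2/n, which is the sense in which new data points mostly re-sample ancestry that is already represented.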

Blending geographical information in this mix intensifies modeling problems, but gives us the opportunity for more statistical ‘power’. With limited gene flow, neighbouring populations are likely to share more recent ancestry than are distant populations, and this association should be incorporated into our analyses. Equilibrium models of continuous populations are difficult to work with analytically [2], and even these models are insufficient when we consider barriers to gene flow, climate changes, extinction, recolonization, fission, fusion, range expansion and selection on an environmental gradient. However, spatial information is crucial if we want to know about processes that have a spatial element. Furthermore, additional information gives us the opportunity to learn more through spatial replication. Intuitively, if populations are sufficiently different then they can be thought of as replicates (although population genetic theory teaches that gene flow has to be extremely low for independence between populations). At the other extreme, with high levels of dispersal between populations, the genealogical tree will look very like that for a single panmictic population, and increases in sample size add little information. The difficulties come with intermediate levels, where most species live and on which this monograph concentrates.

In Geographical Genetics, Epperson covers a wide range of techniques for the analysis of genetic data where there is a spatial element, making use of population genetic modeling and the (underused) application of spatial statistical techniques to population genetic data. The rigorous approach taken gives some very valuable insights, which should be very helpful to practitioners. Most weight is given to studies of gene frequency data, although this is understandable because, with multiple loci, it becomes possible to learn about spatial parameter values with precision (the index gives only one entry for DNA: DNA sequence data – information contained in). Discussions of haplotype and sequence data and methods for their analysis are largely missing, except for a mildly critical discussion of nested clade analysis [3], and the hope – shared by many – that ancient DNA will give us more information about population genetic processes.

Absent is a discussion of using new statistical techniques to study spatial variation. Recent increases in computer power have enabled the development of full probability models for the analysis of population genetic data. The basis of these techniques is to model the correlations between data points using the genealogical tree behind a sample and to integrate numerically over all possible trees that could give rise to the data, modeling this tree using the coalescent or one of its extensions. Analytical techniques used include importance sampling [1], Markov chain Monte Carlo [4,5] and rejection sampling methods based on summary statistics [6,7], and are either within the classic likelihood statistical framework [4] or are explicitly Bayesian [1,5–7]. These methods have been used to attack problems of geographical genetics [4,5,7]. Statistical models for the analysis of genetic data with explicit spatial location are more difficult, but can be constructed (see contributed discussion of [5]). These techniques are not just restricted to DNA sequence data; multi-locus microsatellite data can also be analyzed [7].
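As an illustration of the rejection sampling idea mentioned above, here is a minimal sketch. It is not the method of any of the cited papers. It assumes a single panmictic population, the standard neutral coalescent with infinite-sites mutation, the number of segregating sites as the only summary statistic, and a uniform prior on the scaled mutation rate theta; every function and parameter name is hypothetical. Trees are simulated and discarded, so the integration over genealogies is done implicitly by simulation.

```python
import random


def simulate_segregating_sites(n, theta, rng):
    """Simulate the number of segregating sites for n sequences.

    Assumptions: standard coalescent (time with k lineages is exponential
    with rate k*(k-1)/2) and infinite-sites mutation at rate theta/2 per
    unit of branch length.
    """
    total_length = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0
        total_length += k * rng.expovariate(rate)
    mean_mutations = theta / 2.0 * total_length
    # Poisson draw: count unit-rate exponential arrivals before mean_mutations.
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean_mutations:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def abc_rejection(observed_s, n, num_draws, tolerance, prior_max, rng):
    """Keep prior draws of theta whose simulated summary is close to the data."""
    accepted = []
    for _ in range(num_draws):
        theta = rng.uniform(0.0, prior_max)  # assumed uniform prior on theta
        simulated_s = simulate_segregating_sites(n, theta, rng)
        if abs(simulated_s - observed_s) <= tolerance:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    rng = random.Random(1)
    sample = abc_rejection(observed_s=10, n=20, num_draws=5000,
                           tolerance=1, prior_max=10.0, rng=rng)
    print(len(sample), sum(sample) / len(sample))
```

The accepted values approximate the posterior distribution of theta given the summary statistic. A spatial analysis would replace the single-population simulator with one that includes migration or explicit locations, and would typically use several summary statistics.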

Another omission is any mention of the Bayesian approach to statistical inference, which is commonly used in statistical genetics [8]. Because of the limited information contained in genetic data, it is often necessary to use prior information. The author recognises the usefulness of a priori information, but does not mention analytical techniques that incorporate it in a consistent framework. Nevertheless, this book provides a detailed overview of incorporating geographical information into population genetic theory and most of the important methods for the analysis of spatially structured genetic data.

References

1 Stephens, M. and Donnelly, P. (2000) Inference in molecular population genetics. J. R. Stat. Soc. Ser. B 62, 605–655

Corresponding author: Ian Wilson ([email protected]).

Update TRENDS in Ecology and Evolution Vol.19 No.5 May 2004 229

www.sciencedirect.com


2 Barton, N.H. and Wilson, I.J. (1996) Genealogy and geography. In New Uses for New Phylogenies (Harvey, P.H. et al., eds), pp. 23–56, Oxford University Press

3 Templeton, A.R. (1998) Nested clade analyses of phylogeographic data: testing hypotheses about gene flow and population history. Mol. Ecol. 7, 381–397

4 Beerli, P. and Felsenstein, J. (1999) Maximum likelihood estimation for migration rates and effective population numbers in two populations using a coalescent approach. Genetics 152, 763–773

5 Wilson, I.J. et al. (2003) Inferences from DNA data: population histories, evolutionary processes and forensic match probabilities. J. R. Stat. Soc. Ser. A 166, 155–201

6 Beaumont, M.A. et al. (2002) Approximate Bayesian computation in population genetics. Genetics 162, 2025–2035

7 Estoup, A. et al. (2001) Inferring population history from microsatellite and enzyme data in serially introduced cane toads, Bufo marinus. Genetics 159, 1671–1687

8 Shoemaker, J.S. et al. (1999) Bayesian statistics in genetics: a guide for the uninitiated. Trends Genet. 15, 354–358

0169-5347/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.

doi:10.1016/j.tree.2004.01.007

Overseen but not forgotten

The Principles of Life by Tibor Gánti. Oxford University Press, 2003. £55.00 hbk (220 pages) ISBN 0 19 850726 7

Bill Martin

Institut für Botanik III, Heinrich-Heine Universität, Universitätsstr. 1, 40225 Gebäude 26.13.01, Düsseldorf, Germany

The Principles of Life is not likely to become a best-seller soon, but for folks who ponder the seemingly intractable question of how fully fledged living systems could have emerged from inanimate matter, the book is a highly recommended read. Gánti introduces his chemoton model, a reductionist distillate of the attributes to be demanded of the simplest (hence in evolutionary terms, first) living systems. In a nutshell, the chemoton model summarizes the three criteria that a minimal, self-sustaining biological system must fulfill: (i) it must function under the direction of a program; (ii) it must reproduce itself; and (iii) it and its progeny must be separate from the environment. These properties are translated into nicely modelable mathematical quantities, and out pops something similar to a chemical engineer’s flow reactor that (given proper chemical substrates) makes more of itself, just like cells do today. At first sight, this might not seem too surprising. The surprise comes from the fact that the chemoton model was fully elaborated in 1971, in a book of nearly the same title published in Hungarian.

Why popularize a 33-year-old model? With all the advances from molecular biology, the discovery of catalytic RNA, and the improvements in genome sequencing technology that have since been made, hasn’t the chemoton been superseded by vastly better models? From my standpoint, the answer is ‘no, not really’. Gánti put his finger squarely on a sore spot of many current theories about the origin of living systems, which tend to focus either on self-replicating catalytic RNA (Gánti’s criteria i and ii, alias the ‘RNA world’) [1] or on spontaneously forming lipid vesicles that serve to model a basic property of all cells, which are always surrounded by a membrane (Gánti’s criterion iii, alias the ‘lipid world’) [2]. Gánti’s argument is that all three criteria have to come together right from the start in an interacting system if life is ever to get off the ground.

Perhaps the most radical and demanding aspect of Gánti’s chemoton is the view that the first freely self-replicating systems must have produced their containment from the environment (their membrane) all by themselves. It is hard for biologists to imagine how a primitive system that is busy with problems as fundamental as making a copy of its own program (RNA replication) would have the time or energy to worry about making a containment for itself, but this is where the virtues of the chemoton come into play, because using robust logic and straightforward reductionist principles Gánti arrives at the conclusion that, without compartmentation from the environment from the onset, there is basically no way to get to the organizational level that we today recognize as the common unit of all life: cells.

At the first read, the text seems stiff and somewhat cumbersome and inaccessible in spots, as if it were written by an engineer instead of by a biologist. But Gánti was an engineer, a chemical engineer. Accordingly, the biologically oriented reader has to make a willful interdisciplinary effort to understand the thoughts of an author whose background and mindset at the time the ideas were formulated (Cold War Hungary) could hardly differ more from our present intellectual environment. Additional chapters contributed by Eörs Szathmáry and James Griesemer are very helpful, because they put the incisive novelty of Gánti’s contributions in the proper historical-conceptual context and point out clearly the ways in which his ideas were well ahead of their time.

The book is hardly spellbinding, and it is somewhat distracting that the author refers to his own work in the third person, rather than coming out with a straightforward ‘I’ or ‘my’. One also has the feeling that the translators could have done a better job of giving the text a smooth flow. Thus, in rough passages, I tended to put the book aside and move on to reading more exciting

Corresponding author: Bill Martin ([email protected]).
