anthropological genetic analysis of human disease...

ANTHROPOLOGICAL GENETIC ANALYSIS OF HUMAN DISEASE FROM EVOLUTIONARY, POPULATION, AND CLINICAL PERSPECTIVES

REBECCA R. GRAY

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

ACKNOWLEDGMENTS

Foremost, I thank my Ph.D. advisor Dr. Connie Mulligan for her mentorship and affording

me the opportunity to conduct this research. I thank the members of my Ph.D. committee;

specifically I thank Dr. Maureen Goodenow for her training and expertise; I thank Dr. John

Krigbaum and Dr. Marta Wayne for their early guidance; and I thank Dr. David Reed for his

helpful analytical advice. I thank Dr. Marco Salemi for his instruction and involving me in

additional projects. I thank my collaborators on the Treponema project at the University of

Washington, including Dr. Sheila Lukehart and Dr. Arturo Centurion. I thank my collaborators at

the National Institutes of Health including Dr. Jordi Clarimon, Dr. Andrew Singleton, Dr. David

Goldstein, and Dr. Mary-Anne Enoch. I thank Dr. Grace Aldrovandi at the Children’s Hospital in

Los Angeles who collaborated on the breastmilk project. I thank my undergraduate advisor Dr.

Carole Counihan for my initial and continued enthusiasm for anthropological research. I

appreciate the suggestions and feedback on these projects from the members of Drs. Mulligan

and Goodenow’s lab. I acknowledge the undergraduate students who assisted me on these

projects, including Danielle Muchnick and Lindsay Williams. I profess deep gratitude to the

Native American individuals and the Zambian women who participated in the studies which are

part of this dissertation. Finally, I thank my parents for the intellectual foundation they provided,

and my husband for his encouragement and profound patience.

TABLE OF CONTENTS page

ACKNOWLEDGMENTS ...............................................................................................................3

LIST OF TABLES...........................................................................................................................7

LIST OF FIGURES .........................................................................................................................8

ABSTRACT...................................................................................................................................10

CHAPTER

1 INTRODUCTION ..................................................................................................................12

2 MOLECULAR EVOLUTION OF THE TPRC, D, I, K, G, AND J GENES IN THE PATHOGENIC GENUS Treponema .....................................................................................25

Introduction.............................................................................................................................25 Materials and Methods ...........................................................................................................29

Treponemal Strains and tpr Sequencing..........................................................................29 Evolutionary Analysis of Sequences ...............................................................................30 Phylogenetic Analyses.....................................................................................................30 Detection of Recombination............................................................................................31

Results.....................................................................................................................................32 Phylogenetic Analyses.....................................................................................................32

Phylogenetic analyses of subfamily I.......................................................................33 Phylogenetic analyses of subfamily II......................................................................36 Phylogenetic analyses of subfamily III ....................................................................37

Statistical Tests for Recombination.................................................................................38 Analysis of Nucleotide Diversity and Composition........................................................40

Discussion...............................................................................................................................41

3 LINKAGE DISEQUILIBRIUM AND ASSOCIATION ANALYSIS OF ALPHA SYNUCLEIN (SNCA) AND ALCOHOL AND DRUG DEPENDENCE IN TWO AMERICAN INDIAN POPULATIONS ...............................................................................59

Sampling Strategy ...........................................................................................................61 Testing Instruments, Interviews, and Psychiatric Diagnoses ..........................................62 Genotyping ......................................................................................................................63 Statistical Analysis ..........................................................................................................63

Results.....................................................................................................................................65 Discussion...............................................................................................................................68

4 LACK OF ASSOCIATION BETWEEN ADH/ALDH MARKERS AND SUBSTANCE USE DISORDER IN NATIVE AMERICAN POPULATION ..............................................75

Samples............................................................................................................................78 Testing Instruments, Interviews, and Psychiatric Diagnoses ..........................................78 Genotyping ......................................................................................................................79 Statistical Analysis ..........................................................................................................79

Results.....................................................................................................................................80 Discussion...............................................................................................................................82

5 DYNAMIC AND DISTINCT EVOLUTION OF HIV-1 IN BREASTMILK OVER TWO YEARS POST-PARTUM ............................................................................................89

Introduction.............................................................................................................................89 Background.............................................................................................................................91

Human Immunodeficiency Virus Type 1 Infection.........................................................91 Stages of Breastmilk Production .....................................................................................93 Cellular Composition of Breastmilk................................................................................95 Compartmentalization of Breastmilk Virus.....................................................................96 Risk of Transmission via Breast-feeding ........................................................................98 Our Study.......................................................................................................................102

Materials and Methods .........................................................................................................104 Subject ...........................................................................................................................104 Viral Isolation, Amplification, and Sequencing ............................................................104 Sequence Analysis and Recombination.........................................................................105 Phylogenetic Analyses...................................................................................................106

Branch Selection Analysis .....................................................................................108 Compartmentalization ............................................................................................108

Results...................................................................................................................................109 Subtype Analysis ...........................................................................................................109 Sequence Analysis.........................................................................................................109

Variable regions 1 and 2 sequence analysis ...........................................................109 Variable region 3 loop analysis ..............................................................................111

Recombination Analysis................................................................................................112 Phylogenetic Analyses...................................................................................................113

Bayesian tip-date phylogeny ..................................................................................113 Rooting the phylogeny ...........................................................................................116 Branch selection analysis .......................................................................................117 Inclusion of breastmilk month 1 sequences ...........................................................118

Migration Analysis ........................................................................................................119 Discussion.............................................................................................................................119

6 CONCLUSION.....................................................................................................................139

APPENDIX

A LIST OF QUESTIONS FOR SUBSTANCE ABUSE CATEGORIZATION .....................145

LIST OF REFERENCES.............................................................................................................147

BIOGRAPHICAL SKETCH .......................................................................................................172

LIST OF TABLES

page Table 2-1. Treponema isolates used in this study ..........................................................................50

Table 2-2. T. pallidum primers used in this study..........................................................................51

Table 2-3. Polymorphism at the tprC and tprD loci among pathogenic treponemes. ...................52

Table 2-4. Recombinant regions identified by RDP2....................................................................53

Table 2-6. Average GC content at combined 1st + 2nd (GC1+2) and 3rd codon (GC3) positions .............................................................................................................................54

Table 3-1. Demographic and phenotypic characterstics of southwest (SW) and plains populations.........................................................................................................................70

Table 4-1. Loci, primers, cycling conditions and restriction enzymes for 12 loci studied. ...........85

Table 4-2. Phenotypic characteristics of the dataset. ....................................................................86

Table 4-3. Haplotype frequencies and p-value for comparisons of cases vs. controls. ................86

Table 4-4. Chi-squared and regression p-values for genotype and allele associations for each marker........................................................................................................................87

Table 5-1. Number of sequences generated for each tissue.........................................................125

Table 5-2. Sequence characteristics of V1 and V2. .....................................................................125

Table 5-3. Combination of V1 and V2 haplotypes. .....................................................................126

Table 5-4. Hudson test for population structure. .........................................................................126

Table 5-5. Number of putative recombinant clones.....................................................................127

Table 5-6. Marginal likelihoods for models used in the Bayesian analysis.................................127

LIST OF FIGURES

Figure page Figure 1-1. Model for studying human diseases............................................................................24

Figure 2-1. Unrooted ML phylogenies of multiple tpr genes........................................................55

Figure 2-2. ML phylogenies for tprD, C, and I. ............................................................................56

Figure 2-3. ML phylogeny of tprG and J.......................................................................................57

Figure 2-4. ML phylogeny of tprK. ...............................................................................................58

Figure 3-1. Relative positions of single nucleotide polymorphisms assessed in α-synuclein (SNCA) gene.......................................................................................................................72

Figure 3-2. Single-marker analyses representing p values for each marker on a logarithmic scale....................................................................................................................................73

Figure 3-3. Allelic distribution of the NACP-REP1 microsatellite repeats...................................74

Figure 4-1. Linkage disequilbrium of markers assessed in the ADH gene family. .......................88

Figure 5-1. Sampling times and tissues. ......................................................................................128

Figure 5-2. Neighbor-joining phylogeny of all subtypes in group M plus this patient. ..............128

Figure 5-3. Haplotype analysis of V1. .........................................................................................129

Figure 5-4. Haplotype analysis of V3. .........................................................................................130

Figure 5-5. Recombination alignments........................................................................................131

Figure 5-6. Network of breast milk sequences from week 1. ......................................................131

Figure 5-7. Bayesian consensus phylogeny for C2V5 (A). .........................................................132

Figure 5-8. Bayesian consensus phylogeny for C2V5 (B). .........................................................133

Figure 5-9. Best-rooted maximum likelihood phylogeny............................................................134

Figure 5-10. Bayesian consensus phylogeny for the C2V5 (A) dataset with branches under significant selection. ........................................................................................................135

Figure 5-11. Bayesian consensus phylogeny with breast milk month 1 sequences.....................136

Figure. 5-12. Migration analysis for two tissues..........................................................................137

Figure 5-13. Migration analysis for three tissues.........................................................................138

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ANTHROPOLOGICAL GENETIC ANALYSIS OF HUMAN DISEASE FROM EVOLUTIONARY, POPULATION, AND CLINICAL PERSPECTIVES

Rebecca R. Gray

May 2008

Chair: Connie Mulligan Major: Anthropology

In this dissertation, I used genetic data from both humans and pathogens to explore the

evolution and etiology of three diseases from temporally distinct perspectives. I employed an

evolutionary framework to address the the origin of syphilis, a population perspective to

determine genetic components contributing to alcoholism in Native Americans, and a clinical

perspective to study factors relating to the transmission of HIV-1 via breastfeeding. In the first

study, I used sequence data from six genes from three subspecies of Treponema pallidum, the

spirochetes that cause venereal syphilis, yaws, and endemic syphilis in humans, as well as two

other Treponema species, to determine their evolutionary origin and relationships using

phylogenetic and population genetic analyses. My data discriminate between key components of

several of the leading theories of treponemal evolution, and provide new loci that are distinct

among the treponemes and can be used for diagnosis. Second, I genotyped ~1000 Native

American individuals for markers in the alpha-synuclein gene (SNCA) and used sequence data

from ~400 Native Americans from the alcohol dehydrogenase gene (ADH) and the aldehyde

dehydrogenase gene (ALDH) to test for an association with substance abuse in these populations.

I used both dichotomous and continuous measures of addition and several statistical tests to

determine association. Despite the high power of the study, no significant association was

detected. This may be the result of the past evolutionary history of Native Americans, who

experienced a severe genetic bottleneck during the migration from Asia and may have lost

variants that have been previously associated with substance abuse. I concluded that a focus on

environmental causes and solutions may be most appropriate in these populations. Finally, I

sequenced and analyzed the env gene of the human immunodeficiency virus type-1 (HIV-1) in

breast milk and blood plasma from an HIV-1 positive woman who transmitted the virus to her

infant via breastfeeding. This was the first longitudinal study of HIV-1 in breast milk, and major

findings included the distinctiveness of the virus in milk during the first month post-partum, the

compartmentalization of the virus over time, and the dynamic evolutionary pattern of the virus in

the milk. These results provided information about the biological mechanism responsible for

differential transmission risks associated with various modes of breastfeeding.

Genetic anthropologists are equipped with the analytical tools to study the biological

mechanisms of diseases and to incorporate information about the relevant underlying population

structure and evolutionary history. This unique perspective allows genetic anthropologists to

provide comprehensive clinical and policy recommendations based on genetic data. Finally, the

multi-disciplinary approach employed by anthropologists can be valuable in ensuring that

resulting applications of the data are culturally appropriate and provide maximum health benefits

to communities in need.

CHAPTER 1

INTRODUCTION

Disease has been a major component of the human experience for the past 10,000 years,

and likely long before that time as well (Cockburn 1971; Omran 1971; Armelagos and Dewey

1975; Cohen and Armelagos 1984; Barrett et al. 1998). Human health and diseases are studied in

a wide range of fields, including anthropology, biology and medicine. Anthropologists in

particular pride themselves on the holistic nature of their discipline, which not only incorporates

incredibly diverse research, but also encourages interdisciplinary communicastion and

interpretation. Dialogue between subfields within anthropology and across disciplines such as

medicine and public health allows for a more comprehensive and cross-cultural perspective on

the nature of disease. Anthropological genetics is a subfield of biological anthropology and

applies which uses genetic data and evolutionary concepts to address anthropological questions,

including the nature of human and non-human primate relationships (Krings et al. 1997; Krings

et al. 1999; Krings et al. 2000; Relethford 2001; Lalueza-Fox et al. 2005; Caramelli et al. 2006;

Plagnol and Wall 2006; Krause et al. 2007), the routes and timing of human migrations around

the world (i.e. Cann, Stoneking, and Wilson 1987; Kolman, Sambuughin, and Bermingham

1996; Quintana-Murci et al. 1999; Macaulay et al. 2005; Ramachandran et al. 2005), and the

demographic forces that have shaped human history (i.e. Harpending et al. 1993; Harpending

1994; Sherry et al. 1994; Harpending et al. 1998; Harpending and Rogers 2000). The impact of

disease on human genetic diversity is often addressed as well, in part because alleles affecting

and affected by diseases are often population specific and provide information about the

questions posed above. In addition, many anthropologists are interested in determining the

relative contribution of genetics to the etiology and severity of human disease.

In this dissertation, I use genetic data from both humans and pathogens to explore the

evolution and etiology of one complex disease and two infectious diseases from temporally

distinct perspectives. I have adapted a model that incorporates three major perspectives

(evolutionary, population, and clinical) from which the anthropological study of the genetics of

human disease can be approached (Figure 1-1). Although the original model was applied to

infectious disease (Quintana-Murci et al. 2007), I have broadened the model to include complex

disease as well. This modification provides a temporal framework for considering genetic

diversity and disease, which enables a more holistic treatment of the evolutionary, demographic,

and cultural forces that are operating on, and in concert with, genetic variability. I used an

evolutionary perspective to address the origin of syphilis, a population perspective to determine

genetic components contributing to alcoholism in Native Americans, and a clinical perspective to

study factors relating to the transmission of HIV via breastfeeding.

The evolutionary perspective typically incorporates the greatest genetic variation, because

the questions addressed in this framework are often rooted in the distant past. For example,

constant exposure to pathogens has shaped the human genome (Nielsen et al. 2007), either

through negative selection (Schwartz et al. 1995; Diaz et al. 2000; Hugot et al. 2001) or

balancing selection (Schroeder, Gaughan, and Swift 1995; Allen et al. 1997; Verrelli et al. 2002).

An intriguing example is the chemokine receptor CCR5 locus, at which homozygosity for the

delta32 mutation confers resistance to the human immunodeficiency virus type 1 (HIV-1)

infection (Samson et al. 1996). Under the assumption of neutrality, the high frequency (up to

10%) of the variant in European populations would suggest that the allele arose over 100ka

(Stephens et al. 1998b; Galvani and Novembre 2005). However, using linkage disequilibrium

and geographic structure analyses, the origin of the allele has been estimated ca.1,000 years ago,

which suggests that strong selection has driven the allele to its current high frequency (Libert et

al. 1998; Stephens et al. 1998b; Lucotte 2001). Clearly HIV-1, which only entered the human

population less than one hundred years ago (Sharp et al. 2001), could not have been the selective

force. However, smallpox, which is phenotypically similar to HIV-1 and has caused high rates of

mortality episodically over the past millennium, may have been the selective force (Galvani and

Slatkin 2003; Galvani and Novembre 2005). For comparison, the nucleotide diversity at the

CCR5 gene was compared between humans and chimpanzees, which are subject to a simian

immunodeficiency virus (SIVcpz) similar to HIV-1 but much less pathogenic. An excess of rare

variants was found in the chimpanzee gene, suggesting that the locus was influenced by a

selective sweep (Wooding et al. 2005). CCR5 was much more diverse in humans and

characterized by an excess of common variants, suggesting balancing selection (Wooding et al.

2005). These studies of the CCR5 gene demonstrate the global nature of the evolutionary

perspective, which considers comprehensive human genetic variation and the impact on human

and nonhuman primate genomes from past pathogen experiences. In chapter two, I used genetic

information from pathogens themselves to address evolutionary questions of anthropological

interest, such as where and when in human history veneareal syphilis evolved. This project

elucidates an important evolutionary mechanism of emerging pathogens, gene conversion, which

may have a significant impact on our approach to treatment and vaccination.

The second stage of the model is the population perspective, which considers the genetic,

demographic and cultural influences that shape the distribution of diseases between and within

geographic and ethnic groups. Disease may differentially affect populations either because

particular disease-causing variants may exist at higher frequencies in populations due to

demographic history, or because environmental forces within a population may exacerbate the

effect of variants predisposing a disease (Schork 1997). One well studied example is the extreme

difference in diabetes prevalence between different ethnic groups (Fujimoto 1996) which can

range from over 50% in the Native American Pima (Knowler et al. 1990) and in Pacific Islanders

(Amos, McCarty, and Zimmet 1997; McCarthy and Zimmet 2001) to a fraction of that

prevalence in European populations with the same risk factors (West 1974; Young et al. 2000).

The “thrifty-gene” hypothesis proposed that repeated exposure to famine in certain hunter-

gatherers led to the selection of genes which promote storage of fat; however, in modern times,

an over-abundance of food has led to the high prevalence of diabetes and metabolic disorders

(Neel 1962; Neel 1999), which is especially exacerbated in non-Western populations that have

possibly had less time to adapt to changing conditions (Neel 1982). Although this hypothesis

helped to explain Native American rates of diabetes (Johnson and McNutt 1964; Doeblin, Evans,

and Ingall 1969; Wise 1976) and continues to be discussed (Benyshek and Watson 2006;

Paradies, Montoya, and Fullerton 2007), numerous objections have been raised, including the

ethnographic validity of the premise that hunter-gatherers experience more famines than

agriculturalists (Dirks 1993; Benyshek and Watson 2006), the importance of the fetal

environmental component (Hales and Barker 1992; Barker et al. 1993; Hales and Barker 2001;

Lindsay and Bennett 2001; Ordovas, Pittas, and Greenberg 2003), and the reductionist approach

to human variability encompassed by a typological (race-based) approach (Fee 2006; Paradies,

Montoya, and Fullerton 2007), among other points. However, the longevity of the hypothesis

demonstrates both the attraction of an anthropological theory that incorporates evolution and

culture, as well as the complications inherent in the etiology of complex diseases. In chapters

three and four, I investigated the potential association between genetic data markers and

substance abuse in Native American population using the population perspective. The past

genetic history of this population may have contributed to the non-significance of the genetic

data, and I ultimately concluded that a focus on environmental causes and solutions might be

most appropriate in these populations.

Finally, the clinical perspective focuses on individuals involved in experimental and

intervention studies usually run by medical practitioners. This perspective typically measures

either the response to an intervention or the frequency with which healthy individuals succumb

to a particular disease over time. Studies from the clinical perspective often do not explicitly

account for genetic and cultural diversity, which risks misinterpretation of results due to the

underlying population genetic stratification and/or cultural influences that may impact the

outcome of such trials. For example, the majority of drug trial studies for HIV-1 have been

conducted in the developed world (Perrin, Kaiser, and Yerly 2003). Host genetics, such as the

human leukocyte antigen genes (HLA) are certainly involved in HIV-1 infection (Moore et al.

2002; O'Brien and Nelson 2004; Fellay et al. 2007; Brass et al. 2008), which are differentially

distributed among geographic groups (Cavalli-Sforza, Menozzi, and Piazza 1994; Monsalve,

Helgason, and Devine 1999; Blanco-Gelaz et al. 2001; Cao et al. 2004; Prugnolle et al. 2005). If

aspects of host genetics affect the efficacy of drug therapy or vaccines, then only incorporating a

subset of the total human genetic variation in these trials can lead researchers to misinterpret the

value of their discoveries, since treatments may not have the same efficacy in every human

group. Furthermore, the need for such drugs is much greater in developing countries than in the

West, and ignoring the particular genetic and cultural aspects of these populations hinders the

development of effective treatments. Another potential concern with clinical studies is the

generalized use of race, which is often used as a quick proxy by the medical community to

represent perceived differences in genetic ancestry and cultural lifestyle, when in fact the factors

underlying a person’s genetic ancestry and choices are much more complex (Duster 2007;

Hoover 2007). For example, in a controversial decision by the FDA, approval was granted for

the drug to be marketed towards African-Americans (Carmody and Anderson 2007; Yancy et al.

2007). Some have argued that the identification of the efficacy in African-Americans but not

Causasians was prospective and questionable (Bibbins-Domingo and Fernandez 2007; Duster

2007), although others suggest that acknowledging the interplay between human genomic

variation and pharmacogenomics may improve drug development and global health care (Seguin

et al. 2008). Lastly, the ethics of clinical studies can be questionable when indigenous

populations are used as study subjects who may not have the expertise to fully give their

voluntary informed consent, and who may receive no benefit from their participation. The

expertise of anthropologists is sorely needed in the clinical realm to advise, plan and interpret

studies and data that make use of clinical trials so that maximum benefit for the eventual

recipients of the intervention can be achieved. In chapter five, I use a clinical perspective to

investigate potential molecular mechanisms involved in transmission of HIV-1 via breastfeeding.

I believe that current recommendations about breastfeeding by HIV-1 positive women in the

developing world should both account for the difficulties inherent in the practice and its

cessation for women and the infants, as well as ensure that all aspects of the guidelines are

scientifically sound.

In this dissertation, I chose to study three diseases affecting humans corresponding to the

three perspectives outlined above. I used genetic variation from both the pathogen itself and from

humans to address anthropological, evolutionary, public health, and medical questions. I used an

evolutionary framework to address the the origin of syphilis, a population perspective to

determine genetic components contributing to alcoholism in Native Americans, and a clinical

perspective to study factors relating to the transmission of HIV-1 via breastfeeding. Thus, my

results have broad relevance not only to a range of anthropological questions, such as the origin

of venereal syphilis, but can also be translated into clinical significance and inform health

policies.

In chapter two, I examined the evolution of three human treponemes: Treponema pallidum

subsp. pallidum, which is the etiological agent of venereal syphilis, T.p. subsp. pertenue, which

causes yaws, and T.p. subsp. endemicum, which causes endemic syphilis. Previous knowledge of

these diseases has come primarily from archaeological and historical evidence; however it is

difficult to discern the three diseases in the archeological record because the bone pathologies

caused by the three diseases are similar, and a major diagnostic criterion is therefore the

frequency and distribution of treponemal pathology among skeletons at burial sites and the

anatomical distribution of lesions (reviewed in (Powell and Cook 2005). Even the diagnosis of

contemporary samples is difficult because the clinical manifestations are similar and there is a

dearth of distinct molecular markers defining the three diseases (Centurion-Lara et al. 2006).

Several prominent hypotheses have been advanced describing the evolution of the treponemes

(Baker and Armelagos 1988; Powell and Cook 2005). Rothschild (2003) proposed that yaws (T.

p. subsp. pertenue) was the most ancestral of the three T. pallidum subspecies and was present at

least as far back as the origin of modern humans in Africa, and the other two subspecies each

derived from yaws, with T. p. subsp. pallidum evolving in the New World no more than ~2000

years ago (Rothschild 2003). A New World origin of T. p. subsp. pallidum is central to the

original Columbian hypothesis that suggested venereal syphilis was brought to Europe by

Columbus’ crews returning from the New World (Crosby 1969). An alternative Columbian

hypothesis was advanced by Baker and Armelagos (1988) that suggested venereal syphilis

evolved very rapidly during Columbus’ return voyage from the New World from a non-venereal

treponeme and was subsequently introduced to Europe (molecular data supporting this view were

published recently, (Harper et al. 2008) as well as a critical review (Mulligan, Norris, and

Lukehart 2008) and both attracted astonishingly widespread interest among the general public).

In contrast, the Pre-Columbian hypothesis suggests that treponemal diseases, including venereal

syphilis, existed in the Old World prior to Columbus’ voyages but were diagnosed incorrectly.

For example, pinta was the original form present throughout the world during the Pleistocene,

followed by the evolution of yaws (12,000 years ago), then endemic syphilis (9,000 years ago)

and, finally, venereal syphilis (5,000 years ago) (Hackett 1963). Lastly, the Unitarian hypothesis

suggests that venereal syphilis, endemic syphilis, yaws, and pinta are not in fact distinct diseases,

but rather are environmentally determined manifestations of the same disease (Hudson 1965).

My goal was to use molecular genetic data sampled from contemporary strains of each of the

three main treponemes (no molecular data exist for T. carateum that causes pinta), as well as two

outgroup species, to determine the support for any of these hypotheses (Chapter 2, Gray et al.

2006). This was the first phylogenetic study of the treponemes, and it provided valuable

information on the possible evolutionary scenarios of these pathogens. Furthermore, I was able

to establish particular alleles that are specific to each of the three subspecies that could be used in

future clinical investigations to aid in diagnosis. Finally, I provided new data that suggests

treponemal genome evolution has been driven by recombination, specifically gene conversion,

much more often than was previously known or predicted.

In the second study (chapters three and four), I used a population perspective to study

alcoholism in Native Americans. This group experiences alcohol related deaths at more than five

times the rate of the general United States population (IHS 2006) and are twice as likely to die of

chronic liver disease than Caucasians (CDC 2006a). The possible cultural reasons for this

disparity include high rates of poverty and unemployment, lack of access to health care, and

overall poor health (IHS 2006). However, high rates of alcoholism in Native Americans are

surprising in light of the research that shows ancestral Asian populations have a low level of

alcoholism, most likely mediated by a very high frequency of two alleles at the two main genes

involved in alcohol metabolism (alcohol dehydrogenase gene [ADH] and aldehyde

dehydrogenase [ALDH]). These alleles slow the body’s metabolism of alcohol resulting in toxic

accumulation of acetaldehyde that produces an intensely uncomfortable sensation, i.e. flushing

response, that ultimately protects against alcoholism through the behavioral response of

consuming less alcohol (Chao et al. 1994; Thomasson et al. 1994; Chen et al. 1996; Nakamura et

al. 1996; Tanaka et al. 1996; Shen et al. 1997; Osier et al. 1999). Because Native Americans are

genetically descended from a north-central Asian source population within the last 20,000 years

(Meltzer 1993; Merriwether, Rothhammer, and Ferrell 1995; Kolman, Sambuughin, and

Bermingham 1996), it might be expected that they would have inherited these protective genes.

However, these protective alleles were found to be absent in a Southwest population, although a

significant association was found between other alleles at the ADH locus and the behavior of

binge drinking (Mulligan et al. 2003). In addition, a genome–wide association study performed

with the same Native American population found a strong association signal with alcoholism on

chromosome four near the ADH gene (Long et al. 1998). In order to further investigate the

possible genetic basis of alcoholism in Native Americans, I examined 12 single nucleotide

polymorphisms (SNPs) at both the ADH and ALDH genes of ~400 individuals from a Plains

population for association with multiple dichotomous and continue measures of alcohol and drug

abuse (Chapter 4). Despite the numerous phenotypes and the extensive genetic dataset, no

significant associations were detected. I also analyzed genotype data from the alpha-synuclein

(SNCA) gene, also located on chromosome four near ADH and therefore another attractive

candidate gene for alcoholism (Chapter 3, Clarimon et al. 2007). α-synuclein is involved in

dopaminergic neurotransmission, and the overexpression of the protein has been implicated in

the etiology of Parkinson’s disease (Polymeropoulos et al. 1997; Kruger et al. 1998) and

Alzheimer’s disease (Ueda et al. 1993), possibly because of neurodegeneration of dopamine

neurons due to toxic build-up of the protein (Mash et al. 2003). More recently, α-synuclein has

also been associated with alcoholism (Liang et al. 2003; Bonsch et al. 2005a; Bonsch et al.

2005b; Bonsch et al. 2005c) and drug addiction (Mash et al. 2003; Kobayashi et al. 2004).

Specifically, increased mRNA and protein are elevated in alcohol-preferring individuals in

humans, rats, and macaque monkeys (Liang et al. 2003; Spence et al. 2005; Walker and Grant

2006) and are associated with alcohol craving in humans (Bonsch et al. 2005a; Bonsch et al.

2005c). I genotyped and analyzed 15 SNPs at the SNCA locus in ~1000 individuals from a Plains

and a Southwest population and again found no significant association between any SNP and

alcohol or drug abuse or dependence. Since genetic variability and promoter polymorphisms

upstream of SNCA may mediate the increase in mRNA and protein expression (Bonsch et al.

2005b) my results suggest that study of upstream polymorphisms may represent a productive

avenue for future research. However, the results of these two studies suggest that the

environment may be a more influential component in substance abuse among Native Americans,

and therefore further resources should be devoted to address the underlying economic and social

problems in these populations.

In the final study (chapter five), I used molecular data to investigate recent observations

that, contrary to previous wide-held opinion, breastfeeding by HIV-1 positive women in

resource-poor areas is more beneficial to the long-term health of their children than complete or

partial replacement feeding (feeding of any substance other than breastmilk). This study used a

clinical perspective, as I investigated the evolution of HIV-1 in the breastmilk and blood over

time from a woman who participated in a clinical trial on breastfeeding-mediated transmission of

HIV. This study addressed many anthropological issues. Worldwide, an estimated 420,000

children were infected with HIV-1 in 2007, the vast majority through mother-to-child-

transmission (MTCT) (WHO 2007). Breast-feeding accounts for one-third to one-half of all

MTCT events during the first 24 months of life (Dabis et al. 1999; Iliff et al. 2005). In the US,

women are counseled by the CDC to replace breastfeeding with formula if infected with HIV-1

(CDC 2007), and the World Health Organization (WHO) previously recommended that HIV-1

positive women in all countries avoid all breastfeeding (WHO 2003). However, formula-feeding

is impractical for women in resource poor regions of the world where they do not have consistent

access to clean water, formula, and health care, and breast feeding may be the only practical

option. Cultural pressures also make women reluctant to eschew breast feeding as this can be

seen as a tacit admission of HIV-1 status. However, recent observational studies have suggested

that exclusive breast feeding, as opposed to the simultaneous feeding of milk and other foods,

may significantly reduce the risk of transmission of HIV-1 (Coutsoudis 2000; Coutsoudis et al.

2001; Coutsoudis et al. 2002; Iliff et al. 2005; Kuhn et al. 2007). The WHO subsequently

changed its recommendations to women in developing countries to encourage exclusive

breastfeeding up to six months followed by abrupt weaning (WHO 2006). However, the

biological mechanisms underlying the reduction of risk through exclusive breastfeeding have not

been clearly elucidated. Also, the benefits of abruptly weaning at six months are not at all clear,

while the practice is difficult and painful for the mother who would typically wean over a period

of months. I amplified and sequenced the env gene from viral populations present in the breast

milk and plasma over a two-year period from an HIV-1 positive woman participating in a

clinical trial in Zambia. I used phylogenetic and sequence-based analyses to examine the

evolution of the virus over time and within tissues. I concluded that the breastmilk virus was

genotypically distinct from the plasma virus during the early stages of breastfeeding, and the

virus in both tissues was subject to changing evolutionary dynamics and selective pressures over

time. The benefit of an anthropological genetic perspective that I bring to this study is the ability

to use evolutionary analyses to investigate the molecular basis of modulated risk of MTCT, with

the goal of advocating a scientifically sound and culturally sensitive breastfeeding management

plan to women while eliminating unnecessarily onerous measures.

In sum, this dissertation demonstrates how genetic anthropology can be used to address

both anthropological and clinical concerns from three temporally and philosophically distinct

perspectives. My studies incorporate pathogen genetics in addition to human genetics, which can

broaden our evolutionary understanding of the interaction between humans and pathogens. My

dissertation demonstrates the value of using an anthropological perspective in arenas often

dominated by medical practitioners. My unique advantage as a genetic anthropologist is that I

can apply analytical tools of evolutionary genetics to study the biological mechanisms of

diseases, while maintaining a multi-discinplinary approach that considers the cultural, historical,

and demographic factors that influence etiology. In addition, my training as an anthropologist

allows me to interpret the clinical results from these studies in a culturally appropriate context

for the maximum benefit of the participants.

Figure 1-1. Model for studying human diseases.

CHAPTER 2

MOLECULAR EVOLUTION OF THE TPRC, D, I, K, G, AND J GENES IN THE PATHOGENIC GENUS Treponema1

Introduction

The evolution of bacterial genomes has been heavily influenced by processes such as

horizontal gene transfer and homologous recombination, both of which can accelerate adaptation

through the generation of new alleles (Feavers et al. 1992; Baldo et al. 2006). Horizontal (or

lateral) gene transfer occurs through the uptake of genetic material from another genome, i.e. an

inter-genomic event, and includes transformation, conjugation, and transduction (Ochman,

Lawrence, and Groisman 2000). Homologous recombination, which is typically an intra-

genomic event, also occurs with high frequency in bacterial genomes (Smith, Dowson, and

Spratt 1991; Feil et al. 2001; Feil and Spratt 2001). Several outcomes may arise from a

recombination event, including translocations, deletions, duplications, inversions, and gene

conversions (Hughes 2000). Gene conversions are intra-genomic events that are the result of a

non-reciprocal transfer of genetic information from a donor locus to a recipient locus, either

through the permanent transfer of genetic material to the recipient locus or through the temporary

use of the donor sequence as a template for DNA synthesis on the recipient strand (Santoyo and

Romero 2005).

Gene conversion is especially important in the evolution of gene families (Slightom,

Blechl, and Smithies 1980; Drouin et al. 1999; Lathe and Bork 2001; Noonan et al. 2004). Gene

families are comprised of paralogous genes, which are defined as two or more genes within the

same genome that are so similar in DNA sequence they are assumed to have originated from one

1 Gray, R. R., C. J. Mulligan, B. J. Molini, E. S. Sun, L. Giacani, C. Godornes, A. Kitchen, S. A. Lukehart, and A. Centurion-Lara. 2006. Molecular evolution of the tprC, D, I, K, G, and J genes in the pathogenic genus Treponema. Mol Biol Evol 23:2220-2233.

ancestral gene (King and Stansfield 1997). The initial event creating the gene family was thus

likely to be one or more duplication events. The high sequence homology between paralogous

genes that signals a past duplication event also sets the stage for potential future homologous

recombination events (Schimenti 1994; Posada, Crandall, and Holmes 2002). Orthologous genes,

on the other hand, share sequence homology and are assumed to be descendant from a common

ancestral gene, but are present in different species (King and Stansfield 1997; Gogarten and

Olendzenski 1999). In this case, the genes most likely evolved through speciation rather than

duplication. Recombination can significantly impact inferred phylogenetic relationships (Feil et

al. 1999; Holmes, Urwin, and Maiden 1999; Feil and Spratt 2001; Worobey 2001). In the case of

gene families, gene conversion can cause paralogous genes to cluster more closely than

orthologous genes, thus confusing the order of evolution of the organisms (Drouin et al. 1999).

There are two seemingly opposite outcomes of gene conversion, concerted evolution and

increased sequence diversity, which may be distinctive of different stages of multi-gene

evolution (Santoyo and Romero 2005). After a gene family has been generated by ancient

duplication events, paralogous and orthologous comparisons should exhibit the same degree of

divergence. If the paralogous comparisons are more similar, then the genes in a multi-gene

family are evolving in a non-independent manner leading to homogenization of the genes, or

concerted evolution (Ohta 1992; Howell-Adams and Seifert 2000; Liao 2000; Lathe and Bork

2001). This may be beneficial in the case where a weakly advantageous point mutation arises in

one gene, and its effect is multiplied when the entire gene sequence is converted to other loci

(Dover 2002). This is consistent with the proposal that purifying selection may operate on genes

that have undergone duplication on the assumption that a duplicated gene must have an initial

benefit for the organism and, thus, its sequence must be conserved (Lynch and Conery 2000;

Kondrashov et al. 2002). As the sequences accumulate neutral diversity, though, the process of

gene conversion becomes less efficient. After time, only small “islands” of homology exist and a

site-specific system of shorter regions of gene conversion may take over, the outcome of which

is increased sequence variation (Zhang et al. 1992; Zhang et al. 1997; Zhang and Norris 1998;

Santoyo and Romero 2005; Taguchi et al. 2005). This is consistent with Ohno (1970), who

suggested that duplicated genes are under less selective pressure and may accumulate more

mutations leading to loss of the paralog or creation of a new function (Kimura and King 1979;

Walsh 1995; Wagner 1998; Lynch and Force 2000). Thus, concerted evolution and increased

sequence diversity may indicate earlier and later stages, respectively, in the evolution of gene

families (Santoyo and Romero 2005).

In this study, we examine genes in the tpr (Treponema pallidum repeat) gene family in

members of the genus Treponema (Spirochete family of bacteria) to investigate the evolution of

the gene family and, possibly, evolution of the treponemes themselves. The tpr gene family

consists of 12 paralogous genes that comprise 2% of the T. pallidum genome and have probably

evolved through gene duplication and gene conversion. These genes are related to the major

outer sheath protein (Msp) in Treponema denticola (TDE0405); however it appears that T.

denticola did not experience a history of gene duplication and gene conversion at this locus since

T. denticola possesses only one tpr-like gene (Seshadri et al. 2004). The tpr gene family in T.

pallidum is believed to encode potential virulence factors and is divided into three families:

Subfamily I (tprC, D, I, and F), Subfamily II (tprE, G, and J), and Subfamily III (tprA, B, H, K,

and L). The gene products from Subfamilies I and II have conserved amino and carboxyl

terminal sequences with unique central regions, while Subfamily III has scattered conserved and

unique or variable regions (Centurion-Lara et al. 1999). Gene conversion has previously been

reported in tprK (Centurion-Lara et al. 2000a; Centurion-Lara et al. 2004). Seven variable

regions within tprK were proposed to have been created by gene conversion using sequences

from the flanking regions of tprD as donors (Centurion-Lara et al. 2004). The degree of diversity

in these variable regions appears to increase in the presence of adaptive immune pressure,

suggesting that a function of these gene conversions may be to create antigenic diversity

(Centurion-Lara et al. 2004).

The pathogenic treponemes include three Treponema pallidum subspecies, T. carateum, T.

paraluiscuniculi (rabbit syphilis), and the unclassified Fribourg-Blanc (simian) isolate. The three

T. pallidum subspecies include pallidum, which is the causative agent of human venereal syphilis

and pertenue and endemicum, which cause yaws and bejel, respectively. T. carateum is the

etiological agent of pinta, although no isolates of this organism are known to exist. None of the

pathogenic treponemes mentioned above can be propagated in vitro. The complete T. p. subsp.

pallidum genome (from the Nichols strain) was sequenced in 1998 and is considered the

reference strain (Fraser et al. 1998). T. denticola, considered a non-pathogenic treponeme,

probably had an ancient divergence with T. pallidum based on the large difference in GC content

between T. pallidum and T. denticola (52.8% and 37.9%, respectively) and in genome length

(1.14 Mb and 2.84 Mb, respectively) (Seshadri et al. 2004) and thus the T. denticola sequence

was not considered in this study. Although lateral gene transfer has been identified as a probable

evolutionary force in the genome of T. denticola, no evidence exists for lateral gene transfer in T.

pallidum (Seshadri et al. 2004).

In this project, we examined eight strains of T. pallidum subsp. pallidum, and two strains

each of T. pallidum subsp. pertenue and T. pallidum subsp. endemicum, representing all known

propagated human strains (two additional T. pallidum subsp. pertenue strains have recently been

obtained and are under study) as well as two non-human strains, T. paraluiscuniculi and the

simian isolate. Six tpr genes, representing all three subfamilies, were sequenced: tprC, D, G, J, I,

and K. In order to investigate the evolution of these tpr genes, we utilized phylogenetic methods,

general measures of nucleotide diversity, and specific methods to detect recombination events.

Materials and Methods

Treponemal Strains and tpr Sequencing

All treponemal isolates used in this study were propagated in New Zealand White rabbits

(Lukehart et al. 1980) with the approval of the University of Washington Institutional Animal

Care and Use Committee. The Fribourg-Blanc strain was isolated from the popliteal lymph node

of a baboon from a yaws-endemic area (Fribourg-Blanc, Mollaret, and Niel 1966); a single report

describes an experimental infection of humans with this strain (Smith et al. 1971). Strain

designations and origins of the isolates are indicated in Table 2-1. Organisms were extracted by

mincing infected testicular tissue in 0.9% saline and were quantitated by darkfield microscopy.

Treponemal suspensions were mixed with an equal volume of 2x DNA lysis buffer (20mM Tris,

pH 8; 0.2 M EDTA, pH 8; 1.0% sodium dodecyl sulfate). DNA from treponemes was extracted

as previously described (LaFond et al. 2003).

Full-length open reading frames (ORFs) of 1791-2268 bp (Table 2-2) from each strain

were amplified, cloned, and sequenced as previously described (Giacani et al. 2004; Sun et al.

2004). The ORFs were amplified from T. pallidum strains by PCR using primers (Table 2-2)

located in the flanking regions of the genes, cloned into the TOPO II vector (Invitrogen,

Carlsbad, CA) and sequenced in both directions by the primer walking approach as previously

described (Centurion-Lara et al. 2000b); the amplicons at tprG and J from MexicoA were

obtained using primers internal to the start and stop codons and contained no flanking sequence.

A minimum of two clones were sequenced for each amplicon and ambiguities were resolved by

sequencing a third clone from an independent PCR, except for the Gauthier tprG, I, and J ORFs,

for which a single clone for each ORF was sequenced in both directions. For most sequences,

five clones were analyzed. The T. paraluiscuniculi sequences were described previously

(Giacani et al. 2004). GenBank accession numbers for the sequences are: tprC - NC_000919,

AY536645-6, AY550204, AY542157, AY590560, AY550206, AY542153-5, AY685236,

DQ886671-73; tprD - AF217537-41, AF187952, AY685237, AE000520, AY533515,

AY542156; tprI - AY533508-14, NC_000919, DQ886678-82; tprG/tprJ - NC_000919,

AF073527, AY685239-40, DQ886674-77; TprK - NC_000919, AY685248-50, DQ886683-700.

Evolutionary Analysis of Sequences

Six loci were considered in this analysis: tprC, D, G, I, J and K. Sequences were aligned

using ClustalX (Thompson et al. 1997) as well as manually using BioEdit to ensure proper amino

acid alignment (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). Frameshift mutations in T.

paraluiscuniculi (bp 439 in tprC and tprD, bp 653 in tprG1 and tprG2) and T. p. subsp. pallidum

Sea81-4 (bp 1860 at tprG ) were removed from the alignment, as this would have created a

misalignment of the amino acids for the rest of the sequence. Levels of nucleotide diversity

within and between human treponemal subspecies (π and Dxy, respectively) were calculated

using DNAsp v. 4.10.4 (Rozas et al. 2003). GC content, using all available tpr sequences from

human treponemes (see Table 2-1), was calculated using PAML (Yang 1997). An AMOVA

(Analysis of Molecular Variance) was performed for tprC, I, and K using Arlequin version 3.0

(Excoffier, Laval, and Schneider 2005).

Phylogenetic Analyses

Maximum likelihood (ML) methods were used to infer the phylogenetic relationships

among the tested loci. First, the most appropriate substitution model for each locus was

determined using MODELTEST 3.06 (Posada and Crandall 1998). The following models were

selected for each locus: tprC (without T. paraluiscuniculi) - HKY+I+Γ , which allows for

different base frequencies and a separate transition and transversion rate (HKY model;

(Hasegawa, Kishino, and Yano 1985) as well as a proportion of invariant sites (I) and a gamma

distribution of mutation rates (Γ); tprD - HKY+Γ; tprG/J – GTR+Γ, which is a general time

reversible model that allows six different mutation rate categories (GTR) as well as a gamma

distribution of mutation rates (Γ) (with Nichols J) and HKY+Γ (without Nichols J); tprI – HKY;

tprK - HKY+Γ; tprC, D and I – GTR + Γ. The HKY + Γ model was used for the phylogeny

including all twelve Nichols tpr genes to reduce computational time due to the complexity of the

dataset. A maximum likelihood phylogeny was inferred using PAUP* 4.0b10 (Swofford 2002)

and the indicated substitution model. Full heuristic searching with the simple addition of

sequences and tree-bisection-reconnection (TBR) branch-swapping algorithms were used to

traverse the tree-space. Bootstrap analysis (1000 maximum likelihood replicates) was performed

using PAUP* 4.0b10 to determine the relative support for internal nodes. Third positions were

excluded in a separate analysis in order to determine if these positions had been subject to

mutational saturation.

Detection of Recombination

The RDP2 package (Martin, Williamson, and Posada 2005) was used to detect

recombination. This program implements several non-parametric methods to identify

recombinant and parental sequences and to estimate breakpoint positions that identify the limits

of the recombinant DNA in the sequences (Martin, Williamson, and Posada 2005). We used four

methods implemented in the RDP2 program: the RDP method, which is a phylogenetic method

that uses discordant branching patterns to infer recombination; the Maximum Chi-squared

(MaxChi) method (Smith 1992; Posada and Crandall 2001), which uses a sliding-window

approach along pairwise comparisons to identify discrepancies; the Chimera method (Smith

1992; Posada and Crandall 2001), which is similar to MaxChi but uses triplets of sequences

instead of pairs; and GENECONV, which compares fragments of sequence pairs (Padidam,

Sawyer, and Fauquet 1999). Non-default settings that were used consisted of a window size of

100, linear sequences, maximum p-value of 0.01 or 0.001 and a Bonferroni correction. All events

were listed. For the RDP method, internal and external references sequences were used, the

window size was set to 10, and 0-100 sequence identity was used. For both the MaxChi and the

Chimera methods, the number of variable sites was set to 30 with 1000 permutations and a max

p-value of 0.05. For the GENECONV method, the program was set to scan sequence triplets. In

all cases, the same alignment files from the phylogenetic analyses were used for the

recombination analyses.

Results

We examined six genes of the tpr gene family (tprC, D, G, J, I and K) in three human

treponemal subspecies (T. pallidum subsp. pallidum, endemicum and pertenue) and in two non-

human treponemes (T. paraluiscuniculi and the simian isolate) (Table 2-1). We were interested

in the relationship of the genes and alleles to one another as well as evidence for recombination.

Because of the well documented evidence for gene conversion in gene families and because no

evidence exists for lateral gene transfer in T. pallidum, we were specifically interested in

identifying intra-genomic recombination events, i.e. gene conversion. In order to investigate the

evolution of these tpr genes, we utilized 1) phylogenetic methods, 2) specific methods to detect

recombination events, and 3) general measures of nucleotide diversity and composition.

In order to obtain an overall view of the genetic diversity at all of the studied loci, a

maximum likelihood (ML) tree was created using an alignment of 2708 nucleotides from all

twelve available tpr gene sequences for T. p. subsp. pallidum Nichols strain (obtained from

GenBank) (fig. 1a). Sequences from Subfamily I (tprC, D, I, F) and Subfamily II (tprE, J, G)

cluster in two separate clades that are each clearly separated from the rest of the phylogeny. In

contrast, Subfamily III (tprA, B, H, L, K) sequences do not cluster with each other or any other

sequences and are distributed with varying branch lengths between the Subfamily I and II clades.

These results are consistent with previous studies in which Subfamily III membership was less

clearly defined than the other subfamilies (Centurion-Lara et al. 2000b).

Phylogenetic analyses of subfamily I

In order to focus on Subfamily I diversity, a ML phylogeny was generated for all available

DNA sequences for all Subfamily I loci: tprC, D, and I (Figure 2-1). All of the tprI sequences

clade together, while the tprC and D sequences are interspersed with each other such that there

are no major monophyletic tprC or D clades. There are three instances in which paralogous

sequences cluster more closely than their orthologous counterparts, all of which involve tprC and

D: 1) The tprC and D sequences for four of the T. pallidum subsp. pallidum strains; 2) the tprC

and D sequences from pertenue Gauthier (along with SamoaD tprC); 3) the tprC and D

sequences for T. paraluiscuniculi. The eight pallidum sequences are identical, while the pertenue

Gauthier and T. paraluiscuniculi tprC and D sequences differ by a maximum of one and three

point mutations, respectively, highlighting the paralogous relationship of tprC and D in these

strains.

Individual ML trees were also created for each of the Subfamily I loci examined in this

study: tprC, D, and I. In the tprD phylogeny, two distinct clades are evident (Figure 2-2). One

clade is comprised of four identical T. p. subsp. pallidum sequences and T. paraluiscuniculi, all

of which carry the D2 allele (using terminology of Centurion-Lara et al. 2000b; sequences that

differ by a few base pairs but have the same defining motif are considered the same allele). The

second clade is comprised of the other four identical T. p. subsp. pallidum sequences that carry

the D allele and the T. p. subsp. pertenue Gauthier strain, which carries the D3 allele that is 95%

homologous to the D allele (Centurion-Lara et al. 2000b). Although sequence data are

unavailable, PCR analysis suggests that T. p. subsp. endemicum and non-Gauthier strains of T. p.

subsp. pertenue would cluster in the D2 clade (Centurion-Lara et al. 2000b). The D and D2

alleles differ from each other by a 330bp central region at bp 855-1180 and three smaller variable

regions at bp1275-1306, 1425-1503 and 1569-1626 (relative to the Nichols strain sequence). A

contiguous expanse encompassing the four variable regions (bp 855-1626) was removed and the

new alignment was used to generate a ML tree in which the eight T. p. subsp. pallidum

sequences comprise a monophyletic clade (data not shown).

An initial tprC phylogeny included all strains (not shown). This phylogeny contained a

very long branch leading to T. paraluiscuniculi which increased the scale by an order of

magnitude (data not shown). The long T. paraluiscuniculi branch, along with the paralogous

grouping of T. paraluiscuniculi tprC and D sequences in Figure 2-1, suggested a gene conversion

event in T. paraluiscuniculi that replaced the ancestral sequence at tprC with tprD. Table 2-3

summarizes the proposed gene conversion events between tprC and D). The T. paraluiscuniculi

sequence was removed and an alternative phylogeny was generated (Figure 2-2). In the new

phylogeny, the three human subspecies cluster separately with strong bootstrap support (94-

100%). Simian is contained in the T. p. subsp. pertenue clade although it is distinct from the two

T. p. subsp. pertenue sequences. The T. p. subsp. pallidum sequences form three well-supported

clusters within a monophyletic clade. Four of the T. p. subsp. pallidum sequences are identical to

each other as well as to the tprD sequences in these same strains and are considered to carry the

D allele at both loci (see examples of paralogous sequences clustering above). The tprC alleles in

the other four T. p. subsp. pallidum strains show high sequence homology with the D allele and

are labeled D-like alleles (Centurion-Lara et al. 2004) (Table 2-3). The fact that there is higher

similarity between the paralogous tprC and D sequences in the strains carrying the D allele (they

are identical) than between their respective homologs suggests that a gene conversion event has

occurred between tprC and tprDin the D allele strains. In this case, tprC appears to be the likely

donor since there is detectable homology among all of the subspecies at this locus, whereas the D

and D2 alleles differ by a long central variable region. Furthermore, the tprC and tprD sequences

in T. p. subsp. pertenue Gauthier strain are identical suggesting a third gene conversion event,

again with the tprC locus serving as the donor due to the detectable homology among the tprC

homologs (Table 2-3).

The tprI phylogeny includes the same isolates as the tprC phylogeny with the exception of

T. paraluiscuniculi, which does not have a tprI locus (Figure 2-2). The phylogeny for tprI shows

a relatively long branch between T. p. subsp. pallidum and the other treponemes, similar in

length (0.016 substitutions/site) to the corresponding branch in the tprC phylogeny (0.014

substitutions/site). There is 100% bootstrap support for the two monophyletic clades consisting

of T. p. subsp. pallidum (all eight pallidum sequences are identical) and T. p. subsp. endemicum,

respectively, moderate support for clustering of simian with T. p. subsp. pertenue SamoaD

(85%), and little support for a T. p. subsp. pertenue + simian clade (62%), although the simian

sequence clearly does not belong with the other two clades. The tprI phylogeny confirms the

close relationship between the unclassified Fribourg-Blanc simian isolate and T. p. subsp.

pertenue that was also evident in the tprC phylogeny. Phylogenetic clustering of these sequences

suggests that there is no strong species boundary, a conclusion that is supported by the fact that

the simian treponeme is reported to infect humans (Smith et al. 1971).

Phylogenetic analyses of subfamily II

The tpr Subfamily II consists of tprE, G, and J. Previous studies have shown that the T. p.

subsp. pallidum Nichols tprG and J sequences are highly homologous at the 5’ and 3’ ends while

the central regions show extreme divergence (Giacani, Hevner, and Centurion-Lara 2005),

specifically at two variable regions (V1 = sites 976-1510, with a small internal region of

homology at sites 1168-1295, and the much smaller V2 = sites 1879-1947) that are unlikely to

have evolved through point mutation.. Different V1 and V2 sequences are classified as “G” and

“J” motifs, which are used to define the G, J, and G/J alleles present at tprG and J loci. The G

allele is defined as a “G motif” at V1 and V2, the J allele is defined as a “J motif” at V1 and V2,

and the G/J allele is defined as a “G motif” at V1 and a “J motif” at V2. At the tprG locus,

analysis of our alignment shows that two of the three pallidum strains analyzed at this locus

(Nichols and Sea81-4) carry the G allele, while the other pallidum strain (MexicoA) and the

pertenue strain (Gauthier) carry the G/J allele. At tprJ, Nichols and MexicoA carry the J allele,

while Sea81-4 and pertenue Gauthier carry the G/J allele (data not shown). PCR analysis

indicates that the other five pallidum strains discussed in this study also carry the J allele at tprJ,

although sequence data do not exist for these strains. PCR analysis also indicates that the other T.

p. subsp. pertenue strain (SamoaD) as well as a T. p. subsp. endemicum strain (IraqB) carry the

G/J allele (Centurion-Lara, unpublished). In T. paraluiscuniculi, the positions corresponding to

tprE and J contain two almost identical G/J allele sequences that are designated the G2 and G1

alleles, respectively (Giacani et al. 2004). In T. paraluiscuniculi G1 and G2 alleles, the second

half of V1 is somewhat different than V1 in the human G/J allele, although it is still much more

similar to the G/J allele than to the J allele. Furthermore, in T. paraluiscuniculi, tprG has

recombined with tprI (Subfamily I) to form a single allele termed the “G/I” hybrid at the tprG

locus (Giacani et al. 2004).

For our phylogenetic analysis, tprG and J sequences were grouped together because of the

high amount of gene conversion at and between these loci (Figure 2-3). In this phylogeny, a long

branch with 100% bootstrap support leads to the Nichols and MexicoA tprJ sequences, while the

MexicoA tprG, Sea81-4 tprJ, and Gauthier tprG and J form a polytomy (no bootstrap support).

The tprG sequences from Nichols and Sea81-4 form a highly supported clade (99%) and are

clearly closer to the rest of the sequences than Nichols and MexicoA tprJ. Because the J allele is

only found in T. p. subsp. pallidum, while the G/J allele is found in T. p. subsp. pallidum,

pertenue, and endemicum and T. paraluiscuniculi, the latter is most likely ancestral. The “J

motif” at V2 may be the result of a gene conversion or lateral gene transfer, although no

sequence homology was found in a search of the public database. The “G” motif at V2 in the G

allele also occurs in Nichols tprE (data not shown) and may represent a small gene conversion

event from tprE to tprG that replaced only V2 of the ancestral G/J allele in Nichols and Sea81-4

(although more tprE sequence data are needed to be certain). The Gauthier tprG and J sequences

differ by only 2 bp and may also represent a paralogous clustering reflective of a gene

conversion event, although the polytomy makes it difficult to be certain. The T. paraluiscuniculi

clade is also highly supported (100%), which represents a paralogous clustering of closely

related G/J alleles at tprE and J.

Phylogenetic analyses of subfamily III

The tprK phylogeny includes multiple clones from all represented strains because the locus

is highly variable and accumulates mutations within a single infection (Figure 2-4). Seven

variable regions have been identified in tprK that are likely the result of gene conversion events,

with the probable donor sites located in the 3’ and 5’ flanking regions of tprD (Centurion-Lara et

al. 2004). These variable regions were removed from our analysis in order to focus on the non-

recombinant history of the locus (variable regions were slightly modified to capture additional

flanking sites, i.e. bp 132-180, 596-671, 749-834, 866-920, 963-1059, 1141-1215, and 1291-

1390). T. paraluiscuniculi appears to be an appropriate outgroup for tprK as the scale is on the

same order of magnitude as tprC and I. Strong bootstrap support is shown for the T.

paraluiscuniculi (100%) and T. p. subsp. endemicum (97%) clades, as well as for a combined T.

p. subsp. pallidum + T. p. subsp. pertenue clade (96%) in which these two subspecies are

unresolved relative to each other. However, the fact that variable regions in tprK appear to

accumulate more variation in response to selective pressure (Centurion-Lara et al. 2004) and

clones from single individuals show single nucleotide polymorphisms (SNPs) even after removal

of variable regions suggests that tprK may evolve differently than the other tpr loci.

Statistical Tests for Recombination

Four tests (RDP, MaxChi, Chimera and GENECONV) in the RDP2 package were used to

investigate recombination events in the Subfamily I, II and III genes (Table 2-4). We use these

methods to identify significant recombination and the location of recombinant breakpoints, but

we do not infer donor and recipient alleles because there is likely inter-locus recombination also

occurring that will be undetected because these methods focus on a single locus at a time. In all

cases, the same alignment files from the phylogenetic analyses were used for the recombination

analyses. Our primary interest was to investigate support for the putative regions of gene

conversion identified in the phylogenetic analyses. Relatively strong overlap was shown in the

results from all four methods, and, in general, MaxChi found the most recombination events,

which was previously shown to be the most powerful test in the RDP2 package (Posada and

Crandall 2001).

In tprD, one region of recombination was identified in all of the T. p. subsp. pallidum D2

allele sequences, pertenue Gauthier, and T. paraluiscuniculi (see Table 2-4 for exact location of

recombinant regions). These results are consistent with a recombination breakpoint present at

site 855, which marks the beginning of the central variable region that differentiates the D and

D2 alleles. In tprC, two regions of recombination in the T. p. subsp. pallidum D allele sequences

were identified at bp 137-889 and 1459-1728. This result is consistent with the 100% clustering

of these sequences within the T. p. subsp. pallidum clade (Figure 2-2). Multiple recombination

events were identified in MexicoA and Sea81-3 that are consistent with a branch leading to a

monophyletic clade containing MexicoA and Sea81-3 in the tprC phylogeny (Figure 2-2). In tprI

only one recombination event was identified in pertenue SamoaD although the sequence has

only two unique single nucleotide polymorphisms in this region and point mutation seems a

more likely evolutionary mechanism than recombination in this case.

In tprG and J, more than 40 recombination events were identified when the significance

level was set to p=0.01. This result was impossible to interpret precisely so the analysis was

performed again with more stringent settings of p=0.001 and the requirement that more than one

method was necessary to identify a recombination event. Five sequences showed no evidence of

recombination under these conditions: pallidum Sea81-4 tprJ (G/J allele), pertenue Gauthier

tprG (G/J allele), pertenue Gauthier tprJ (G/J allele), pallidum Nichols tprJ (J allele), and

pallidum MexicoA tprJ (J allele). However, all four methods identified recombination at the

region containing V2 in tprG sequences for both pallidum G alleles (Sea81-4 and Nichols) as

well as the pallidum G/J allele (MexicoA). There are four polymorphisms between V1 and V2

that are shared between the pallidum G alleles and MexicoA G/J, although they are not found in

any of the other G/J alleles from other subspecies, which may contribute to this result.

In tprK, with the extended variable regions excluded, no recombination events were found

by any of the methods. These results agree with the phylogenetic analyses, which also do not

indicate any recombination outside of the variable regions (although Giacani and colleagues

(2004) did identify a putative region of recombination at tprK in CuniculiA between V5 and V6

that was not detected in any of our analyses).

The results of the tests for recombination were consistent with the phylogenetic analyses in

the overall detection of a high level of recombination across the studied loci. The recombination

tests also identified new regions of recombination, particularly at tprC. Overall, far more

recombination was indicated at tprG and J than for any other locus studied here and this result is

consistent with our phylogenetic analyses that revealed multiple instances of paralogous

clustering and the presence of multiple divergent alleles at tprG and J.

Analysis of Nucleotide Diversity and Composition

Additional measures, such as nucleotide diversity and GC content, can be used to

investigate recombination events, with the acknowledgement that other phenomena also affect

these measures (Baldo et al. 2006). The amount of within-subspecies genetic diversity is low for

all three subspecies at loci tprC, I, and K (π = 0-.0076) (Table 2-5). At tprD and J, however, the

diversity within T. p. subsp. pallidum is very high (π = 0.101 and 0.0958, respectively),

reflecting the intra-subspecies gene conversion events discussed above. The amount of diversity

at tprG within T. p. subsp. pallidum is intermediate (π = 0.0154), and specifically lower than

tprJ, reflecting a smaller putative gene conversion event, i.e. the V2 region.

The pattern of genetic diversity between subspecies of T. pallidum differs among loci,

especially for T. p. subsp. pallidum. The Dxy nucleotide diversity between T. p. subsp. pertenue

and T. p. subsp. endemicum is fairly consistent within tprC, I, and K (no tprD sequence data

currently exist for T. p. subsp. endemicum). The Dxy distance between T. p. subsp. pallidum and

the other two subspecies is approximately doubled relative to the distance between T. p. subsp.

pertenue and T. p. subsp. endemicum at tprC and I, consistent with the long branches leading to

T. p. subsp. pallidum in these phylogenies. However, at tprK, the distance between T. p. subsp.

pallidum and T. p. subsp. pertenue is much smaller than the distance between T. p. subsp.

endemicum and the others (0.0028 vs 0.011 and 0.013), consistent with the clustering of T. p.

subsp. pallidum and T. p. subsp. pertenue in the tprK phylogeny. At tprG, the distance between

T. p. subsp. pallidum and T. p. subsp. pertenue is intermediate (Dxy = 0.0130), while at tprJ the

distance between T. p. subsp. pallidum and T. p. subsp. pertenue is only slightly lower than

between these same two subspecies at tprD (Dxy = 0.0962). Again, this is in agreement with the

proposed gene conversion or horizontal gene transfer event that created the highly divergent J

allele in most T .p. subsp. pallidum strains.

Previous studies have suggested that gene conversion events lead to increased GC content

at third codon positions (Eyre-Walker 1993; Galtier et al. 2001; Galtier 2003; Noonan et al.

2004). Although the molecular mechanism is unknown, it may be due to a GC bias in mismatch

repair, which is required to resolve conversion events (Galtier et al. 2001; Noonan et al. 2004).

Third positions reflect this bias more strongly because they are under less selective constraint

since base changes at this position are less likely to result in a change in amino acid. At each of

the six tpr loci studied here, GC content was increased at the third position (GC3) relative to the

first and second positions combined (GC1+2) (table 6) although not as dramatically as reported

in other systems (Galtier et al. 2001; Noonan et al. 2004). This analysis supports our general

finding of multiple gene conversion events at the studied loci.

Discussion

Intra-genomic homologous recombination appears to have been a major force in the

evolution of the tpr gene family in the pathogenic Treponema species. After the gene duplication

events that created the gene family, our phylogenetic analyses of tprC, D, I, G, J, and K suggest

that the high levels of homology among the loci have supported multiple gene conversion events

both within and between these tpr genes. Although lateral gene transfer can theoretically produce

the genetic signatures we describe, this mechanism has not been reported in T. pallidum (lateral

gene transfer has been identified as a probable evolutionary force in the genome of T. denticola

due to the signature presence of phage-mediated integration events and restriction-modification

systems that may serve as a barrier against lateral gene transfer, but neither of these signatures is

present in T. pallidum (Seshadri et al. 2004) and gene conversion appears more likely,

particularly at tprC, D, G and K where donor regions can be identified within the same genome.

No donor sequence was identified for the V1 region of the J allele at tprJ and, thus, horizontal

gene transfer cannot be definitively ruled out.

In Subfamily I, we propose three gene conversion events between loci tprC and tprD; 1) a

tprC-to-tprD conversion that introduced the D allele into tprD in the D pallidum strains, 2) a

tprC-to-tprD conversion in T. p. subsp. pertenue Gauthier strain that introduced the D3 allele

into tprD, and 3) a tprD-to-tprC conversion in T. paraluiscuniculi that introduced the D2 allele

into tprC (table 3). At this point, there are insufficient data to determine the order of the three

proposed gene conversion events. However, it is clear that the tprC-to-tprD conversions (#1 and

2) represent two distinct events since the pertenue Gauthier sequences differ by only two bp and

the pallidum sequences are identical, but the pertenue Gauthier and pallidum sequences differ

from each other by 55 bp. Furthermore, our results suggest that the tprC locus is likely to be

older than the tprD locus because there is more variation among the pallidum D-like alleles at

tprC compared to the pallidum D2 sequences at tprD that are identical (fig. 2a and b). At tprD,

the D2 allele is most likely the ancestral allele since it is found in multiple subspecies (i.e.

pallidum, pertenue, endemicum) as well as in T. paraluiscuniculi, while the D allele is only

found in a subset of pallidum strains. Non-D2 alleles in the four pallidum strains and pertenue

Gauthier are likely the result of two subsequent gene conversion events, as described above.

At tprC, a single gene conversion event (originating from tprD) is posited in T.

paraluiscuniculi based on the phylogenetic analyses. The RDP2 recombination analysis

identified several small, additional recombinant regions in all of the T. p. subsp. pallidum

sequences, which is consistent with the higher diversity observed in the T. p. subsp. pallidum

tprC sequences (Figure 2-2, Table 2-4). Close inspection of the tprC alignment (including

pallidum and non-pallidum strains) revealed the presence of a high number of non-synonymous

mutations in the pallidum strains that were grouped in clusters rather than scattered throughout

the alignment. The transition/transversion ratio was decreased in these clusters and initially

revealed a significant signal for positive selection at tprC in T. p. subsp. pallidum (data not

shown). However, when we examined an alignment of tprC, D, and I together, we found that the

majority of the transversions and non-synonymous mutations were unique to T. p. subsp.

pallidum at tprC. The presence of clustered mutations, with a high frequency of transversions,

argues against accumulated point mutations and instead suggests that multiple smaller, ‘site-

specific’ gene conversion events may have occurred at tprC in T. p. subsp. pallidum. This may

be similar to the presence of multiple, variable regions in tprK, although the tprC regions do not

appear to undergo rapid sequence variation as occurs in tprK. These putative recombination

events at tprC would have to have occurred prior to the major tprC-to-tprD gene conversion that

replaced the D2 allele with the D allele at tprD since the D alleles at tprC and D are identical

(table 3). In proteins with antigenic relevance, recombination produces variation that may have

an adaptive purpose. However, it is not understood whether these proteins are more likely to

undergo recombination or whether high variability simply increases the power of detection of

these events (Baldo et al. 2006). In the case of tprC, which may be a cell-surface protein as

predicted by PSORT analysis (data not shown), it appears that the majority of tprC variation may

be a result of gene conversion events supporting an adaptive explanation.

At the tprI locus, variation among the treponemes was scattered and did not highlight a

specific region that might have undergone gene conversion as described above for other loci.

However, the fact that all eight of the T. p. subsp. pallidum tprI sequences were identical is

intriguing, considering this was not the case at any other locus in our study. There are several

possible explanations for the 100% sequence homology including a significantly lower (point)

mutation rate at tprI in T. p. subsp. pallidum, a more recent divergence of the T. p. subsp.

pallidum tprI sequences, functional constraint at tprI in T. p. subsp. pallidum, or a gene

conversion event that occurred prior to evolution of the T. p. subsp. pallidum tprI sequences (if a

sequence longer than that in our dataset were replaced, the recombination event would go

undetected by our recombination analysis that looked for the endpoints of recombination events).

Both a lower mutation rate and more recent evolution of T. p. subsp. pallidum tprI seem unlikely

because the lengths of branches leading to T. p. subsp. pallidum at tprI and tprC are comparable

(0.016 and 0.014 substitutions/site, respectively). Using a tprI phylogeny, a test for selection on

the branch leading to the eight T. p. subsp. pallidum sequences indicated that the non-

synonymous/synonymous rate ratio was not significantly different from 1.0 (data not shown).

Thus, functional constraint does not appear to explain the lack of mutations at this locus in the

pallidum subspecies. No specific gene conversion events were identified at tprI in our analyses

(although GC3 content was highest at tprI). The most likely explanation may be that the rate of

point mutations is generally low at all tpr genes and the pallidum tprI sequences have escaped

(by chance) the frequent gene conversion events that are mainly responsible for the diversity

seen at tprC and D in T. p. subsp. pallidum.

In Subfamily II, the various alleles that occur at tprG and J (and at tprE and J in T.

paraluiscuniculi), i.e. the G, J, and G/J alleles, are strongly suggestive of multiple gene

conversion events although the directionality of these events is difficult to determine due to the

complexity of the DNA sequences at these loci. Because the G/J allele occurs in multiple

subspecies and at multiple loci (Figure 2-3), it appears to be the ancestral sequence. The Nichols

and Mexico tprJ sequences have a divergent central region that is suggestive of a gene

conversion event (or horizontal gene transfer) that replaced the G/J allele (most likely only the

V1 region was replaced with a “J motif” V1). Unlike the scenario proposed above for gene

conversions at tprC and D, no donor region is immediately apparent for the gene conversion that

created the “J motif” V1 (BLAST searches did not identify any homology between the “J motif”

at the VI variable region and any other treponemal or non-treponemal sequences). Interestingly,

the clustering of the pallidum strains is not consistent between loci. At tprJ, only Sea81-4 has

apparently escaped the gene conversion which created the divergent J allele. At tprG, however,

Nichols and Sea81-4 appear to have shared a gene conversion event creating the “G motif” at V2

to the exclusion of MexicoA. This is in contrast with tprD, where the ancestral D allele was

replaced by the D2 allele in Nichols, but not in MexicoA or Sea81-4. A consistent history of the

evolution of the subspecies pallidum strains therefore cannot be ascertained from these data.

Previous studies have demonstrated a high frequency of gene conversion events at the tprK

locus in T. p. subsp. pallidum (Centurion-Lara et al. 2004). Seven variable regions have been

identified at tprK that are likely the result of multiple gene conversion events, with the probable

donor sites located in the 3’ and 5’ flanking regions of tprD. A multi-site/multi-step

recombination process has been described for the accumulation of diversity within the variable

regions. This diversity was shown to accumulate more dramatically in the presence of adaptive

immune pressure suggesting a mechanism to generate antigenic diversity in T. pallidum

(Centurion-Lara et al. 2004). It is probable that the gene conversion operating at tprK is different

than that affecting the tprC and D loci. Gene conversion at the tprK locus generates diversity and

each event seems to affect a relatively small region (each variable region is 48-99 bases). In

contrast, gene conversions at the tprD loci appears to result in concerted evolution and affect a

larger portion of the gene since the T. p. subsp. pallidum D alleles are identical at both tprC and

D. These seemingly contradictory outcomes of concerted evolution and increased diversity, both

mediated by gene conversion, may be explained by differing stages of evolution in a multi-gene

family (Santoyo and Romero 2005). In the first stage after initial gene duplication to create the

multi-gene family, homogenization or concerted evolution is likely the dominant force because

the high sequence homology drives gene conversion at a faster rate than point mutation occurs.

As point mutations accumulate over time, homologous recombination is no longer as effective

and the rate of point mutations may surpass that of gene conversion. At this stage of evolution of

a multi-gene family, smaller scale ‘site-specific’ gene conversion may become more significant,

thus allowing concerted evolution to occur in small regions while antigenic variation is created

throughout the gene (Santoyo and Romero 2005). This explanation may indicate a younger

history for the tprD sequences (i.e. concerted evolution stage) relative to tprK, while the tprK

(and possibly tprC) sequences are experiencing site-specific gene conversion events, possibly

leading to increased antigenic variation.

Several scenarios have been proposed for the evolution of the human treponemal species

(Baker and Armelagos 1988; Powell and Cook 2005). A New World vs. Old World origin of

venereal syphilis has been long debated (For a recent review, see Powell and Cook 2005). The

Columbian hypothesis originally suggested that venereal syphilis (T. p. subsp. pallidum)

originated in the New World and was brought to Europe by Columbus’ crews returning from the

New World (Crosby 1969). This was based on the paucity of skeletal and historical evidence for

treponemal disease in the Old World prior to the early 1500s. For example, Rothschild (2003)

proposed that yaws (T. p. subsp. pertenue) was the most ancestral of the three T. pallidum

subspecies and was present at least as far back as the origin of modern humans in Africa, and the

other two subspecies each derived from yaws, with T. p. subsp. pallidum evolving in the New

World no more than ~2000 years ago (Rothschild 2003). Baker and Armelagos (1988) have

proposed an alternative Columbian hypothesis which suggests that venereal syphilis evolved in

Europe from a New World non-venereal treponeme that was introduced to Europe by Columbus’

crews. This hypothesis is based on the lack of specific evidence for venereal syphilis in the New

World, despite the overwhelming evidence of treponemal disease. The Pre-Columbian

hypothesis suggests that treponemal diseases, including venereal syphilis, existed in the Old

World prior to Columbus’ voyages but were diagnosed incorrectly. One scenario suggests that

pinta was the original form present throughout the world during the Pleistocene, followed by the

evolution of yaws (12,000 years ago), then endemic syphilis (9,000 years ago) and, finally,

venereal syphilis (5,000 years ago) (Hackett 1963). Finally, a Unitarian hypothesis, based on

skeletal morphology data, has been advanced by Hudson (1965), who suggests that venereal

syphilis, endemic syphilis, yaws, and pinta are not in fact distinct diseases, but rather are

environmentally determined manifestations of the same disease. More recently, Armelagos and

colleagues (2005) have reviewed the molecular literature on human treponemes and they suggest

a lack of molecular distinction between these subspecies.

Our molecular data suggest that the three subspecies are legitimately classified as distinct

entities (Figure 2-2). The phylogenies for tprC and tprI demonstrate high bootstrap support for

the separation of the three subspecies into separate clades, with relatively long branches leading

to endemicum and pallidum. In tprD, pertenue is artificially closer to some pallidum sequences

because of the gene conversion event that separates pallidum D and D2 alleles. In the tprK

phylogeny, there is no bootstrap support to separate pallidum and pertenue, but the tprK

phylogeny is difficult to interpret because this locus has an exceedingly high mutation rate as

demonstrated by the fact that multiple tprK sequences exist in a single individual (even when the

variable regions are removed). Furthermore, AMOVA results reveal a significant amount of

among-subspecies variation (70-95%, p=0.000) when analyzing tprC, I and K from all three

subspecies further supporting the genetic distinctiveness of the subspecies (data not shown). It is

clear that recombination has played a significant role in the evolution of the tpr genes, and

possibly in the evolution of the treponemes. Therefore, studies that look only at SNPs, as

reviewed by Armelagos et al. (2005), will miss this high level of recombination and it may

appear there are few subspecies-specific variants because recombination has frequently

scrambled alleles within a subspecies, e.g. the D and D2 alleles at tprD. Moreover, the fact that

these recombination events are unique to a subspecies argues strongly in favor of the genetic

distinctiveness of the three subspecies.

Ascertaining the distinctiveness of the subspecies is prerequisite to resolving their

evolutionary history. In general, our analyses do not appear to support a dramatically older origin

of yaws relative to venereal syphilis contra Rothschild (2003), i.e. we do not see greater levels of

variation or longer tree branches for T. p. subsp. pertenue relative to T. p. subsp. pallidum (Table

2-4, Figures 2-2 and 2-4). Furthermore, our results do not clearly support current hypotheses that

consider venereal syphilis to be the most recently evolved of the treponemal syndromes. For

instance, multiple gene conversion events have occurred within subsets of T. p. subsp. pallidum

strains at tprC, D, G, and J (Figures 2-2 and 2-3) arguing for an older evolution of the entire

subspecies of pallidum in order to allow sufficient time for these events to have occurred. The

long branch in the tprI phylogeny leading to T. p. subsp. pallidum reflects a large number of

point mutations, which are assumed to evolve in a clock-like manner, suggesting that more

evolution has occurred on the branch to pallidum than on the branches to the other two

subspecies. Our results are generally consistent with a relatively coincident evolution of the three

human treponemal subspecies as proposed by Hackett (1965) but contra Rothschild (2003) who

proposed dramatically different timeframes for evolution of yaws and venereal syphilis.

Moreover, the T. p. subsp. pallidum sequences appear to carry too much variation to support the

modified Columbian hypothesis of evolution of venereal syphilis within the past 500 years

(Baker and Armelagos 1988). This is further supported by the fact that at least one T. p. subsp.

pallidum strain (Nichols) was collected in the early 1900s and is identical at the loci examined to

several other strains collected later in the 20th century, suggesting that the mutation rate is not

high enough to have created variants within this time frame. Additional samples, e.g. more

representatives of T. p. subsp. endemicum and T. p. subsp. pertenue, and analysis of more loci

will be needed to definitively answer questions concerning the origin and evolution of the

treponemes. Moreover, the high levels of recombination revealed in our study suggest that the

analysis of contiguous sequence data, as opposed to analysis of scattered SNPs, will be necessary

to identify possible recombination events prior to reconstruction of the evolutionary history of

the treponemes.

Table 2-1. Treponema isolates used in this study Species/ Subspecies

Strain Geographical source

Year isolated

tprC tprD tprG tprI tprJ tprK Reference

T. p. subsp. pallidum

Nicholsa Washington, D.C.

1912 +e + + + + + (Nichols and Hough 1913)

Sea81-4b Seattle 1980 + + + + + + (Lukehart et al. 1988) Chicagoc Chicago 1951 + + -f + - - (Turner and Hollander 1957) Bal73-1c Baltimore 1968 + + - + - - (Hardy et al. 1970) Bal3c Baltimore Unknown + + - + - - Bal7c

Sea81-3bBaltimore Seattle

1976 1981

(Tramont 1976) (Lukehart et al. 1988)

MexicoAc Mexico 1953 + + + + + - (Turner and Hollander 1957) T. p. subsp. pertenue

Gauthierd Brazzaville 1960 + + + + + + (Gastinel et al. 1963)

SamoaDc Western Samoa

1953 + - - + - + (Turner and Hollander 1957)

T. p. subsp. endemicum

BosniaAc Bosnia 1950 + - - + - + (Turner and Hollander 1957)

IraqBc Iraq 1951 + - - + - + (Turner and Hollander 1957) T. paraluiscuniculi

CuniculiAc Baltimore Unknown + + + - + +

Simian species Fribourg-Blancc

Guinea 1966 + - - + - - (Fribourg-Blanc, Mollaret, and Niel 1966)

a Originally provided by James N. Miller, University of California, Los Angeles, CA b Strain isolated in Seattle by Sheila A. Lukehart, University of Washington, Seattle, WA c Strains provided by Paul Hardy and Ellen Nell, Johns Hopkins University, Baltimore, MD d Provided by Peter Perine, Centers for Diseases Control and Prevention, Atlanta, GA e ‘+’ indicates sequence data that were generated in the current study f ‘-‘ indicates sequence data were not generated in the current study

Table 2-2. T. pallidum primers used in this study. Locusa Strains

sequenced Sense primer Antisense Primer Length of

ORFs analyzed (bp)

BosniaA, Simian, SamoaD, CuniculiA

5’-gggggtgaggtagaagtgaga 5’-ccctcatccgagacaaaat 1797

Nichols, Bal7, Bal73-1, Chicago, Sea81-3, Sea81-4, MexicoA, Bal3, IraqB, Gauthier

5’-ttagaggaggcgtcagaacg 5’-ggtccgtgagcaggaagtaa 1797

Nichols, Bal7, Bal73-1, Chicago, Sea81-3, Sea81-4, MexicoA, Bal3, Gauthier

5’- cgcgtaccgctttgcagttca 5’- catggcattggtgagaaagacg 1791

CuniculiA 5’-aagaggttcaggaagcaacg 5’-acttcgtaggagcagcagga 1791

tprI BosniaA, IraqB, Simian, SamoaD

5’-cgtcaccctctcctggtagt 5’-atccctcgcctgtaaactga 1830

Nichols, Bal7, Bal73-1, Chicago, Sea81-3, MexicoA, Bal3, Sea81-4

5’-tgggagcttgtatgcagatg 5’-gggaaccctctcccttcc 1830

Gauthier 5’-agggtgagggggctactaga 5’-atccctcgcctgtaaactga 1830

tprG Gauthier, Sea81-4 5’-ccctgcgtttcccatctg 5’-gtactaccttcccccggtct 2268

MexicoA 5'-caggttttgccgttaagc 5'-aatcaagggagaataccgtc 1812 CuniculiA 5’-cgcgtacccacttctctctc 5’-gtactaccttcccccggtct 2268

tprJ MexicoA 5'-caggttttgccgttaagc 5'-aatcaagggagaataccgtc 1812

Gauthier 5’-agggtgagggggctactaga 5’-atccctcgcctgtaaactga 2268

Sea81-4 5’-cgagtgaggctcatcaagaa 5’-agtaagccctgcccaagaac 2268 CuniculiA 5’-aagtttgctttcagat 5’-gtactaccttcccccggtct 2268

BosniaA, IraqB, Simian, SamoaD, Gauthier

5’-agtaatggttttcggcatcg 5’-ccatacatccctaccaaatca 1470

Sea81-4, CuniculiA 5’-tcccccagttgcagcactat 5’-tcgcggtagtcaacaatacca 1470

a PCR Cycling conditions: tprC (Other than IraqB or CuniculiA): Denaturation at 94°C for 3 minutes, then 40 cycles of 94°C for 1 minute, 60°C for 2 minutes, and 72°C for 1 minute, with a final elongation step of 72°C for 10 minutes. tprC (IraqB): 40 cycles of 94°C for 1 minute, 58°C for 1 minute, and 72°C for 2 minutes, with a final elongation step of 72°C for 2 minutes. tprC, D ,G, J, K (CuniculiA): Denaturation at 95°C for 2 minutes, then 35 cycles of 95°C for 1 minute, 60°C for 1 minutes, and 72°C for 2 minute, with a final elongation step of 72°C for 3 minutes. tprD: (Other than CuniculiA)94 °C denaturation for 3 minutes, then 30 cycles of 94°C for 1 minute, 60°C for 2 minutes, and 72 °C for 1 minute, with a final extension step of 72 °C for 10 minutes. tprI: Denaturation at 94°C for 3 minutes, then 30 (Nichols, Bal7, Bal73-1, Chicago, Sea81-3, , MexicoA, Bal3, Sea81-4) or 40 cycles (BosniaA, IraqB, Simian, SamoaD) of 94°C for 1 minute, 60°C for 2 minutes, and 72°C for 1 minute, with a final elongation step of 72°C for 10 minutes. tprI, J (Gauthier): Denaturation at 94°C for 1 minute, then 40 cycles of 98°C for 10 seconds, 63°C for 5 minutes, with a final elongation step of 72°C for 10 minutes (used LA Kit, Takara Bio Inc. Shiga, Japan). tprG (Gauthier): Denaturation at 94°C for 1 minute, then 45 cycles of 98°C for 10 seconds, 62°C for 5 minutes, with a final elongation step of 72°C for 10 minutes (used LA Kit, Tokara Bio Inc. Shiga, Japan). tprG (Sea81-4): Denaturation at 94°C for 1 minute, then 35 cycles of 98°C for 20 seconds, 68°C for 5 minutes, with a final elongation step of 72°C for 10 minutes (used LA Kit, Tokara Bio Inc. Shiga, Japan). tprG, J (MexicoA): Denaturation at 94°C for 3 minutes, then 40 cycles of 94°C for 1 minute, 63°C for 2 minutes, and 68°C for 1 minute, with a final elongation step of 68°C for 7 minutes. Note that these primers amplify a partial ORF. tprK: (All except CuniculiA) Denaturation at 94°C for 3 minutes, then 40-45 cycles of 94°C for 1 minute, 60°C for 2 minutes, and 72°C for 1 minute, with a final elongation step of 72°C for 10 minutes. Table 2-3. Polymorphism at the tprC and tprD loci among pathogenic treponemes.

Species/Subspecies Strains tprD allele Directionality of proposed gene

conversion events

tprC allele

T. p. subsp. pallidum Nichols Chicago Bal73-1

D D D D

← ← ← ←

D D D D

Sea81-3 Sea81-4

Bal3 MexicoA

D2 D2 D2 D2

No conversion No conversion No conversion No conversion

D-like D-like D-like D-like

T. p. subsp. endemicum Iraq Bosnia

No conversion NDa

D3-like ND

T. p. subsp. pertenue Gauthier D3 ← D3 SamoaD D2 No conversion D3-like

a ND = No sequence data

Table 2-4. Recombinant regions identified by RDP2. Subspecies Strain tprC tprD tprI tprK tprG/J T. p. subsp. pallidum

MexicoA ~800-1600a,c,d,e

~1450-1750a,c,d,1-854e

NCf NC 1669-2149

(G/J)b,c

Sea81-3 ~1450-1750a,c,d,

1-1500c1-854e

NC ND NDg

Sea81-4 NC 1-854e

NC NC 1669-2149

(G)b,c,d,e

Bal3 NC 1-854e

NC ND ND

Nichols 137-889c

1459-1728c,dNC NC ND 1556-2176

(G)b,c,d,e

Chicago 137-889c

1459-1728c,dNC NC ND ND

Bal73-1 137-889c

Bal7 137-889c

T. p. subsp. pertenue

Gauthier NC 1-854e

1701-1768c,d,e

1627-1796d

NC NC NC

SamoaD NC 143-621c NC ND T. p. subsp. endemicum

BosniaA 1459-endc,d ND NC NC ND

IraqB 714-1471c,d ND NC NC ND Simian NC ND NC ND ND a Multiple regions were identified by all four programs; the consensus is reported here. b RDP program identified this region as recombinant. c MaxChi program identified this region as recombinant. d Chimera program identified this region as recombinant. e GENECONV program identified this region as recombinant. f NC = no conversion identified. g ND = no DNA sequence data.

Table 2-5. Levels of nucleotide diversity within and between subspecies. Within

subspecies (π)

Between subspecies (Dxy)

Within subspecies (π)

Between subspecies (Dxy)

Locus pallidum pertenue Locus pallidum pertenue Locus tprD 0.10105 NCa tprD 0.10105 NCa tprD tprC 0.00659 0.00111 tprC 0.00659 0.00111 tprC tprI 0.0 0.00765 tprI 0.0 0.00765 tprI tprK 0.00066 0.00443 tprK 0.00066 0.00443 tprK tprG 0.01541 NC tprG 0.01541 NC tprG tprJ 0.09584 NC tprJ 0.09584 NC tprJ a NC = not calculated because only one DNA sequence was available. b ND = no DNA sequence data Table 2-6. Average GC content at combined 1st + 2nd (GC1+2) and 3rd codon (GC3) positions Locusa GC1+2 GC3 tprD 0.524 0.634 tprC 0.537 0.625 tprI 0.535 0.654 tprK 0.481 0.569 tprG+J 0.523 0.599 a All available human sequences were used for each locus; see Table 1 for list of sequences used.

Figure 2-1. Unrooted ML phylogenies of multiple tpr genes.

(a) Unrooted ML phylogeny of twelve Nichols tpr nucleotide sequences based on an alignment of 2708 bp. (b) ML phylogeny of tprC, D, and I nucleotide sequences based on an alignment of 1797 bp using mid-point rooting. Subspecies designations are indicated by vertical lines. Grey boxes indicate clades in which paralogous sequences group together. The specific tpr locus designation is appended to each strain name. Bootstrap values based on (a) 250 and (b) 1000 replications are shown next to branches.

Figure 2-2. ML phylogenies for tprD, C, and I.

Phylogenies are based on nucleotide sequences of (a) tprD; (b) tprC; (c) tprI. Human subspecies are circled and labeled. Bootstrap values based on 1000 replications are shown next to branches.

Figure 2-3. ML phylogeny of tprG and J.

The specific tpr locus designation is appended to each strain name. Subspecies are circled and labeled. For T. paraluiscuniculi, G1 signifies the G/J allele at tprJ and G2 signifies the G/J at tprE. Bootstrap values based on 1000 replications are shown next to branches.

Figure 2-4. ML phylogeny of tprK.

Rooted using CuniculiA strains. Subspecies designations are indicated by vertical lines and labeled. Bootstrap values based on 1000 replications are shownnext to branches. .

CHAPTER 3 LINKAGE DISEQUILIBRIUM AND ASSOCIATION ANALYSIS OF ALPHA SYNUCLEIN

(SNCA) AND ALCOHOL AND DRUG DEPENDENCE IN TWO AMERICAN INDIAN POPULATIONS2

Introduction

Alpha synuclein is involved in dopaminergic neurotransmission and has been implicated in

a number of neurological disorders. An association between α-synuclein and neurodegenerative

disorders is well established. Overexpression of α-synuclein has been implicated in the etiology

of Parkinson’s disease (Polymeropoulos et al. 1997; Kruger et al. 1998) and Alzheimer’s disease

(Ueda et al. 1993), possibly because of neurodegeneration of dopamine neurons due to toxic

build-up of α-synuclein (Mash et al. 2003).

More recently, α-synuclein has been associated with neuropsychiatric disorders, such as

alcoholism (Liang et al. 2003; Bonsch et al. 2005a; Bonsch et al. 2005b; Bonsch et al. 2005c)

and drug addiction (Mash et al. 2003; Kobayashi et al. 2004). Alpha synuclein is located at a

quantitative trait locus for a(Bonsch et al. 2005a; Bonsch et al. 2005c) Alcohol preference in

humans and levels of its mRNA and protein are elevated in alcohol-preferring individuals in rats

and macaque monkeys (Liang et al. 2003; Spence et al. 2005; Walker and Grant 2006). The

complex microsatellite repeat, NACP-REP1, which is located ~10kb upstream of the α-synuclein

gene (SNCA) has been associated with alcohol dependence in humans; specifically, longer alleles

were correlated with elevated levels of α-synuclein and were more frequent in alcohol dependent

patients (Bonsch et al. 2005b). Increased methylation of the SNCA promoter was also detected in

alcoholic patients (Bonsch et al. 2005c), and elevated SNCA mRNA and protein levels have also

been associated with craving in alcoholic patients (Bonsch et al. 2005a; Bonsch et al. 2005c). 2 Clarimon, J., R. R. Gray, L. N. Williams, M. A. Enoch, R. W. Robin, B. Albaugh, A. Singleton, D. Goldman, and C. J. Mulligan. 2007. Linkage disequilibrium and association analysis of alpha-synuclein and alcohol and drug dependence in two American Indian populations. Alcohol Clin Exp Res 31:546-554.

With respect to drug abuse, three out of four SNPs assayed in intron 1 of SNCA were found to be

significantly associated with methamphetamine psychosis/dependence (Kobayashi et al. 2004).

An overexpression of SNCA was also observed in the dopamine neurons of cocaine abusers

although α-synuclein may not directly increase the risk of drug abuse and it was speculated that

SNCA overexpression may be a protective response to changes in dopamine turnover resulting

from cocaine abuse (Mash et al. 2003). In general, overexpression of α-synuclein is thought to

interfere with dopaminergic neurotransmission, which has been proposed as a main mechanism

for withdrawal and craving (Self et al. 1995), two important factors for the development,

maintenance and relapse of addictive disorders.

The gene for α-synuclein has been mapped to chromosome position 4q21.3-22 (Chen et

al. 1995; Shibasaki et al. 1995; Spillantini, Divane, and Goedert 1995). Independent genome-

wide linkage studies have provided modest evidence that a locus in this region contributes to

alcohol dependence in one of the American Indian populations analyzed in the current study

(Southwest population; (Long et al. 1998; Mulligan et al. 2003) as well as in Euro-Americans

(Reich et al. 1998). Follow-up studies on the Euro-American population demonstrated strong

linkage to a phenotype defined by the maximum number of drinks consumed on one occasion

(Saccone et al. 2000). A recent study of Mission Indians reported no linkage in this region to a

diagnosis of alcohol dependence, but detected modest support for linkage to a more narrowly

defined phenotype of drinking severity (Ehlers et al. 2004a). This region has also been associated

with drug abuse through a genome-wide single-nucleotide polymorphism (SNP) genome scan

(Uhl et al. 2001).

In our study, we assayed 15 SNPs in the α-synuclein gene and one upstream

microsatellite repeat (NACP-REP1) in participants belonging to Southwest (n=514) and Plains

(n=420) American Indian populations. Patterns of linkage disequilibrium (LD) across the

assayed SNPs were similar in both tested populations and were consistent with the LD patterns

of the SCNA region in Caucasians, Africans and Asians, as reflected in the HapMap database

(www.hapmap.org). The assayed alleles and constructed haplotypes were tested for association

with alcohol dependence or alcohol use disorders (pooled diagnoses of alcohol dependence and

alcohol abuse) and drug dependence or drug use disorders (pooled diagnoses of drug dependence

and drug abuse), which are disorders that reach lifetime prevalences as high as 64% in the study

populations. Individual alleles and constructed haplotypes were also tested against two symptom

count phenotypes (all 18 questions and the eight questions that are diagnostic for alcohol

dependence).

Sampling Strategy

Blood samples and clinical data were collected from 514 adult members of a SW American

Indian tribe and from 420 adult members of a Plains American Indian tribe. Participants were

initially chosen at random from the tribal registry, and family members of ascertained alcoholics

were subsequently included. Descriptive data on both populations are presented in Table 3-1

(only a subset of these individuals was typed for a sufficient number of SNPs to be included in

the haplotype analysis, see Table 3-2). All participants were >21 years and required to have a

minimum of 25% ancestry to be included on the register. Williams et al. (1992) found a high

correspondence between overall levels of stated ancestry and ancestry estimated from genetic

markers and they found evidence for <5% non-American Indian admixture in the SW population.

Belfer et al. (2006) report low admixture with non-American Indian tribes in the Plains

population. Informed consent was obtained under a human subjects research protocol approved

by the respective Tribal Councils of each population group and the Institutional Review Board

(IRB) of the National Institute on Alcohol Abuse and Alcoholism.

Testing Instruments, Interviews, and Psychiatric Diagnoses

Focus groups comprised of tribal staff and community members reviewed testing

instruments and questionnaires for potential cultural biases and general suitability to the

population. Research diagnoses for lifetime alcohol dependence and abuse and lifetime drug

dependence and abuse were based on: 1) semi-structured psychiatric interviews using the

Schedule for Affective Disorders and Schizophrenia - Lifetime Version (SADS-L) with probes

added to enable diagnoses using both Research Diagnostic Criteria and Diagnostic and Statistical

Manual of Mental Disorders, Third Edition-Revised (DSM-III-R, American Psychiatric

Association, 1987) criteria (Robin et al. 1998), 2) medical, educational, court, and other records,

and 3) corroborative information from family members. The SADS-L was administered to all

subjects by a psychologist experienced with psychiatric assessment in this tribe and other

American Indian populations. DSM-III-R diagnoses of alcohol dependence were made from the

SADS-L by following operationally defined criteria and using the instructions of Spitzer et al.

(1989) (an additional criterion of heavy drinking for one year or more was added for a diagnosis

of alcohol dependence in the Plains population; (Belfer et al. 2006). Diagnoses were made from

the SADS-L interview data independently by two raters: a clinical social worker and a clinical

psychologist. Diagnostic differences were resolved in a consensus conference that included a

senior psychiatrist experienced in diagnosis in American Indian people. Sampling strategy,

interview procedure, and diagnosis protocol are summarized from Belfer et al. (2006), Long et

al. (1998), and Robin et al. (1998).

Genotyping

Genotype data from the HapMap project (www.hapmap.org) were used to generate an

optimal set of six tagging SNPs through the use of TagIT software. However, because the

HapMap SNP frequencies were calculated using Euro-American populations, we added nine

more SNPs in order to provide full coverage of genetic variation within this locus in American

Indian populations (Figure 3-1). Thirteen of the SNPs were assayed using Taqman® Assays-by-

DesignSM SNP Genotyping Services (Applied Biosystems). Thermal cycling and end-point PCR

analysis was performed on an ABI PRISM® 7900HT Sequence Detection System. Primer and

probe sequences are available upon request. Two SNPs, rs920624 and rs3775423, were assayed

as restriction fragment length polymorphisms by digesting amplification products with restriction

enzymes PsiI or MseI and SspI (New England Biolabs, Beverly, MA), respectively, and

electrophoresing digests on 2% agarose gels. Primers for rs920624 were 5’-

ACTACTTCTCTGTTGGATTGC-3’ and 5’-AAGATTCTTCACCTCTGTGTG-3’ and for

rs3775423 were 5’-GTATCCAATGCCCAAAGG-3’ and 5’-TGCCTCAGAAAGAACAGATG-

3’. The NACP-REP1 dinucleotide repeat polymorphism was amplified as previously described

(Farrer et al., 2001). PCR products were run on an ABI PRISM® 3100 (Applied Biosystems)

automated sequencer. GeneScanTM analysis software (version 3.7) and Genotyper® (version 3.7)

were used to assess fragment sizes.

Statistical Analysis

Previous studies have demonstrated that the degree of genetic relationship (kinship) in both

the Plains and SW samples is low and close to the average for the source populations, thus

permitting analyses that assume independence of individual samples (Robin et al. 1998; Belfer et

al. 2006). Hardy-Weinberg equilibrium was assessed by Fisher’s exact test, implemented in

Arlequin ver.2.000 program (Schneider, Roessli, and Excoffier 2000). PHASE ver.2.1 (Stephens

and Donnelly 2003) uses a Bayesian approach to reconstruct haplotypes from unphased

population genotypic data and takes into account recombination events and decay of LD with

distance between markers, thus leading to a more accurate inference of the real haplotype. One

thousand permutations were performed for each comparison. Haploview (Barrett et al. 2005) was

used to visualize LD relationships between SNCA SNPs as well as to ascertain the tagSNPs that

resulted from the LD analyses. LD blocks were constructed following the D’ method by Gabriel

et al. (2002) also implemented in Haploview.

A total of four clinical phenotypes were tested: alcohol dependence, alcohol use disorder

(diagnoses of alcohol dependence and alcohol abuse were pooled), drug dependence, and drug

use disorder (diagnoses of drug dependence and drug abuse were pooled). Differences in allele

and genotype distributions were analyzed using the chi square test and two-tailed P values are

presented. Data were analyzed using SPSSTM ver.11.0 for Windows (SPSS, Chicago, IL).

Haplotype frequency comparisons between cases and controls were performed with PHASE

ver.2.1 (Stephens and Donnelly 2003). In order to correct for multiple comparisons, global P

values were calculated using the COCAPHASE module of the UNPHASED statistical package

(Dudbridge 2003). Permutation correction was performed using 10,000 permutations. Symptom

counts were also used as two additional phenotypes, which were calculated as 1) the total

number of affirmative responses to the SADS-L interview questions (Endicott and Spitzer 1978)

and 2) the total number of affirmative responses to eight questions used in the diagnosis of

alcohol dependence. ANOVA tests were used to analyze the variance of the two symptom count

phenotypes among the four most common haplotypes and among the six SNPs used to define

haplotypes using the R program (R Foundation for Statistical Computing, Vienna, Austria). The

CLUMP program (Sham and Curtis 1995) was used to analyze contingency tables resulting from

NACP-REP1 alleles. Significance was assessed using Monte Carlo approach by performing a

total of 10,000 simulations.

A power calculation, based on haplotypes, indicated that our study had 83-100% power

to detect an association at an odds ratio of 2.0 using pooled haplotype frequencies, 91% power

based on the most frequent haplotype in the SW population and 87% power based on the most

frequent haplotype in the Plains population (p = 0.05).

Results

Based on 14 common SNPs located within the SNCA gene, similar LD patterns were

detected in both American Indian populations that are also consistent with the LD structures of

European Caucasian, Yoruban (African) and Chinese and Japanese Asian populations in the

HapMap database (Figure 3-1; SNP rs920624 and the NACP-REP1 polymorphism were not

included in the comparative LD analysis because there are no associated frequency data in the

HapMap database). Based on a strict criterion of continuous LD >90%, one small and one large

LD block were defined in the SW population and three small LD blocks were identified in the

Plains population. A less stringent criterion based on the presence of a recombination region

between SNPs 4 and 5 suggests two large LD blocks (SNPs 2-4 and 5-14) present in both

populations. Haplotype tests can be more powerful than using only one SNP to define a

haplotype block (Schaid 2004), so SNPs 3, 4, 8, 9, 12, and 13 (as depicted in Figure 3-1) were

chosen to define haplotypes in our analysis. The high level of LD suggests that we have assayed

the full spectrum of SNCA variation present in our study populations, i.e. effectively all SNPs

currently identified at SNCA (433 SNPs, http://www.ncbi.nlm.nih.gov/SNP/) were assayed.

Phenotypes were investigated with respect to individual SNPs as well as to the haplotype blocks

defined above.

Frequencies of all analyzed SNPs were in Hardy-Weinberg equilibrium for cases and

controls. We did not find any allele or genotype frequency differences between cases and

controls for alcohol dependence, alcohol use disorders (pooled diagnoses of alcohol dependence

and alcohol abuse), or drug use disorders (pooled diagnoses of drug dependence and drug abuse)

when total populations were tested (p values for allele frequency comparisons are presented in

Figure 3-2). Drug dependence was significantly associated with SNPs rs2583978, rs356186,

rs356198 and rs3775423 in the SW population (Figure 3-2). We tested whether this association

was gender-dependent. When we focused on males, the association with drug dependence

persisted for SNPs rs356186 and rs3775423 and SNPs rs3775439 and rs356165 also were

significantly associated (Figure 3-2). However, when we adjusted for multiple testing in order to

control for the family-wise type I error (FWER) by means of a permutation test with 10,000

replicates, the global P-value was 0.113 for the entire population and 0.112 when only males

where counted. None of the SNP associations were significant in females (Figure 3-2).

Stratification of the Plains population by gender revealed a significant association between

rs356163 and alcohol dependence and alcohol use disorders and between rs2572324 and alcohol

use disorders in males (Figure 3-2). However, the global COCAPHASE P value in Plains males

was not significant (p > 0.1 for both alcohol dependence and alcohol use disorders).

Association of the four addiction phenotypes was also tested against haplotypes

constructed across the entire gene using SNPs 3, 4, 8, 9, 12, and 13 as described above. Several

SNPs within the SNCA gene may be contributing to an addiction phenotype in the form of a

“super-allele” (Schaid 2004); therefore we combined the six markers into a haplotype to increase

the power of detecting an association. Four major haplotypes were detected in the SW and Plains

populations (>5 % frequency in all phenotype categories) (Table 3-2). No difference in haplotype

frequencies between cases and controls were detected in either population. A power calculation

using pooled haplotype frequency data from both populations indicated 83-100% power to detect

an association at an odds ratio of 2.0 (p = 0.05).

Categorical diagnoses of substance abuse may not adequately describe a disease that is

both heterogeneous and occurs on a continuum, and symptom counts may be used as a

quantitative variable to provide more information (Helzer et al. 2006). Thus, the total number of

affirmative responses on the SADS-L interview (n=18) and the eight diagnostic questions for

alcohol dependence were used as additional phenotypes. No significant association between the

symptom count and the four most common haplotypes was detected using an ANOVA in either

the SW or the Plains populations. When each of the six SNPs used to define the haplotypes was

tested individually against the symptom count only SNP rs356198 was marginally significant

(p=0.044) in the SW population; however, that significance disappeared after correction for

multiple testing. No comparisons with individual SNPs were significant in the Plains population.

Since recent reports have described an association between the NACP-REP1 repeat

polymorphism and the risk of alcohol dependence (Bonsch et al. 2005b), we tested this variant in

our populations. Distribution of the NACP-REP1 alleles in both populations is shown in Figure

3-3. In contrast to other studied populations, both American Indian populations exhibited

reduced variation at the REP1 locus with a very high frequency of the 267 bp repeat allele (80-

85%). The 267 bp allele was associated with virtually all haplotypes in both populations

suggesting that no linkage disequilibrium exists between the SNCA SNPs and the REP1 locus,

thus the REP1 locus was analyzed independently of the SNCA SNPs and haplotypes. No

association was found between the REP1 locus and any of the four addiction phenotypes in the

two populations (Figure 3-3). The same lack of association was found when each gender was

analyzed independently (data not shown).

Discussion

Several recent studies have suggested that α-synuclein may play a role in the development

and maintenance of certain addictive disorders (Liang et al. 2003; Mash et al. 2003; Kobayashi et

al. 2004; Bonsch et al. 2005b). We assayed 15 SNPs in the α-synuclein gene, SNCA, and one

upstream microsatellite repeat (NACP-REP1) in two American Indian populations with a high

lifetime prevalence of alcohol and drug dependence and abuse (Table 3-1). The assayed alleles

and constructed haplotypes were tested for association with one of four clinical phenotypes,

including alcohol dependence and alcohol use disorders (pooled diagnoses of alcohol

dependence plus abuse) and drug dependence and drug use disorders (pooled diagnoses of drug

dependence plus abuse), as well as two symptom count phenotypes (total number of questions

and eight questions diagnostic for alcohol dependence).

Single allele tests revealed significant associations between four SNPs and drug

dependence in the SW population. Two of those SNPs plus another two SNPs were found to be

associated with drug dependence in SW males only. In the Plains population, a significant

association was detected only in males with two SNPs and alcohol use disorders and one SNP

and alcohol dependence. An association with alcoholism in males is consistent with an

overrepresentation of alcohol dependence in Native American males (66-70%) compared to

females (30-53%) (Kinzie et al. 1992; Kunitz et al. 1999; Ehlers et al. 2004b; Gilder, Wall, and

Ehlers 2004). However, none of the global p-values, calculated to adjust for multiple testing,

reached the level of significance. Haplotype analyses did not reveal any association between

SNCA and substance abuse or dependence. Furthermore, when corrected for multiple

comparisons, no significant associations were detected when symptom counts were tested against

haplotypes or individual SNPs.

The polymorphic REP1 complex dinucleotide repeat is located ~10kb upstream of the

SNCA transcriptional start site and is polymorphic for alleles of length 265-273 (Chiba-Falek,

Touchman, and Nussbaum 2003). Longer REP1 alleles have been associated with increased

expression of the SNCA gene and loss of the REP1 repeat in vitro resulted in a 4-fold reduction

in expression of the SNCA gene (Touchman et al. 2001; Chiba-Falek, Touchman, and Nussbaum

2003). Several studies have correlated elevated levels of SNCA mRNA and protein with various

phenotypes related to alcohol dependence (Liang et al. 2003; Bonsch et al. 2004; Walker and

Grant 2006). In contrast, our results do not support an association between NACP-REP1 and

alcohol dependence/use disorders or drug dependence/use disorders in the American Indians

populations analyzed here. Lack of association could be explained by the low allelic variability

present at this locus in both populations and, in particular, lack of the longer alleles implicated in

the previous study (alleles 271 and 273, Bönsch et al. 2005a).

To our knowledge, this is the most exhaustive analysis performed to date of genetic

variation at the SNCA locus and possible association with risk of alcohol and drug addiction. The

study is strengthened by analysis of two different populations. Although several SNPs initially

returned significant p values, none of the results remained significant after correction for

multiple testing. It is possible that the low genetic variation in these American Indians may mask

a significant association, or actually contribute to SNCA being a non-factor in addiction

vulnerability in these particular populations. It is likely that a search for genes involved in

addiction disorders will be complicated by population-specific genetic effects, as well as varying

effects of social and environmental factors. Additionally, cannabis is the most frequently abused

drug in our populations in contrast to previous studies that found a correlation between SNCA

and cocaine or methamphetamine addiction. Alcohol craving was not examined in this study, and

thus its association with the SNCA gene in these populations cannot be ruled out. Nonetheless,

our results in two American Indian populations do not support a role for a genetic variant in the

SNCA gene that contributes to alcohol or drug addiction. These results do not preclude a role for

this gene, particularly in other populations exhibiting more diversity at NACP-REP1. Altered

methylation patterns in the SNCA promoter have already been associated with alcoholism and

may contribute to differential expression levels of this gene (Bonsch et al. 2005c). Thus, future

research may focus on additional variants in the promoter region of SNCA that could cause the

changes in mRNA and protein levels observed in previous studies.

Table 3-1. Demographic and phenotypic characterstics of southwest (SW) and plains populations.

SW population n=514a Plain population n=420a

Age ± SD (range) 36.5 ± 13.6 (21–90) 42 ± 14.1 (18–87) Gender (% males) 42.8 43.6

Alcohol dependence 316 239 Alcohol use disorder 348 248

Drug dependence 44 (47% cannabis, 49% amphetamines, 9% cocaine)

49 (69% cannabis, 36% amphetamines, 15% cocaine)

Drug use disorder 185 (65% cannabis, 28% amphetamines, 34% cocaine)

96 (75% cannabis, 40% amphetamines, 17% cocaine)

Controlsb 135 159 aPhenotype counts do not equal population sample sizes because individuals had multiple diagnoses. For example, under drug dependence, 42 SW and 46 Plains individuals also had alcohol use disorders and, under drug use disorder, 170 SW and 88 Plains individuals also had alcohol use disorders. Furthermore, percentages of abused drugs do not sum to 100 because several individuals had multiple drug dependence. bControls are defined as those individuals with no diagnosis of alcohol dependence, alcohol use disorder, drug dependence, or drug use disorder.

Table 3-2. Frequency of Major Haplotypes in Cases and Controls in Southwest and Plains Populations. Population Southwest Plains Phenotype AD AUD DD DUD Controla AD AUD DD DUD Controla

Nb 312 344 43 184 130 224 233 47 91 147 221212c 0.412 0.411 0.337 0.416 0.4 0.281 0.281 0.328 0.258 0.265 222121c 0.24 0.23 0.302 0.217 0.208 0.337 0.339 0.34 0.324 0.313 111212c 0.164 0.166 0.186 0.152 0.204 0.152 0.15 0.192 0.154 0.119 111221c 0.08 0.076 0.081 0.092 0.077 0.132 0.129 0.075 0.126 0.16 % total haplotypes

0.896 0.883 0.896 0.877 0.889 0.902 0.899 0.935 0.862 0.857

aControls are defined as those individuals with no diagnosis of alcohol dependence, alcohol use disorder, drug dependence, or drug use disorder. bNumber of individuals in each category differs slightly from Table 1 because only individuals typed for a majority of the 6 single-nucleotide polymorphism (SNPs) were included. cHaplotypes were constructed from the following SNPs (in order): rs2737020, rs1812923, rs356164, rs356198, rs3775423, rs10033209. AD = alcohol dependence, AUD = alcohol use disorder, DD = drug dependence, DUD = drug use disorder.

109,8 Kb

5’ 3’

109,8 Kb

5’ 3’

Figure 3-1. Relative positions of single nucleotide polymorphisms assessed in α-synuclein

(SNCA) gene.

Single-nucleotide polymorphisms (SNP) rs920624 is not included because there are no associated frequency data in the HapMap database. The gene structure of SNCA is shown (top), with vertical bars indicating exons. Linkage disequilibrium (LD) structure is presented for the CEU population allele frequencies from the HapMap project (top), the Southwest population (middle), and the Plains population (bottom). Numbers within the diamonds are D' values for the respective SNP pairs. Solid red diamonds represent absolute LD (D'=1), blue diamonds represent strong LD with low level of significance. Numbers in gray within white diamonds represent a high probability or evidence of historical recombination. Haplotype blocks, as determined with the use of Haploview software, are depicted.

Southwest

Alc-dep Alc-abu Drug-dep Drug-abu

Southwest males

Southwest females

luesPlains

Plains males

Plains females

Alc-dep Alc-use-dis Drug-dep Drug-use-dis Alc-dep Alc-use-dis Drug-dep Drug-use-dis

Southwest

Southwest males

Southwest females

luesPlains

Plains males

Plains females

Alc-dep Alc-use-dis Drug-dep Drug-use-dis Alc-dep Alc-use-dis Drug-dep Drug-use-dis

Figure 3-2. Single-marker analyses representing p values for each marker on a logarithmic scale.

A dotted line indicating a p value of 0.05 is represented in each graph, so values above correspond to significant associations. Graphs on the left refer to the Southwest population whereas graphs on the right refer to the Plains population. The first row represents the entire data set for each population, while the second row refers to males only and the third row refers to females only.

Figure 3-3. Allelic distribution of the NACP-REP1 microsatellite repeats.

Shown for the Southwest and Plains populations. Allele size is depicted on the X-axis while allele frequency is depicted on the Y-axis.

CHAPTER 4 LACK OF ASSOCIATION BETWEEN ADH/ALDH MARKERS AND SUBSTANCE USE

DISORDER IN NATIVE AMERICAN POPULATION

Introduction

Native Americans had the highest rate of alcohol related deaths of all ethnic groups in the

United States in 2001 (age-adjusted death rate=42.1/100,000) which was more than five times

higher than the alcohol-related death rate for the general US population (6.9/100,000) (Health,

United States, 2006, http://info.ihs.gov/Files/DisparitiesFacts-Jan2006.pdf ). Native Americans

were also 2.5 times as likely to die of chronic liver disease or cirrhosis than Caucasians

(22.7/100,000 vs. 9.2/100,000) (Facts on Indian Health Disparities,

http://www.cdc.gov/nchs/data/hus/hus06.pdf#031). In addition, this group also had the highest

frequency of current drinkers who reported drinking more than four drinks in one day in the past

year compared to other ethnic groups (40.9% vs. 32.7% for Caucasians, 24.3% for African-

Americans, and 20.8% for Asians) (Facts on Indian Health Disparities,

http://www.cdc.gov/nchs/data/hus/hus06.pdf#031). It is unclear to what extent this is a result of

genetic determinants or cultural influences such as poverty and lack of health care on

reservations. Many studies have investigated the role of genes in substance abuse, and multiple

studies have provided evidence for a locus contributing to alcohol dependence on chromosome 4

(Long et al. 1998; Reich et al. 1998; Williams et al. 1999; Zinn-Justin and Abel 1999; Mulligan

et al. 2003; Ehlers et al. 2004a). Several candidate genes proximally located on that chromosome

include alpha synuclein (Bonsch et al. 2004; Bonsch et al. 2005a; Bonsch et al. 2005b; Bonsch et

al. 2005c; Clarimon et al. 2007) the GABA receptors (Long et al. 1998) and the alcohol

dehydrogenase gene family (ADH) (Chen et al. 1996; Osier et al. 1999; Mulligan et al. 2003).

The seven ADH genes encode enzymes that convert alcohol to acetaldehyde, which is then

processed into acetate by enzymes encoded by the ALDH genes. The ADH genes are located in a

cluster (ADH7-ADH1C-ADH1B-ADH1CA-ADH6-ADH4-ADH5) on chromosome 4 (Osier et al.

2002b). Certain variants within the ADH and ALDH genes have been found to protect against

alcoholism; in particular, the ADH1B*47His allele (and possibly the ADH1B*369Cys allele)

results in increased blood levels of acetaldehyde leading to an unpleasant flushing response that

is proposed to have a protective effect against alcoholism (Thomasson et al. 1991; Goedde et al.

1992; Thomasson et al. 1993; Nakamura et al. 1996). However, the ADH1B*47His allele is only

present at polymorphic frequencies in Asian and Jewish populations (Chao et al. 1994;

Thomasson et al. 1994; Chen et al. 1996; Nakamura et al. 1996; Tanaka et al. 1996; Shen et al.

1997; Neumark et al. 1998; Osier et al. 1999; Ehlers et al. 2004a), although it has been found at

low levels in European, African, and Middle Eastern populations as well (Borras et al. 2000;

Ehlers et al. 2001; Osier et al. 2002b). A protective effect has also been found for the

ADH1C*349Ile allele in Asians, Native Americans and Europeans (Chao et al. 1994; Chen et al.

1996; Shen et al. 1997; Borras et al. 2000; Konishi et al. 2003; Mulligan et al. 2003), although

this may be a result of linkage disequilibrium with ADH1B*47His in Asian populations that

carry this allele (Chen et al. 1999; Osier et al. 1999). An additional variant in the mitochondrial

ALDH2 gene, ALDH2-2, blocks the conversion of actetaldehyde to acetate resulting in an

accumulation of acetaldehyde and a more pronounced flushing response than the ADH1B*47His

allele (Harada, Agarwal, and Goedde 1982; Thomasson et al. 1991; Goedde et al. 1992;

Thomasson et al. 1993; Novoradovsky et al. 1995; Peterson et al. 1999). The allele has been

found to be protective in Asian populations (Iwahashi et al. 1995; Chen et al. 1996; Chen et al.

1999; Hara, Terasaki, and Okubo 2000; McCarthy et al. 2000). In addition, the combination of

the ADH1B*47His and the ALDH2-2*487Glu alleles have been found to drastically increase the

risk of alcoholism in Koreans (Kim et al. 2008).

Although ADH1B*47His and ALDH2-2 are absent in most Native American populations,

several lines of evidence suggest that genetic variants in or near the ADH genes contribute to

alcohol dependence in Native Americans. First, an autosomal genome scan in a Native American

population identified this region of chromosome 4 as a possible alcoholism risk locus (Long et

al. 1998). Subsequent studies have identified additional polymorphisms within the ADH genes

that are present in Native Americans (Wall et al. 1997; Ehlers et al. 1998; Osier et al. 2002a).

Two ADH1C variants (including ADH1C*349Ile) and a neighboring microsatellite marker were

associated with alcohol dependence in a subset of individuals from a Southwest Native American

population (Mulligan et al. 2003). In order to determine if these alleles contribute to risk of

substance abuse in a Native American Plains population, we assayed nine single nucleotide

polymorphisms (SNPs) across the ADH1A, ADH1B, and ADH1C genes and three markers in the

ALDH gene in 359 members of a Plains Native American population. Two different diagnoses

for alcohol use were investigated (alcohol dependence and abuse) as well as two continuous

measures of alcohol use based on survey responses. The survey consisted of 18 questions

concerning the impact of alcohol use on daily life (see Appendix A). The number of positive

responses to both the full 18 questions and a subset of eight diagnostic questions was used as the

outcome variable in a regression analysis. We also included two measures of drug use (drug

dependence and abuse) as a recent study found an association between drug dependence and

ADH variants independent of alcohol behavior (Luo et al. 2007).

Samples

The sampling methodology was previously described in Mulligan et al. (2003) and is

briefly summarized here. Blood samples and clinical data were initially collected from 420 adult

members of a Plains Native American population. Ultimately, a subset of 359 individuals that

had full diagnostic information was selected for genotyping. Informed consent was obtained

under a human subjects research protocol approved by the Plains Tribal Council and the

Institutional Review Board (IRB) of the National Institute on Alcohol Abuse and Alcoholism.

Testing Instruments, Interviews, and Psychiatric Diagnoses

Focus groups comprised of tribal staff and community members reviewed testing

instruments and questionnaires for potential cultural biases and general suitability to the

population. Research diagnoses for lifetime alcohol dependence and abuse and lifetime drug

dependence and abuse were based on: (1) semi-structured psychiatric interviews using the

Schedule for Affective Disorders and Schizophrenia—Lifetime Version (SADS-L) with

additional information to enable diagnoses using both the Research Diagnostic Criteria and

Diagnostic and Statistical Manual of Mental Disorders, Third Edition, Revised (DSM-III-R,

American Psychiatric Association, 1987), (2) medical, educational, court, and other records, and

(3) corroborative information from family members. The SADS-L was administered to all

subjects by a psychologist experienced with psychiatric assessment in this tribe and other

American Indian populations. Sampling strategy, interview procedure, and diagnosis protocol

are summarized from (Long et al. 1998; Robin et al. 1998; Mulligan et al. 2003; Belfer et al.

2006).

Genotyping

Three hundred and fifty-nine individuals were typed for five ADH1A, three ADH1B, one

ADH1C, and three ALDH polymorphisms. Loci, primers, thermocycling conditions, and

restriction enzymes are listed in Table 1. PCR amplicons were analyzed by electrophoresis on

2% agarose or 4% Metaphor (FMC BioProducts, Rockland, Me.) gels. Marker ADH1Ain8 BccI

was analyzed on a Beckman Coulter CEQ 8000 using the Beckman SNP genotyping kit

according to the manufacturer’s instructions (Beckman Inc, Fullerton CA).

Statistical Analysis

Haploview (Barrett et al. 2005) was used to visualize linkage disequilibrium (LD)

relationships between the assayed ADH markers, and linkage disequilibrium blocks were

constructed following the D' method by (Gabriel et al. 2002). PHASE ver. 2.1 (Stephens and

Donnelly 2003) was used to reconstruct haplotypes from unphased population genotypic data.

The program R (R Foundation for Statistical Computing, Vienna, Austria) was used to

implement tests for Hardy-Weinberg equilibrium.

A total of four clinical phenotypes were tested: alcohol dependence, alcohol use disorder

(diagnoses of alcohol dependence and alcohol abuse were pooled), drug dependence, and drug

use disorder (diagnoses of drug dependence and drug abuse were pooled). The phenotypic

information is summarized in Table 4-2 (categories were not mutually exclusive). Six continuous

variable measures of substance use were also tested for association with the genetic data. Two

measures of symptom count data were tested, which were calculated as (1) the total number of

affirmative responses to the SADS-L interview questions (Endicott and Spitzer 1978) and (2) the

total number of affirmative responses to eight questions used in the diagnosis of alcohol

dependence (see Appendix 1 for questions). The smaller subset of questions captured the most

severe aspects of alcohol dependence and was tested separately to determine association with

genetic markers among the most afflicted individuals. The other four phenotypes included

maximum number of drinks ever consumed in one day, maximum drinks ever consumed in one

month, age at which regular drinking began, and age at which heavy drinking began.

Genotype and allele frequencies for each marker were compared for each of the four

diagnostic groups with the frequencies in the control group. The chi-square test was used for all

comparisons as implemented in the program R. For each analysis, males and females were tested

separately, then pooled together. Genotype frequencies for each marker were tested for

association with the six continuous phenotypes. These data were analyzed in SAS v.9.1 (SAS

Institute Inc, Cary NC) using a regression model that incorporated sex and age as well as

genotype information. A subset of individuals (n=325) were used in the regression analysis due

to missing phenotype information for some individuals.

In addition, frequencies for the haplotypes identified through the statistical analysis

described above were compared for all cases (individuals with any of the four diagnoses) and

controls, defined as individuals without any of the four diagnoses. Haplotype association tests

incorporate information from multiple sites and possibly are more powerful than single-marker

tests (Shaid et al 2004).

Results

Three hundred and fifty seven individuals were genotyped, of which 212 were females and

142 were males. Of the nine ADH markers, four (ADH1BArg47His, ADH1B RsaI,

ADH1BArg369Cys, and ADH1Ain8 BccI ) were not polymorphic in this population and were

thus not informative for our analyses. All markers were in HW equilibrium (data not shown).

Among the five polymorphic ADH markers, 98% linkage disequilibrium (LD) was determined

by Haploview analysis (Figure 4-1). Because of the very high level of LD, only three major

haplotypes were present in the population and represented 93% of the diversity. The frequencies

of these haplotypes were not significantly different between cases and controls (Table 4-3).

The genotype, allelle and haplotype frequencies of each of the seven polymorphic markers

were tested for association with the four discrete phenotypes (alcohol dependence, alcohol use

disorder, drug dependence, and drug use disorder) and the control individuals (Table 4-4). A

marginally significant association (p=0.0421) was found between drug dependence and genotype

at marker ADH1C EcoRI (Table 4-4). However, the allele comparison at this locus was not

significant, and after Bonferroni correction for multiple testing, the p-value for the genotype

association was no longer significant (for each phenotype, adjusted alpha = 0.05/ number of

comparisons = 0.0035). When males and females were tested separately for association with the

four discrete diagnoses and the genotype data, males had lower p-values in general but none

were significant after correction for multiple testing (data not shown).

A regression analysis was also performed using six continuous phenotypic categories

(total number of affirmative answers on the symptom count questionnaire, number of affirmative

responses for a reduced set of eight questions on the same questionnaire, maximum number of

drinks ever consumed in one day, maximum drinks ever consumed in one month, age at which

regular drinking began, and age at which heavy drinking began) and the genotype for each of the

seven markers along with sex and age as additional predictor variables (Table 4-4). For both

symptom count variables, only sex was consistently significant after correction for multiple

testing (alpha=0.002). For both maximum number of drinks variables, again only sex was

consistently significant. For the categories related to age/years of drinking, age was significantly

associated, and sex was only occasionally significantly associated with these variables. The

genotype data were not significantly associated with any of the continuous phenotype variables.

Discussion

The ADH and ALDH genes have been extensively studied in many ethnically diverse

populations, and the strongest associations with substance abuse have been found in Asian

populations that carry the ADH1B*47His and ALDH2-2 alleles known to protect against

alcoholism through toxic accumulation of acetaldehyde (Chao et al. 1994; Thomasson et al.

1994; Chen et al. 1996; Nakamura et al. 1996; Tanaka et al. 1996; Shen et al. 1997; Osier et al.

1999). A previous study found significant association between alcohol dependence and two

ADH1C markers (ADH1CHaeIII and ADHCIle349Val) , between binge drinking and three

markers (ADH1C EcoRI, ADH1C HaeIII, and ADHC Ile349Val) and between flushing and one

marker (ALDH2-In6A) in a subset of individuals in a Southwest Native American tribe (Mulligan

et al. 2003). Thus, we sought to replicate the study in a different Native American population in

order to strengthen support for ADH as a risk locus. We used a continuous measure of

alcoholism and information about behavior in a regression model in addition to traditional

dichotomous diagnoses, since considering substance abuse as a continuum might enable us to

detect associations that would be masked by broad diagnoses which include individuals who

abuse substances for social or cultural reasons.

Three hundred and fifty-nine individuals were assayed for nine markers from the ADH1A,

ADH1B, and ADH1C genes and three from the ALDH gene. As previously reported (Mulligan et

al. 2003), neither of the protective alleles (ADH1B*47His and ALDH2-2) were detected in this

population. In contrast to the previous study (Mulligan et al. 2003), neither of the significantly

associated markers in the Southwest population were found to be significantly associated with

any of the tested phenotypes in this Plains population. Age and/or sex were significant in most of

the regression analyses, suggesting that the demographic information is strongly correlated with

substance abuse.

It is somewhat surprising that this study did not uncover any significant association

between the genetic data and substance use disorder phenotypes, despite the multiple ways in

which the genetic data were analyzed and several innovative measures of abuse. However, the

previous study investigating the association between the ADH/ALDH genes and alcohol use in a

Southwest population found only nominally significance and only in a subsets of the study

sample (Mulligan et al. 2003) In addition, a study use a whole-genome scan only found

significance for the ADH locus on chromosome 4 using two-point linkage analysis, but not

multipoint linkage analysis which takes into account all marker information from one

chromosome (Long et al. 1998). While these results were initially suggestive of ADH/ALDH

having genetic determinants linked to alcohol abuse and were the impetus for conducting a

similar study in a different Native American population, the results from this study suggest that

genetic variants in the ADH and ALDH genes may not be major determinants of substance abuse

disorders in Native Americans. There are several possible reasons for this result. First, during the

migration from Asia to the New World, a population bottleneck severely reduced the genetic

variability present in the ancestral population (Kolman et al. 1995; Kolman and Bermingham

1997; Ramachandran et al. 2005), which also resulted in higher LD at the ADH genes in Native

Americans than in Chinese or Africans (Mulligan et al. 2003). Variants conferring protection/risk

at this locus may therefore have been lost in all or some Native American populations. The fact

that modest association was found in the Southwest population is consistent with a recent study

that found higher diversity in western populations compared with eastern populations in South

American Native Americans, suggesting that more ancestral variants might have been retained in

western populations (Wang et al. 2007) This pattern could result from an initial coastal migration

from Asia, followed by expansion into the interior regions, as has been recently suggested

(Anderson and Gillam 2000; Dixon 2001; Fix 2002; Surovell 2003).

Another explanation is that different cultural factors may be influencing substance abuse in

the Plains population and may play a larger role than genetic factors. This is supported by the

demographic component consistently being the only significantly associated factor. The genetic

component of alcoholism in this case may be swamped by the high number of individuals who

exhibit substance abuse phenotypes, because many individuals in the affected category may

abuse substances for cultural or social reasons but do not have a genetic pre-disposition, despite

the fact that we attempted to use continuous measures of abuse to capture degrees of severity.

Native Americans experience much poorer health in general than the rest of the population in the

U.S. Their infant death rate is almost double that of Caucasians, they have a 40% higher

prevalence of AIDS, and are more than twice as likely to be diagnosed with diabetes. These

statistics may be associated with poorer access to health care as well: 30% of Native Americans

had no health coverage in 2005, and 25% of this group lives at the poverty level (Office of

Minority Health, 2006). Therefore, it is likely that environmental factors contribute heavily to the

high prevalence of alcoholism in this Native American population, and suggest that resources

could productively be devoted to addressing the poverty and poor health care in this ethnic

group.

Table 4-1. Loci, primers, cycling conditions and restriction enzymes for 12 loci studied. Marker Primer Name Primer Sequence Thermocycling Conditions Restriction

Enzymes ADH1CEcoRI A3EX2DW

A3EcoUP2 5’-TTGCACCTCCTAAGGCTC-3’ 5’-TCTAATGCAAATTGATTGTGAAC-3’

94 °C (15 s), 51 °C (15 s), 72 °C (75 s); 40 cycles

ADH1CHaeIII A3EX5FOR2 A3EX5REV1

5’-TGAGTTTGCACATTAGTTATGG-3’ 5’-TGCTCTCAGTTCTTTCTGGG-3’

94 °C (40 s), 56 °C (30 s), 72 °C (60 s); 35 cycles

HaeIII

ADH1CArg271Gln A3EX6FXNFOR1 A3EX6FXNREV3

5’-TTGTTTATCTGTGATTTTTTTTGT-3’ 5’-CGTTACTGTAGAATACAAAGC-3’

94 °C (15 s), 54 °C (15 s), 72 °C (60 s); 35 cycles

ADHCIle349Val A3FXNFOR1 A3FXNREV3

5’-TTGTTTATCTGTGATTTTTTTTGT-3’ 5’-CGTTACTGTAGAATACAAAGC-3’

94 °C (15 s), 51 °C (15 s), 72 °C (75 s); 40 cycles

ADH1CPro351Thr ADH1CSNPFOR1 A3FXNREV3

5’-GTTTTCACTGGATGCACTAATAAC-3’ 5’-CGTTACTGTAGAATACAAAGC-3’

94 °C (30 s), 51 °C (30 s), 72 °C (75 s); 40 cycles

ADH1BArg47His A2FXNFOR A2FXNREV

5’-ATTCTAAATTGTTTAATTCAAGAAG-3’ 5’-ACTAACACAGAATTACTGGAC-3’

95 °C (30 s), 56 °C (30 s), 72 °C (60 s); 35 cycles

ADH1BRsaI A2IN3DW3 A2IN3UP2

5’-ATATTTATTTTACCCTAAACTTATG-3’ 5’-GAGCTAAAACATACTTTGGATAG-3’

94 °C (30 s), 60 °C (30 s), 72 °C (30 s); 35 cycles

ADH1BArg369Cys HE39 HE40

5’-TGGACTTCACAA CAAGCATGT-3’ 5’-TTGATAACATCTCTGAAGAGCTGA-3’

95°C (15 s), 58°C (15 s), 72°C (60 s); 35 cycles

ADH1Ain8BccI A1BccIDW A1IN8UPI A1BccITUP

5’-ATTGTCTAGCAGAAAATGAAAAG-3’ 5’-AGTTTCTTTCCCTCCTCAAGAATG-3’ 5’-TTTTTTTTTTTTCTAATTTTTCTCATCCTTCCA-3’

94 °C (15 s), 54 °C (15 s), 72 °C (60 s); 35 cycles

ALDH2.5' 5 .for 5 .rev

5’-GCAGTGCCGTCTGCCCCATCCATGT-3’ 5’-GGCCCGAGCCAGGGCGACCCTGAGCT-3’

94 °C (30 s), 60–62 °C (30 s), 72 °C (30 s); 40 cycles

ALDH2.In6A In6A.For In6A.Rev

5’-AAATATTGCTCTAGGCCAGGC-3’ 5’-TGGGAATTCTAAATGGGACGG-3’

94 °C for 10 cycles/89 °C for 30 cycles(30 s), 55 °C (30 s), 72 °C (30 s); 40 cycles

HaeIII

ALDH2.Def L12 R12

5’-TTTGGTGGCTAGAAGATGTC-3’ 5’-CACACTCACAGTTTTCTCTT-3’

94 °C (30 s), 57 °C (30 s), 72 °C (30 s); 40 cycles

Table 4-2. Phenotypic characteristics of the dataset. Phenotype Females Males Total Alcohol dependencea 101 105 206 Alcohol Abusea 106 111 217 Drug Dependencea 15 32 47 Drug Abusea 42 51 92 All Casesb 113 112 225 Controlb 99 30 129 Average Total Symptom Count (18)

5.5 9.6 7.1

Average Reduced Symptom Count (8)

3.3 5.2 4.1

aIndividuals can be part of multiple phenotypes, and therefore the totals do not sum to the toal number of individuals (359). bAll cases and controls sum to 359. Table 4-3. Haplotype frequencies and p-value for comparisons of cases vs. controls. Haplotype Frequency All

individuals Frequency Cases Frequency

Controls p-value

11212 0.51 0.49 0.55 0.173 12122 0.34 0.35 0.33 0.490 22121 0.08 0.08 0.05 0.138 For haplotype designations, the order of the markers is as follows: ADH1CEcoRI, ADH1CHaeIII, ADH1CArg271Gln, ADHCIle47Val, ADH1CPro351Thr. “1” refers to the presence of a restriction site, and “2” refers to the absence of a restriction site.

Table 4-4. Chi-squared and regression p-values for genotype and allele associations for each marker.

Marker P-value ADH1C EcoRI

ADH1C HaeIII

ADH1C Arg271Gln

ADHC Ile349Val

ADH1C Pro351Thr

ALDH 2.5'

ALDH 2.In6A

Frequency Major Allele 0.9 0.54 0.53 0.54 0.92 0.65 0.94 Dichotomous Diagnosesa

genotype 0.2725 0.1366 0.0914 0.1682 0.2062 0.2662 0.2775 Alcohol Dependence allele 0.19 0.168 0.204 0.195 0.091 0.146 0.184

genotype 0.2642 0.1005 0.0746 0.1377 0.2062 0.3864 0.2624 Alcohol Abuse allele 0.171 0.125 0.176 0.166 0.09 0.233 0.177

genotype 0.0421 0.1136 0.05997 0.0954 0.2436 0.2352 0.7812 Drug Dependence allele 0.588 0.565 0.44 0.523 0.365 0.107 0.567

genotype 0.1525 0.2407 0.2702 0.2966 0.1319 0.341 0.3318 Drug Abuse allele 0.121 0.188 0.417 0.327 0.257 0.174 0.142 Continuous Diagnosesb

genotype 0.0989 0.1485 0.136 0.1873 0.0265 0.2377 0.3569 age 0.9461 0.8225 0.7142 0.7464 0.9693 0.9464 0.9952 Symptom

Count (18) sex <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 genotype 0.0868 0.3751 0.267 0.3551 0.0304 0.1151 0.5301 age 0.396 0.3553 0.2432 0.257 0.4223 0.352 0.4559 Symptom

Count (8) sex <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 genotype 0.5815 0.7387 0.9867 0.7225 0.6649 0.835 0.5663 age 0.354 0.3983 0.4531 0.4677 0.2355 0.3546 0.1893 Max

Drinks/month sex 0.001 0.0009 0.0028 0.0017 0.0006 0.0013 0.0006 genotype 0.5815 0.6398 0.6768 0.6014 0.0839 0.7642 0.3218 age 0.354 0.1074 0.1322 0.0736 0.1048 0.117 0.0591 Max

Drinks/Day sex 0.001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 genotype <0.0001 0.5548 0.515 0.6154 0.0709 0.62 0.9289 age <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 <0.0001 Years of Heavy

Drinking sex 0.0215 0.0448 0.0208 0.0235 0.0434 0.0414 0.049 genotype 0.0403 0.2558 0.1969 0.2071 0.0482 0.9888 0.7319 age 0.0004 0.0004 0.0007 0.0006 0.0003 0.001 0.0008 Start Heavy

Drinking sex 0.0542 0.0819 0.0625 0.0823 0.0631 0.082 0.0709 genotype 0.1465 0.7951 0.7094 0.7449 0.2044 0.9803 0.1766 age 0.0056 0.006 0.0288 0.0308 0.0055 0.01 0.0059 Start regular

drinking sex 0.0006 0.0017 0.0019 0.0018 0.0009 0.0019 0.0009 aAll analyses were performed using the chi-squared test. bThese analyses were performed using a regression model with each marker separately, and age and sex included in the model. Values in bold indicate significance after correction for multiple testing.

Figure 4-1. Linkage disequilbrium of markers assessed in the ADH gene family.

Linkage disequilibrium (LD) structure is presented for the Plains population. Numbers above the red blocks indicate polymorphic assayed markers in ADH1A, B, and C: 1=ADH1CEcoRI, 2=ADH1CHaeIII, 3=ADH1CArg271Gln, 4=ADHCIle47Val, 5=ADH1CPro351Thr. Numbers within the diamonds are D' values for the respective marker pairs. Solid red diamonds represent absolute LD (D'=1). One haplotype blocks, as determined with the use of Haploview software, was determined.

CHAPTER 5 DYNAMIC AND DISTINCT EVOLUTION OF HIV-1 IN BREASTMILK OVER TWO

YEARS POST-PARTUM

Introduction

As of 2007, 33.2 million people were living with the human immuno-deficiency virus type

1 (HIV-1) worldwide. 22.5 million of those people (68%) live in sub-Saharan Africa. World-

wide, an estimated 420,000 children were infected in 2007, the vast majority through mother-to-

child-transmission (MTCT) of HIV-1 (WHO 2007). Breast-feeding accounts for one-third to

one-half of all MTCT events over 24 months (Dabis et al. 1999; Iliff et al. 2005). In the US,

women are counseled by the CDC to replace breastfeeding with formula if infected with HIV-1

(CDC 2007), which (along with anti-retroviral drugs and cesarean-sections) resulted in only

~300 infants becoming infected perinatally in 2000 (CDC 2006b). Because of the seemingly

obvious reduction of transmission with formula feeding, as demonstrated in studies in developed

countries, the World Health Organization recommended that “when replacement feeding is

acceptable, feasible, affordable, sustainable, and safe, avoidance of all breastfeeding by HIV-

infected mothers is recommended” (WHO 2003). However, formula-feeding is impractical for

women in resource poor regions of the world where they do not have consistent access to clean

water, formula, and health care, and breast feeding may be the only practical option. Cultural

pressures also make women reluctant to eschew breastfeeding as this can be seen as a tacit

admission of HIV-1 status. The result was an ineffectual policy because women were essentially

left with no practical guidelines for reducing the risk of transmission. This situation highlights

the critical need for an anthropological perspective in international health policy, which is often

based on ideal conditions and Western epidemiology.

A shift in the traditional thinking about breastfeeding was influenced by the supposition

that breastfeeding by infected mothers actually reduced infant mortality (Ross and Labbok

2004). Recent observational studies have suggested that exclusive breastfeeding, as opposed to

the simultaneous feeding of milk and other foods, may significantly reduce the risk of

transmission of HIV-1 (Coutsoudis 2000; Coutsoudis et al. 2001; Coutsoudis et al. 2002; Iliff et

al. 2005; Kuhn et al. 2007). The WHO subsequently changed its recommendations to women in

developing countries to state that “breastfeeding is preferable to artificial feeding in the first six

months of life, regardless of the mother’s HIV status, as replacement feeding poses a greater risk

of death to the infant than breastfeeding from an HIV-infected mother in first months. HIV-

infected mothers are advised to wean their infants early to avoid prolonged exposure of the infant

and to avoid combining breastfeeding with replacement feeding, which appears to heighten the

risk of transmission” (WHO 2006). While this revision addresses some of the cultural barriers

presented by the former recommendation, there are still problematic aspects. The rapid weaning

advocated by the WHO is often painful for the mother and stressful for the infant, often leading

to a relapse in breastfeeding after the introduction of other foods. Furthermore, in many African

countries the typical duration of breastfeeding lasts well after the infant’s first year. Therefore,

the recommendation to wean early still presents the mother with the challenges of finding and

affording formula, clean water, and social stigmas. Furthermore, the biological mechanisms

underlying the reduction of risk through exclusive breastfeeding have not been clearly

elucidated, and the optimal duration of breastfeeding is still debated. Several studies suggest that

the risk of transmission actually declines over time, and that exclusive breastfeeding is only

necessary for the first four months (Kuhn et al. 2007). If this can be bourne out, women would be

able to avoid the painful abrupt weaning process and the problems associated with complete

replacement feeding without undue risk of transmission, and would be a more culturally

appropriate recommendation. The benefit of the anthropological genetic perspective which I

bring to this study is the ability to use evolutionary analyses to investigate the molecular basis of

modulated risk of MTCT, with the goal in mind of advocating a scientifically sound and

culturally sensitive breastfeeding management plan to women while eliminating unnecessarily

onerous measures.

Background

Human Immunodeficiency Virus Type 1 Infection

There are two major types of HIV: HIV-1 and HIV-2. HIV-1 is categorized into three main

subtypes, M, N, and O, which derive from separate zoonotic events in which the simian immuno-

deficiency virus (SIV) was transmitted from Pan troglodytes troglodytes (chimpanzee) (Gao et

al. 1999). Group O viruses have only been found in people living or having contact with central

Africa (mainly Cameroon and some neighbouring countries) (Gurtler et al. 1994; Loussert-Ajaka

et al. 1995), and group N viruses have only been reported from Cameroon (Simon et al. 1998).

Within group M, eleven major subtypes of HIV-1 (A-D, F-H, J-K) and at least 16 published

circulating recombinant forms (CRFs) make up the majority of infections worldwide (Leitner et

al. 2005). Subtypes are specific to geographic regions reflecting the initial routes and modes of

transmission. In the United States, Western Europe, and Australia, subtype B is found almost

exclusively, while in sub-Saharan Africa and India, subtype C is most prevalent. Subtypes can be

up to 25% different in nucleotide diversity (Perrin, Kaiser, and Yerly 2003).

The HIV-1 genome is ~10,000 bp in size and codes for the gag, pol, and env genes

characteristic of all retroviruses (Steffy and Wong-Staal 1991). Env encodes for the

glycoproteins (gp)120 and gp41, which are located on the surface of the lipid membrane

surrounding the viral particle. Gp120 binds to the host cell-surface molecule CD4, which is

expressed primarily on the surface of T-lymphocytes and macrophrages, i.e. leukocytes critical

to the immune response to infection (Maddon et al. 1986). HIV-1 also requires a co-receptor to

enter the host cell, principally either the CCR5 or CXCR4 chemokine receptors, which are

differentially expressed on subsets of macrophages and lymphocytes (Broder and Collman

1997). CCR5 is mainly expressed on macrophages and antigen-primed memory T-cells, while

CXCR4 is expressed on unactivated naïve T-cells and some macrophages. Different HIV viruses

differ in their ability to use each co-receptor, and are categorized as an R5 virus (uses the CCR5

co-receptor), X4 virus (uses the CXCR4 co-receptor), or R5X4 (uses both) (Berger et al. 1998;

Garzino-Demo et al. 2000). Typically, only R5 viruses are found early in infection, while X4

viruses emerge later in the majority of infected individuals (Bjorndal et al. 1997; Connor et al.

1997), and appear to evolve from earlier R5 viruses (Salemi et al. 2007).

After attachment to CD4 and the co-receptor, the viral lipid envelope fuses with the target

cell lipid membrane, which allows the viral core to enter the cell. The core is comprised of the

double stranded RNA, proteins, and enzymes. After entry, the viral RNA is reverse-transcribed

into double-stranded DNA in the cytoplasm and transported to the nucleus as a pre-integration

complex. The viral integrase enzyme integrates the viral DNA through linkage between the long

terminal repeats at each end of the viral DNA and in the host DNA. The viral DNA is then

transcribed into viral RNA, which either remains intact or else is spliced and transported to the

cytoplasm for translation into regulatory and polyproteins. These proteins along with the full-

length RNA transcripts are packaged into viral particles that bud from the surface of the infected

cells and enveloped by the host cell membrane. The viral particle must undergo a final

maturation step in which the gag polyproteins are cleaved by viral protease (Goodenow et al.

2003).

Env gp120 is comprised of five conserved regions (C1-C5) and five variable regions (V1-

V5). The determinants for co-receptor use are localized to the V3 loop of gp120 (Carrillo and

Ratner 1996; Hung, Vander Heyden, and Ratner 1999). In subtype B viruses, the net charge of

the amino acids in the V3 loop in gp120 is predictive of co-receptor usage (Briggs et al. 2000).

However, in other subtypes, the correlation may not be as strong. Env can differ by up to 5%

within an individual (Lamers et al. 1993), and can differ up to 30% among subtypes. N-linked

glycosylation sites are prevalent in gp120, and are involved in the structure and folding of the

protein. The glycosylation sites can also form a glycan shield to block neutralizing antibody

response (Wei et al. 2003). The V1 and V2 regions display considerable diversity in terms of

number of glycosylation sites, length, and amino acid variation over the course of infection in

one individual (Hughes, Bell, and Simmonds 1997a; Klevytska et al. 2002; Kitrinos et al. 2003;

Nabatov et al. 2004; Ritola et al. 2004; Sagar et al. 2006). Furthermore, functional (Pastore et al.

2006) and phylogenetic (Salemi et al. 2007) studies suggest that X4 sequences evolve according

to a consistent program of development that is recapitulated in individual patients and that

requires compensatory mutations in V1V2 to occur prior to emergence of high charge V3

domains. Thus, analyzing the diversity in gp120 provides both phylogenetic signal as well as

identification of potentially relevant functional changes over time.

Stages of Breastmilk Production

The initial HIV-1 infection in the breastmilk is poorly understood. Breastmilk is a complex

composition of cells, proteins, water, ions, fats, vitamins and minerals. Secretory alveolar

epithelial cells in the mammary gland surround multiple lumina, which are storage chambers for

the milk. Lactogenesis I refers to the onset of secretion of milk components to form colostrum

(i.e. the early milk), which occurs at 16-24 weeks of pregancy (Arthur, Smith, and Hartmann

1989). During this phase, the epithelial cells of the alveoli differentiate into secretory cells,

which then begin to synthesize lactose, casein, and milk fat triglycerides, and are secreted into

the lumen. The alveolar epithelial cells also extract water, vitamins, and minerals from the blood

capillaries surrounding the milk duct. During this stage, the epithelial cells are not tightly joined,

which allows elements from the blood such as immune cells and plasma protein to directly enter

the lumen via the paracellular pathway (Neville and Neifert 1983), as well as HIV-1 infected

cells or viral particles from the plasma. Colostrum contains higher concentrations of sodium,

chlorine, and proteins than mature milk (Georgeson and Filteau 2000), as well as a higher

proportion of leukocytes (white blood cells) (Goldman 1993), and is optimized for the infant’s

requirements in the first days after birth. All of the milk components remain in the lumina until

the suckling infant stimulates activation of the hormone oxytocin, which in turn causes the

contraction if the alveolar cells and the flow of the milk through the duct system (Fuchs 1991).

Lactogenesis II is defined as the initiation of large amounts of milk which is triggered by a

decrease in progesterone a few days post-partum (Neville et al. 1991). Biochemical changes

include increase in lactose and glucose and a decrease in sodium and protein resulting from the

closure of tight junctions between the epithelial cells. This closure is reversed during weaning

when the milk volume falls to <400 ml/day which corresponds to <2 feedings per day and

corresponds with significant increases in sodium, chloride, and protein and a decrease in lactose

(Neville et al. 1991). Inflammation of the breast (mastitis) also causes the opening of the

paracellular pathway, and increased sodium and albumin levels are associated with mastitis

(Shuster, Kehrli, and Baumrucker 1995; Semba et al. 1999b; Becquart et al. 2000; Rollins et al.

2001). The increased HIV-1 loads associated with both weaning (Thea et al. 2006) and mastitis

have been suggested to result from these “leaky ducts” in which cell associated (viral DNA) and

cell-free (viral RNA) virus can more efficient transfer from the plasma to the milk (Kuhn et al.

2007).

Cellular Composition of Breastmilk

The relative composition of cells in the breastmilk has not been consistently reported, due

to the changing composition of the milk over time, the storage conditions of the expressed milk,

and the use of different measurement techniques (Kourtis et al. 2003). Early studies suggested

that macrophages predominate both in the colostrum (Crago et al. 1979) and in mature

breastmilk (Ho, Wong, and Lawton 1979; Pitt 1979). However, later studies suggest that

polymorphonuclear cells (neutrophils) comprise 80% of all cell types, followed by macrophages

(15%) and lymphocytes (5-10%), most of which are T-lymphocytes (Goldman, Chheda, and

Garofalo 1998). The leukocyte concentration is highest during early lactation and decreases 5-10

fold by the end of the first week post-partum (Goldman 1993; Georgeson and Filteau 2000).

Since macrophages and T-lymphocytes are the primary cells infected by HIV-1, the high

frequency of these cells in the breast milk represent a large target cell population.

The phenotype and functional characteristics of milk T cells are different than peripheral

blood T cells (Bertotto et al. 1990b). Almost all of the breastmilk T-cells are memory cells, as

evidenced by high expression of activation markers such as HLA-DR, CD25, and CD45RO

(Bertotto et al. 1990b; Wirt et al. 1992; Rivas, el-Mohandes, and Katona 1994; Kourtis et al.

2003). A large percentage of T cells express mucosal homing markers such as CD49f, alpha4-

beta7 integrin, and CD103+, suggesting that the T cells found in milk migrate from other tissues

in the body where they were originally activated (Bertotto et al. 1990b; Kourtis et al. 2003). A

possible source for the T-cells is the gut associated lymphoid tissue (GALT) (Manning and

Parmely 1980; Kourtis et al. 2003), which is supported by the high frequency of cells carrying

the gamma/delta T-cell receptor in both the GALT (Ullrich et al. 1990) and in the breast milk,

but not the plasma (Bertotto et al. 1990a). The GALT contains the majority of the T-cells in the

human body, and during initial HIV infection experiences almost a complete depletion of T-cells

due to high levels of infection (Douek 2007b; Douek 2007a). If the milk is being populated by

GALT-derived T-cells, many of the cells are likely infected with HIV-1 and are the source of

infection.

Macrophages in the breastmilk also have distinct characteristics from those in the blood.

For example, they spontaneously produce granulocyte monocyte colony stimulating factor (GM-

CSF) and can differentiate into CD1+ dendritic cells in the presence of interluken-4 (IL4) alone,

in contrast with monocytes in the peripheral blood mononuclear cells (PBMC), which require

GM-CSF and IL4 (Ichikawa et al. 2003). IL-4 stimulated breastmilk macrophages express DC-

SIGN, a dendritic cell-specific receptor for HIV-1. During mastitis, IL-4 is locally produced,

which may then up-regulate DC-SIGN expression, which could lead to increased HIV-1

infection of macrophages (Ichikawa et al. 2003). This is an alternative explanation for the high

levels of viral load in the breastmilk associated with mastitis, rather than the leaky duct

hypothesis.

Epithelial cells from the mammary gland itself were reported present in mature milk at low

levels (Xanthou 1997) although another study found that almost 80% of all cells were epithelial

(Petitjean et al. 2007). Mammary epithelial cells can become productively infected with HIV-1

(Toniolo et al. 1995), and therefore the infection in the breastmilk could originate from either

infected epithelial cells shedding into the lumina, or else cell-free viral particles produced by the

epithelial cells pass into the milk and infect T-cells and macrophages already there.

Compartmentalization of Breastmilk Virus

As discussed above, the origin of the virus in breastmilk is unclear, and the molecular

genetic evolution of the virus in breastmilk over time has not been extensively investigated. One

important question that remains to be answered is whether the virus in the breastmilk

compartmentalizes after the initial infection, i.e, forms a separate population initially due to

restricted gene flow with other tissues (Nickle et al. 2003), and accelerated by the high in vivo

mutation rate (Saag et al. 1988) and differential selective pressures (Haase et al. 1996; Pilcher et

al. 2001). This is an important question because viruses in different compartments may exhibit

differential pathogenesis (Donaldson et al. 1994), response to drug therapy (Si-Mohamed et al.

2000; Venturi et al. 2000; Smit et al. 2004) and potentially different transmission rates.

Compartmentalization has already been established for the genital tract in men and women (Zhu

et al. 1996; Poss et al. 1998; Ping et al. 2000; De Pasquale et al. 2003; Kemal et al. 2003;

Philpott et al. 2005; Pillai et al. 2005; Sullivan et al. 2005), which as the most common route of

HIV transmission worldwide (Royce et al. 1997) has tremendous implications for our

understanding of the transmission and initial seeding of infection. If particular characteristics of

the virus infecting the genital tract can be identified, such as tropism, co-receptor usage, viral

epitopes, or structural characteristics, then vaccines and drug intervention could potentially be

developed to target these attributes. Because the virus both within and between individuals is so

variable, interventions will be more efficient if particular subsets of viruses can be targeted and

neutralized, and eradicating the early virus before it can infect other tissues is critical.

Distinct viral populations have also been identified in the central nervous system due to the

blood-brain barrier, which is important because many anti-retroviral drugs are unable to

penetrate the brain, and the proliferation of the virus there can lead to AIDS-associated dementia

(Korber et al. 1994; Hughes, Bell, and Simmonds 1997b; Gatanaga et al. 1999; Morris et al.

1999; Shapshak et al. 1999; Staprans et al. 1999; Venturi et al. 2000; Smit et al. 2001; Wang et

al. 2001; Ohagen et al. 2003; Langford et al. 2004; Petito 2004; Smit et al. 2004; Thompson et al.

2004; Abbate et al. 2005; Burkala et al. 2005; Ritola et al. 2005; Salemi et al. 2005; Strain et al.

2005; Pillai et al. 2006). Lymphoid, spleen, and lung tissues are also subject to

comparmentalization (Wong et al. 1997; Salemi et al. 2007). The few studies that have

investigated compartmentalization in the breastmilk have found conflicting results. One study

found compartmentalization between breastmilk virus DNA and RNA with respect to plasma and

PBMC virus (Becquart et al. 2002). A second study found that viral variants were similar

between plasma and milk (Henderson et al. 2004). However, both studies only considered the

virus from a single timepoint in each patient. In addition, the median time since delivery in the

first study was 3 months versus 12 months in the second study, which may explain the

apparently contradictory results if compartmentalization is not static, but changes over time.

Risk of Transmission via Breast-feeding

Increased transmission has been associated with the mother’s health status and the method

of feeding. In particular, an increased risk of transmission is correlated with a high viral load in

the mothers plasma (John et al. 2001; Fawzi et al. 2002a; Rousseau et al. 2003) and breastmilk

(Van de Perre et al. 1993; Lewis et al. 1998; Semba et al. 1999a; Pillay et al. 2000; Rousseau et

al. 2003; Rousseau et al. 2004). Breastmilk RNA concentrations are typically 2-3 log lower than

the plasma viral load, though the two are highly correlated (Rousseau et al. 2004). RNA viral

loads were found to be significantly higher in the colostrum than in the mature milk produced 14

days after birth (2.59 log10 copies/ml vs. 2.19 log10 copies/ml) but did not significantly change

from 14 days to 15 months (Rousseau et al. 2004). Although this may explain earlier studies

suggesting that most transmission events occur during the first 6 weeks after delivery (Miotti et

al 1999), it can be difficult to determine if an infant was infected via breastfeeding or intra-

partum early in life.

Both cell-free (RNA) and cell-associated (DNA) viral load independently increases the

risk, although the two measures are highly correlated (Richardson et al. 2003; Koulinska et al.

2006). A ten-fold increase in breastmilk RNA is associated with a two-fold increase in risk of

transmission, while a ten-fold increase in infected cells (DNA) is associated with a three-fold

increase in transmission (Rousseau et al. 2003; Rousseau et al. 2004). RNA was detected in one

study in 57% of all breastmilk samples from HIV-1+ women, and in 74% of all samples from

women who transmitted the virus (Koulinska et al. 2006). Cell associated (DNA) virus was

detected in 74% of all breast milk samples from HIV+ women and in 87% of the samples from

women who transmitted via breastfeeding (Koulinska et al. 2006). Interestingly, RNA virus was

significantly associated with late transmission of HIV-1 (>9 months post-partum) while DNA

virus had no time-dependent association.

Other factors increasing the risk of transmission via breastmilk include a low CD4 count

(Leroy et al. 2003; Iliff et al. 2005), mastitis (Van de Peere et al. 1992), and potentially sub-

clinical mastitis (the symptoms are not evident), which are all associated with high viral loads

(Semba et al. 1999b; Willumsen et al. 2000). If the mother was infected after birth, the risk of

MTCT may be increased 6-fold due to the high viral loads associated with primary infection

(Embree et al. 2000). Maternal nutrition status may also be associated with transmission; vitamin

A supplementation was associated with an increased risk, while multivitamin use was associated

with a lower risk (Fawzi et al. 2002b). A possible explanation for these results is the recent

observation that env gp120 binds to an activated form of the integrin alpha4-beta7 on CD4 T

cells which facilitates infection of neighboring cells. Retinoic acid, which is derived from

vitamin A, activates alpha4-beta7 and promotes binding to gp120. Interestingly, the function of

alpha4-beta7 is to act as a homing receptor for T-cells migrating to the GALT (Arthos et al.

2008). Therefore, a possible hypothesis is that increased vitamin A supplementation by the

mother may be passed to the infant through the breast milk, which activates alpha4-beta7 in the

gut of the infant, which in turn promotes binding of gp120 to CD4+ cells in the gut, leading to

increased infection.

The mode of breastfeeding is also associated with differential risk and is the basis for

modification of the WHO guidelines to HIV+ mothers concerning breastfeeding. Recent

observational studies have found that exclusive breastfeeding appears to lower the risk of

transmission over mixed feeding, which is defined as providing the infant with any food other

than water. A study in Durban, South Africa demonstrated an equal rate of infant HIV-1

infection by three months among mothers who had never breastfed and whose infants were

negative at birth (13.2%) and mothers who had exclusively breastfed up to 3 months (8.3%),

compared to a significantly higher infection probability for infants who were mixed fed (19.9%)

(Coutsoudis et al. 1999; Coutsoudis et al. 2001). In a follow-up study of the same cohort, the

cumulative number of HIV-infections (including inter-uterine and intra-partum) was 19.4% for

both the non-breastfeeding and the exclusively breastfeeding mothers, compared to 26.1% for

the mixed-feeding mothers by six months (Coutsoudis et al. 2001). In another study in South

Africa, the cumulative probability of infection at six months for exclusively breastfed infants was

15% compared with 7% for non-breast fed infants and 27% for mixed-fed infants (Coovadia et

al. 2007). In a study of >2000 mothers in Zimbabwe, infants who were HIV negative at six

weeks and who were exclusively breast fed were significantly less likely to be infected at six

months (1.31%) than mixed fed infants (4.4%) (Iliff et al. 2005), while in a study in Zambia, 4%

of exclusively breastfed infants who were negative at six weeks were infected by 4 months,

compared to 10% of mixed fed infants (Kuhn et al. 2007). The mechanism for the reduced risk

via exclusive breastfeeding has not been demonstrated. One possibility is that damage to the

infant’s gut mucosa from early introduction of food compromises the intestinal mucosal barrier

or intestinal immune activation from early introduction of foreign antigens or pathogens

facilitates transmission (Smith and Kuhn 2000). Mixed feeding could be associated with

suboptimal breastfeeding, subclinical mastistis (in which no swelling of the mammary gland is

visible during inflammation), or poor health in general which leads to higher viral loads

(Willumsen et al. 2000), so the association between mixed feeding and transmission is complex

(Chisenga et al. 2005). However, the study in Zambia controlled for viral load and maternal

health (as measured by CD4 cells) and still found a significant association (Kuhn et al. 2007).

The increased risk in transmission upon cessation of exclusive breastfeeding may also be related

to the increase in mammary epithelial permeability upon weaning (Kourtis et al. 2003). This

hypothesis is supported by both the finding that the mean concentration of plasma albumin was

higher in the milk of transmitting mothers (Becquart et al. 1999) as well as the increase in

breastmilk viral load following weaning (Thea et al. 2006).

In addition, the duration of the breastfeeding has been suggested influence the risk of

transmission, although the association is not clear. Over a two year period, the risk of

transmitting the virus through breastfeeding was estimated at 16% (37% for mothers who did

breastfeed, vs. 21% for mothers who did not) (Nduati et al. 2000). A study in Zimbabwe found

that more than 2/3 of all post-natal MTCT occurred after six months (Iliff et al. 2005) while a

study in Kenya found that the majority of transmission events occurred before six months

(Nduati et al. 2000). Other studies have found an approximately constant risk over time

(Coutsoudis et al. 2004) or a reduced risk over time (Kuhn et al. 2007). The contradictory results

are difficult to interpret, although it is difficult to compare across studies because of the different

interpretation of intra-partum vs. early breastfeeding transmission and the non-distinction of

exclusive vs. non-exclusive breastfeeding. In a study in Zambia, the risk of HIV-1 transmission

during non-exclusive breastfeeding was shown to be greater during the first four months (~2.4%

per month) compared with 1% from 5-12 months, and 0.5% from 12-24 months, suggesting that

the greatest benefit for exclusive breastfeeding is only until month 4. The risk of continuing to

breastfeed non-exclusively was compensated by the mortality associated with not breastfeeding,

and no difference in HIV-1 free survival was detected between mothers who stopped breast-

feeding at four months versus mothers who continued up to 16 months (Kuhn et al. 2007;

Sinkala et al. 2007). Therefore, while the cumulative probability of HIV-1 transmission may

increase over time, the more important measure of HIV-1 free survival may be increased with

continued breastfeeding.

Our Study

The current WHO recommendations concerning breastfeeding by HIV+ mothers are based

on several observational studies that provide strong evidence for the attenuation of risk of MTCT

by exclusive breastfeeding. However, although the WHO advises women to abruptly wean at 6

months, recent data suggest a decreased risk of MTCT over time that is outweighed by the

benefits of continued breastfeeding (Kuhn et al 2007). Furthermore, while exclusive

breastfeeding is critical during the first several months post-partum, after 4 months the benefits

are less evident (Kuhn et al 2007). Clearly, a better understanding of the molecular mechanisms

underlying the observations is needed in order to provide HIV-1 positive women with the

optimal recommendations.

HIV-1 rapidly evolves within a patient in response to constantly changing host pressures.

Within several months of infection a new viral population may emerge, often with functionally

significant differences. New viral populations may have an altered entry, replication, or immune

evasion phenotype, which could impact the overall pathogenicity or transmissibility of the virus.

Therefore, investigating the viral evolution over several years can provide important information

about both the characteristics of the virus as well as the dynamic host environment. It is likely

that features of the virus at particular junctures during the course of breastfeeding may impact its

ability to initiate a new infection in the infant. Understanding the evolutionary dynamics of the

milk virus may elucidate the underlying cause for the attenuated risk of MTCT associated with

exclusive breastfeeding as well as the reduced risk of transmission over time.

This study is the first to investigate the longitudinal evolution of HIV-1 in breastmilk with

a focus on elucidating important factors underlying transmission to the infant. I have posed

several questions designed to address aspect of the viral evolution that may be important in

transmission. First, I wanted to determine the means by which virus enters the breastmilk.

Specifically, I investigated if breastmilk is infected by virus from the plasma during the initial

stages of lactogenesis, or if the early breastmilk virus is distinct from the virodeme (HIV-1

subpopulations) in the peripheral blood system suggesting an alternative infection. Second, I

wanted to learn more about the characteristics of the virus when it is in breastmilk. Specifically, I

investigated if virus in the breastmilk evolves into a separate population distinct from the

peripheral virus population, indicating compartmentalization. Related to that question, I also

investigated to what degree the virus migrates between tissues. Third, I wanted to learn more

about what role natural selection might play in the evolution of the virus. Specifically, I explored

what selective pressures are present during the evolution of the breastmilk virus, and whether

they differed between the plasma and the milk. Finally, I wanted to know if the characteristics of

the virus in breastmilk change over time. Specifically, I considered whether particular

evolutionary dynamics between four and six months could be associated with either the

documented cessation of exclusive breastfeeding or the transmission of the virus to the infant. To

address these questions, I analyzed the population of viruses, or virodeme, present in the breast

milk (left and right) and plasma RNA over a two-year period (timepoints at birth, 1, 4, 12, 18,

and 24 months) from an HIV-1 positive woman who was documented to have transmitted the

virus to her infant via breastfeeding at four months post-partum.

Subject

The HIV-1 positive female subject in this study was enrolled in the Zambia Exclusive

Breast Feeding Study (ZEBS). The goal of the ZEBS was to determine whether exclusive

breastfeeding was associated with a lower risk of transmission (Thea et al. 2004; Kuhn et al.

2007). All women were encouraged to breastfeed exclusively until month 4, at which time

women were randomized into two groups: the control group, in which women were encouraged

to exclusively breastfeed until six months and then gradually introduce complementary foods,

and the intervention group, in which women were encouraged to rapidly wean their infants after

four months (Thea et al. 2004; Kuhn et al. 2007). This patient was part of the control group and

breastfed until 18 months post-partum. She reported exclusively breastfeeding until four months,

and reported mixed feeding at six months, which was defined as providing the infant with any

non-maternal milk substance including water and cow’s milk. Her infant had the first positive

PCR for HIV at 4 months, and negative PCRs at 1 week, and 1, 2, and 3 months, strongly

suggesting that breastmilk was the vehicle of transmission. Breast milk and blood were serially

sampled from the mother at 1 week, 1 month, 4 months, 12 months, 18 months, and 24 months

(see Figure 5-1).

Viral Isolation, Amplification, and Sequencing

RNA was isolated from breastmilk and plasma samples using a Qaigen RNA Easy kit

(Qiagen, Valencia, CA, USA) and 10 ul of RNA extract was used as template in a One-Step RT-

PCR kit (Invitrogen, Carlsbad, CA, USA) according to manufacturer’s instructions. To amplify

the V1V5 region of the env gp120 gene, I used first round primers pol-env (5’-

GAGCAGAAGACAGTGGCAATGA-3’) and 192H (5’-CCATAGTGCTTCCTGCTGCT-3’).

Cycling conditions were as follows: 50ºC for 30’, 94ºC for 2’, followed by 35 cycles of 94ºC for

15s, 58ºC for 30s, 72ºC for 2’, and a final extension step of 72ºC for 10’. A second nested PCR

was performed using the GoTaq PCR Supermix (Promega, Madison, WI, USA) according to

manufacturer’s instructions using 5ul of the first round amplification as template and primers D1

(5’-CACAGTCTATTATGGGGTACCTGTGTGGAA-3’) and 194G (5’-

CTTCTCCAATTGTCCCTCATA-3’) with cycling conditions as follows: an initial denaturing

step of 95ºC for 5’, 35 cycles of 94ºC for 1’, 58ºC for 1’, and 72º for 2’, followed by a final

extension step of 72ºC for 10’. In some cases, a third round PCR was necessary using 5ul of the

second round as template, primers Env5 (5’-

GGGGATCCGGTAGAACAGATGCATGAGGAT-3’) and 194G, and the same cycling

conditions as described for the second round PCR. Additional sequences were generated for the

plasma 1 week, 12 month, and 24 month samples using 1ul of RNA template to determine

whether larger input volumes of RNA were contributing to PCR-generated recombination. PCR

products were ligated into TopoTA vector (Invitrogen, Carlsbad, CA, USA) according to the

manufacturer’s instructions, transformed into top10F’ cells for 30’ on ice, followed by a 30s

heatshock at 42 ºC, and incubation overnight at 30 ºC. In sum, approximately 30 clones from

each sample were sequenced in both directions by the Genome Sequence Service Laboratory at

the University of Florida yielding 284 independent sequences.

Sequence Analysis and Recombination

The V1V5 sequences were manually aligned and checked for accuracy using BioEdit v7.0

(Hall, 1999) and Mega v4.0 (Tamura et al. 2007). V1 and V2 haplotypes were assigned based on

sequence similarity. Length and number of glycosylation sites were calculated manually.

The entire V1V5 region as well as shorter regions of the gp120 gene were evaluated for

recombinant sequences based on an algorithm that I helped developed that uses the pairwise

homoplasy index (PHI) test in conjunction with phylogenetic networks (Salemi MM, Gray RR,

Goodenow MM, unpublished). The PHI test statistic is a modified sum of incompatibility scores

calculated for pairs of informative sites in an alignment. The PHI statistic is then assessed with a

normal approximation of a permutation test, and has been shown to be a powerful method with

simulation studies (Bruen, Philippe, and Bryant 2006). In our method, viral populations were

tested for significant population structure using the K*s test (Hudson, Slatkin, and Maddison

1992; Achaz et al. 2004) and divided into smaller datasets by time point if significant population

structure was detected. A neighbor-net phylogenetic network was inferred for each dataset,

which allows the presence of phylogenetic uncertainty, and sequences were progressively

removed until the PHI test statistic was no longer significant (p>0.05). The smaller datasets were

again combined, and the procedure repeated to detect inter-time point recombinants. All putative

recombinants were removed from the dataset for further analysis. Pairwise calculations were

performed for the non-recombinant alignment using Mega v4.0 with the Kimura 2-Parameter

model with pairwise deletions and uniform rates.

Phylogenies were estimated for non-recombinant datasets using a Bayesian analysis with

the BEAST software package 1.4 (Drummond et al. 2005; Drummond and Rambaut 2007). The

evolutionary rate was initially estimated using a model assuming a strict clock, the SRD6 model

of nucleotide substitution (HKY + the 1st and 2nd codon positions were considered separately

from the 3rd , hereafter referred to as two partitions (Shapiro, Rambaut, and Drummond 2006)),

and a constant population size with sampling time information included in the model

(Drummond et al. 2006). The rate obtained from this model was then used as a prior in further

analyses that assumed a relaxed clock with log-normally distributed rates with three models of

nucleotide substitution: the SRD6 model, the gamma time-reversible (GTR) model with two

partitions, and the GTR model with three partitions (all three codon positions considered

separately). The Markov Chain Monte Carlo analysis was run for 100,000,000 generations with

sampling every 10,000th generation. The results were visualized in Tracer v.1.3, and convergence

of the Markov chain was assessed by calculating the effective sampling size (ESS) for each

parameter (Drummond et al. 2006). All ESS values were >500 indicating sufficient sampling.

The time to the most recent common ancestor (TMRCA) was estimated for each analysis. The

marginal likelihoods of each model were compared using the Bayes Factor method, in which a

difference of greater than 20 log units between the marginal likelihoods of any two models is

considered significant evidence for the alternative model (the more complex model). The correct

root of the phylogeny was analyzed by constraining all sequences except particular groups from

the first timepoint as the ingroup using the SRD6 model with the relaxed clock and log-normal

distribution of rates. This was performed for five sets of outgroup sequences, and the marginal

likelihoods were compared using the Bayes Factor test. A 50% consensus tree based on the

posterior distribution of trees with a 50% burnin was calculated using Mr. Bayes and

manipulated in FigTree v.1.0.

A maximum likelihood (ML) phylogeny was inferred using PAUP v. 4.0 (Swofford 2002).

The best-fitting nucleotide substitution model was tested with a hierarchical likelihood ratio test

using a neighbor-joining tree with Jukes and Cantor corrected distances. Statistical support for

internal branches in the tree was obtained by bootstrapping (1,000 replicates). All rootings of the

best ML phylogeny were generated in MacClade (Maddison and Maddison 1989). The most

likely rooting was determined using baseml in the PAML package (Yang 1997) with the clock

hypothesis and sampling information included with the sequences. The best rooted tree was then

re-estimated without the clock hypothesis in PAML.

To determine the subtype of this patient, one sequence from each tissue (plasma, left and

right breast milk) from week 1 was aligned to representative sequences from each major subtype

in macrogroup M obtained from the Los Alamos database

(http://www.hiv.lanl.gov/content/sequence/HIV/SUBTYPE_REF/align.html). A neighbor-

joining tree was inferred using PAUP v. 4.0 with the HKY model of substitution, estimated

transition/transversion ratio, an estimated gamma distribution of rates and an estimated

proportion of invariable sites. Statistical support for internal branches in the tree was obtained by

bootstrapping (1,000 replicates).

Branch Selection Analysis

A branch selection analysis was performed using HYPHY (Pond, Frost, and Muse 2005).

The synonymous/non-synonymous rate ratio (dN/dS) was estimated for each branch leading to a

major clade, as well as the 95% confidence interval (CI). All branches for which the CI did not

include 1.0 were determined to significantly deviate from neutral evolution.

Compartmentalization

Compartmentalization between tissues was investigated using a modified version of the

Slatkin-Maddison test (Slatkin and Maddison 1989) implemented in MacClade (Maddison and

Maddison 1989) using the State Changes and Stasis option. The number of unambiguous

instances in which an ancestral state changed from one tissue to the other, as well as all instances

in which no change occurred, were calculated for each phylogeny sampled from the posterior

distribution in BEAST with a 25% burnin and 10,000 random splitting and joining trees. The

Kolmogorov-Smirnov nonparametric test was used to compare the distribution of each category

(breastmilk to breastmilk, breastmilk to plasma, plasma to plasma, and plasma to breastmilk) for

the estimated and random trees. The nonparametric Wilcoxon Rank Sum test was used to

compare the median value for each category between the estimated and random phylogenies.

Results

Subtype Analysis

The neighbor-joining tree using representative sequences from each major subgroup in

macro-group M is shown in Figure 5-2. A high bootstrap value (100) supports the placement of

this patient into subtype C.

Sequence Analysis

An initial dataset of 184 sequences was obtained and used in the majority of analyses

(Table 5-1). Additional sequences were generated for three plasma samples (1W, 12M, 24M) to

rule out the possibility of PCR-generated recombination. These sequences were not included in

any of the phylogenetic analyses. Additional sequences were also generated for milk samples

from month 1. These sequences were included in a subset of phylogenetic analyses as noted

below.

Variable regions 1 and 2 sequence analysis

All generated sequences (n=282) were included in the sequence analysis. Because the

length variation in the V1V2 region may confound phylogenetic analyses, this variation was

analyzed separately. All generated sequences (n=282) were included. For V1, six different

haplotypes were assigned (A-F) based on amino acid (aa) motif, length, and number of

glycosylation sites. Representative sequences for each haplotype are shown in Figure 5-3. The

length of V1 ranged from 29aa-51aa, and the number of gylcosylation sites ranged from 1-5.

Haplotypes D, B, and C were most prevalent (in that order), and share similar aa motifs but differ

in length. For V2, five haplotypes (A-E) were assigned. Representative sequences for each

haplotype are shown in Figure 5-4. The length of V2 ranged from 40aa-51aa, and the number of

gylcosylation sites ranged from 1-3. Haplotype B was most prevalent, with the remaining

sequences somewhat evenly divided between the other haplotypes. The amino terminus was

fairly well conserved among sequences, as was the leu-asp-iso motif that mediates binding of

activated integrin a4b7, which mediates migration of CD4 T-cells to the gut (Arthos et al. 2008).

In V1, the plasma shows a trend of increasing diversity over time (number of haplotypes),

with the haplotypes present at the earliest timepoints being retained (Table 5-2). The longer

haplotypes C and D are present at the first timepoint, while the shorter haplotype B emerges at

month 12 at a low frequency. Haplotype B increases to almost 50% by month 24, which suggests

that a trend of shortening V1 length and loss of glycosylation sites over time. However, because

haplotype B is present in the breastmilk at the first time point (BMR1W), it is unclear whether

the presence of haplotype B in the plasma by 12 months is this is the result of a migration from

the breastmilk to the plasma, or whether the viral population in the plasma evolved into

haplotype B independently (Table 5-2). Haplotype A, which is the most divergent from

haplotypes B, C, and D, remains at a low frequency at all timepoints in the plasma. Haplotype E,

which has a unique aa motif, emerges only at month 24. There is less diversity in the breastmilk

sequences at most timepoints relative to the plasma. At week 1, the left breast has only a subset

of haplotypes found in the right breast. Haplotype B is unique to the breastmilk, and while two of

the haplotypes (A and C) are present in both the plasma and the breastmilk, haplotype A is found

at very low frequencies in the plasma. By month 1, a new haplotype emerges in the breastmilk

(F), which is not found at any other timepoint or tissue, but is found in both breasts. In fact, the

viral population in the left breast seems to have experienced a complete population replacement

from haplotype A in week 1 to haplotypes B and F at month 1. In contrast, the viral population in

the right breast displays more continuity from week 1 to month 1. This pattern suggests either

limited gene flow from the right to the left breast, or else the left breast is infected with only a

subset of the viral population infecting the right breast. Haplotype D is present in the right breast

(but not the left) at month 1, which at week 1 was in the plasma but not the breastmilk. At month

4, haplotype D is the most frequent in the both the breastmilk and the plasma, and by the last two

timepoints only haplotype D is found in the breastmilk. Interestingly, this is the opposite trend of

the plasma, in which diversity increased over time. This could suggest that at month 1, the right

breast began to experience some gene flow from the plasma, though the left breast remained

compartmentalized. By month 4, there was complete geneflow between both breasts and the

plasma. By the last timepoints, the gene flow from the plasma subsided (Table 5-2).

In V2, a similar trend of increasing diversity over time in the plasma is apparent (Table 5-

2). Again, the haplotypes present at month 24 includes all haplotypes seen at previous timepoints

with the exception of haplotype E that was present at month 4. Neither the length nor the number

of glycosylation sites appear increase or decrease over time, although the diversity again

increases over time. In the breastmilk, again the haplotypes in the left breast are a subset of those

found in the right at both week 1 and month 1, and again a population turnover is apparent in the

left breast from week 1 to month 1. At month 4, the same haplotypes are found in the breastmilk

and the plasma, and again by 12 and 18 months only one haplotype is found in the breastmilk.

There is no clear association between particular haplotypes in V1 and V2. This may be due

to recombination between the two regions or to convergent evolution, as the difference between

some of the haplotypes is only a few amino acids. Overall, there is a clear trend in the plasma of

increasing combinations of haplotypes, while the opposite trend in true in the milk.

Variable region 3 loop analysis

All sequences had V3 loop charges of <5, which in subtype B is predictive of CCR5 co-

receptor usage (data not shown). However, the association between charge and co-receptor usage

is not as defined in subtype C. Further functional analyses are being performed to determine

actual tropism and co-receptor usage.

Recombination Analysis

The presence of population structure was tested for each group of sequences from

different tissues and timepoints. All inter-tissue and inter-timepoint comparisons demonstrated

significant population structure (p>0.001) except for the three tissues sampled at month four,

which showed no population structure (Table 5-4). These sequences were therefore considered as

one group for the recombination test.

The V1-V5 alignment was initially tested for the presence of putative recombinants using

the PHI/phylogenetic method described above with a dataset of 184 sequences. However, a non-

significant inter-timepoint dataset could not be obtained using more than half of the sequences.

Therefore, the alignment was shortened into four new alignments: V1-V3, V1-V2, C2-V3, and

C2-V3 (see Figure 5-5). For the V1-V3 dataset, again a non-recombinant dataset could not be

obtained. The full V1-V2 dataset initially suggested the presence of recombinants (p=8.43 x 10-

7), and 27 sequences could be removed so that no recombination was detected (Table 5-5). For

the C2-V3 dataset, no recombinants were detected. For the C2V5 dataset, initially the dataset

was again unable to be resolved. However, the removal of ~10 amino acids at the carboxyl end

of V3 as well as all of V4 resulted in a non-recombinant dataset. These results suggested that a

hotspot for recombination is located at the amino terminus the V2 region or the carboxyl

terminus of the C2 region of the gp120 gene.

Two alternate datasets could be produced for the C2V5 alignment, depending on which week

1 breastmilk sequences were removed. The network for these sequences is shown in Figure 5-6

(when all sequences were included, p=4.06 x 10-4, indicating a significantly recombinant

dataset). Four distinct groups are evident: Group 1 contains all of the sequences from the left

breast, while the sequences from the right breast form groups 2-4. Two of these groups could be

removed separately to result in a non-recombinant dataset: Group 3 plus two additional non-

Group 3 sequences (p= 0.076) or Group 1 (p=0.376). Because choosing the first alternative

resulted in less putative recombinants overall (31 vs. 34), and to avoid removing all of the

representatives from the left breast, the first alternative (C2V5 [A]) was considered the better of

the two options. However, to confirm some of the phylogenetic results, the second alternative

(C2V5 [B]) was also used. Other than the breastmilk week 1 sequences and the three additional

recombinants, the other sixteen recombinants were identical between C2V5 (A) and (B).

In order to confirm that the multitude of recombinant sequences was not due to PCR-

generated recombination, additional plasma samples from week 1, month 12, and month 24 were

re-amplified using 1/10 the input RNA, as increased template has been suggested to cause in

vitro recombination. The new sequences were added to the C2V5 alignment, and the dataset was

re-tested for recombinants. Similar variants were recovered for all three timepoints, and again the

majority of the plasma recombinants were found in the month 24 sample, suggesting that the

recombinant sequences are actually present in vivo.

The breastmilk sequences sampled from month 1 were added to the C2V5 (A) alignment. No

additional recombinants were detected among these sequences.

Bayesian tip-date phylogeny

The C2V5 (A) and C2V5 (B) alignments were used to infer a posterior distribution of trees

using the program BEAST under three models of nucleotide substitution (SRD6, GTR + 2

partitions, and GTR + 3 partitions) with a relaxed molecular clock, and the SDR6 model

assuming a a strict molecular clock. For the C2V5 (A) dataset, the Bayes Factor of the GTR + 2

partition model was >20 log higher than either the SRD6 model or the strict clock model,

suggesting a significantly better fit to the data, while the GTR + 3 model was <20 log higher than

the GTR +2 model, suggesting unnecessary parameterization (Table 6). For the C2V5 (B)

dataset, both GTR models resulted in very low ESS values after 100,000,000 generations of the

MCMC chain, and thus these models were not considered. The Bayes Factor for the SRD6

model with the relaxed clock was >20 log higher than the strict clock (data not shown), again

suggesting that the strict clock could be rejected for both datasets.

The posterior distribution of trees generated under the best model (as identified above) was

used to generate a 50% consensus phylogeny for C2V5 (A) (Figure 5-7) and C2V5 (B) (Figure

5-8). Both phylogenies demonstrated similar topologies with strong temporal evolution. In

phylogeny (A), breastmilk week 1 samples formed three groups (1, 2, 3) that clustered in

separate clades at the base of the tree. Groups 1 and 2 were well supported (>95%), while group

3 (from the right breast) was moderately supported (72%). Groups 2 and 3 showed moderate

support for clustering together (79%). In phylogeny (B), breastmilk week 1 Groups 2 and 4 also

form well supported clades (95% and 100%, respectively). Group 3 is again moderately

supported (73%), but in this case does not cluster with Group 2 and is part of the larger clade of

the remaining sequences. In both phylogenies, the week 1 plasma sequences do not cluster with

the breastmilk sequences and with one exception are part of the larger clade of remaining

sequences. These results indicate that the breastmilk and plasma viruses, as well as the virus in

both breasts, are distinct one week after birth. Furthermore, the virus in the right breast is also

more diverse than the left, as shown in the V1V2 analysis.

The majority of the month 4 sequences form a moderately well-supported paraphyletic

clade (90% in [A] and 84% in [B]), and a well-supported clade containing a subset of month 4

plasma, right and left breastmilk sequences is present within the larger clade (99% [A] and 94%

[B]), consistent with a lack of population structure among these sequences. Thus, at month 4, it

appears that no biological barrier is restricting gene flow between the tissues. In general, this

large population of month four viruses does not give rise to later viral populations, with the

exception of a few month 12 plasma sequences. Furthermore, this group of month 4 sequences

appears to evolve from the earlier plasma virus rather than from the breastmilk virus.

The rest of the later timepoint sequences form two major clades that cluster together with

moderate support (89% [A] and 83% [B]). In the first major clade, the majority of the plasma

sequences from 12 months form a well-supported clade (92% [A] and 82% [B]) which gives rise

to the clade of 24 month sequences (93% [A] and 80% [B]). In the second major clade the few

remaining month four sequences from all three tissues cluster with all of the month 12 breastmilk

sequences as well as three plasma month 12 sequences with very high support (98% [A] and

81% [B]). In (A), the month 12 breastmilk sequences cluster with the month 18 breastmilk

sequences as well as a few plasma month 24 sequences with moderate support (81%).

Alternatively, in (B), the month 18 breast milk sequence cluster with the majority of the plasma

sequences from months 12 and 24. This suggests that either the virus in the breastmilk re-

compartmentalizes after month 12 and evolves independently from the plasma virus (A), or that

the breast milk was re-infected by the virus in the plasma at month 18 (B). For the breastmilk

month 18 sequences, only a truncated region amplified, probably due to low viral load. This may

explain the phylogenetic uncertainty as to their true location. Irrespective of the true position of

the month 18 sequences, at month 12, there seems to be limited migration of the virus between

the breastmilk and the plasma. Possibilities for the apparent lack of gene flow include a

biological barrier such as a renewed tight junction between epithelial cells or limited production

of breastmilk.

Rooting the phylogeny

Because the origin of the virus in both tissues is of great interest, obtaining the correct root

for the phylogeny is essential before drawing conclusions. Because the model chosen for the

Bayesian analyses assumed a relaxed molecular clock, the most likely position of the root is

estimated along with possible topologies. To formally test the position of the root, five

alternative consensus Bayesian phylogenies were estimated for the C2V5 (A) dataset (Table 5-

6). Both the original analysis with no constrained outgroup and the analysis with the breastmilk

group 1 sequences as the outgroup had the highest marginal likelihoods, though the marginal

likelihoods for all analyses were very similar to each other and no significant difference could be

ascertained. In order to further investigate the correct rooting, a maximum likelihood (ML)

phylogeny was estimated with an assumption of no molecular clock, and the most likely root was

estimated from the entire set of all possible roots (Figure 5-9). The topology of the correctly

rooted ML tree was similar to the Bayesian consensus phylogeny, although most branches were

not well-supported and the phylogeny is less resolved; specifically the temporal structure of the

virus evolution is much less clear. The root was placed at the breastmilk week 1 sequences, with

a bootstrap value of 100. The group 2 breastmilk sequences are still well supported (87%). The

group 3 breastmilk sequences were not supported as a clade in the ML rooted phylogeny,

however, and appear to give rise to the plasma sequences. The breastmilk month 12 sequences

clade together and appear to give rise to the month 18 breastmilk sequences, with only a few

plasma sequences are present in this clade. The plasma month 24 sequences also clade together

(along with one week 1 plasma sequence), although not with the month 12 plasma sequences as

in the original Bayesian phylogeny. However, none of these clades are well supported (<50

bootstrap value).

In general, the ML rooted phylogeny supports the conclusions drawn from the Bayesian

phylogeny, in particular the distinct groups in the first timepoint breastmilk sequences. The

Bayesian outgroup analysis as well as the ML analysis suggests that the breastmilk sequences did

not arise from the plasma and that they share a common ancestor well before birth. The TMRCA

was estimated for each of Bayesian outgroup analyses, and the range was between 1053-1171

days before present (=24 months). This corresponds to about one year before birth, before the

pregnancy and well before the initiation of breastmilk. This supports a model in which the

breastmilk infection is initiated locally, or else is seeded by infected cells which traffic from

other tissues in the body than the plasma. By month 4, the breastmilk virus appears to be

replaced by a virus that shares an origin with the plasma virus, while by month 12, the breastmilk

virus is again compartmentalized. In general, both the Bayesian and maximum likelihood

methods and both alternative alignments result in similar topologies, indicating high

phylogenetic signal and robust results.

Branch selection analysis

The dN/dS ratio was tested for each of the branches leading to major clades on the best-

model Bayesian consensus tree. For several of the branches, the ratio was significantly different

from 1, indicating a deviation from neutrality (data not shown). The branch leading to the right

breastmilk sequences at the first timepoint was under negative selection, as was the branch

leading to the majority of the month four plasma and breastmilk sequences (Figure 5-10). Three

branches towards the later part of the phylogeny were under significant positive selection: the

branch leading to all of the late breastmilk sequences, the branch leading to the month 18 milk

sequences plus some month 24 plasma sequences, and the branch leading to the majority of the

plasma month 24 sequences. These results suggest changing selective pressures in both the

plasma and breastmilk later in infection.

Inclusion of breastmilk month 1 sequences

Because the characteristics of the week 1 and the month 4 breastmilk viruses appear very

different, i.e. a shift from compartmentalization to mixing with the plasma, additional sequences

from month 1 were included in the C2V5 (A) alignment. The Bayesian analysis was re-estimated

using the SRD6 model of nucleotide substitution, a relaxed clock with a log-normal distribution

of rates and a constant population size, and the 50% consensus tree was calculated (Figure 5-11)

The left breastmilk sequences from month 1 cluster together with a moderate probability (82%),

suggesting a common origin, while they cluster with the left breastmilk week 1 group 1

sequences with low probability (57%). The initial right breastmilk week 1 virus does not appear

to evolve into the month 1 virus as these sequences do not cluster together. Rather, the right

breastmilk virus appears to have been replaced by a different, but similar, viral population. This

supports a model in which infected cells traffic to the breastmilk from other tissues, and

continuously seeds a new infection. Only a subset of infected cells may migrate from the tissue

to the milk, which would explain the observed pattern of variation. If the locally infected

mammary cells were infected, a stronger evolutionary relationship would be expected between

the week 1 and month 1 virus. There is still no support for the plasma having seeded the month 1

virus population.

Another piece of information to be learned from the phylogeny is the relative branch

lengths. Typically, longer internal branch lengths relative to the terminal branches is indicative

of a constant population size. Conversely, longer terminal branches relative to the internal branch

lengths can be indicative of exponential growth (Grenfell et al. 2004). In general, the internal

branch lengths leading to the month 1 clades are much longer than those in the plasma clades,

which is indicative of a constant population size. This supports a model in which the infection in

the breastmilk is a slowly evolving infection, possibly being reseeded with virus from other

tissues. This could also support a model in which a limited number of cells are available for

infection in the breastmilk relative to the plasma, which restricts the potential growth of the

infection.

Migration Analysis

To test the hypothesis that the virus in the breastmilk is compartmentalized with respect

to the plasma virus, a modified version of the Slatkin-Maddison test was used which compares

both the distribution and the median number of four possible events (migration from breastmilk

to plasma or plasma to breastmilk, and constant state of breastmilk or plasma) for each tree from

the posterior distribution and a set of randomly generated trees. The distributions for all four

possible scenarios were significantly different (p<0.0001) than the random expectation. The

median for each of the four events was also significantly different (p<0.0001) than the random

expectation. The minimum, maximum, and average for each event for the actual and random

trees is shown in Figure 5-12. These results suggest that significant compartmentalization of the

breastmilk does occur over the two-year period. In order to determine if there was migration

between the left and right breasts, the analysis was repeated using nine potential events. Again,

both the distribution and the median for each of the nine events in the observed trees were

significantly greater than expected by chance (p<0.001 for all comparisons). The minimum,

maximum, and averages are shown in Figure 5-13. This suggests that each breast functions as a

separate compartment for the virus over time.

Discussion

This study represents the first longitudinal analysis of the evolution of HIV in breastmilk

and was designed to answer several questions about the evolution of the breastmilk virus over

time. The initial question was whether the virus in the breastmilk originates from the plasma

during lactogenesis, or is seeded by either a local infection of the mammary cells or infected

cells trafficking from another tissue. The haplotype analysis of V1V2 and the phylogenetic

analysis of C2V5 suggest that the milk virus does not derive from the plasma. V1 and V2

haplotypes in the milk are not present in the plasma at the first timepoint, and the phylogenies

clearly show well-supported clades for the milk viruses that do not include plasma virus. The

month 1 milk sequences show the same pattern, although they are not unequivocally derived

from the week 1 virus. This lends more support to the model of trafficking infected cells, which

are infected with related viruses but do not represent the full range of variation in the source

tissue. Furthermore, the majority of the mutations are located on the internal branches of the milk

clades, rather than on the external branches as for the plasma sequences. This suggests a constant

size population of virus in the milk, which could be consistent with a low migration rate of

trafficking cells. This hypothesis is also consistent with earlier observations that the T-cells and

macrophages in the milk are phenotypically different than the cells in the blood (Bertotto et al.

1990b; Ichikawa et al. 2003; Kourtis et al. 2003). Because only two tissues were included in this

study, the tissue of origin cannot be determined. However, one likely source of infection could

be the gut associated lymphoid tissue (GALT) (G. Aldrovandi, personal communication). The

GALT contains the majority of T-cells in the body, which are the primary target cells of HIV-1.

During acute infection in the initial stage of disease, HIV-1 targets this tissue which results in the

massive depletion of T-cells (Douek 2007b; Douek 2007a). The high frequency of infected cells

in this tissue is consistent with a model in which T-cells trafficking to the breast milk during the

first month post-partum initiate a new infection. Because the GALT is infected early in the

course of disease, this scenario would also explain the observation that the milk virus appears to

emerge from the root of the tree and shares an ancestor with the plasma virus >1 year pre-

partum.

A second question addressed by this study was whether the onset of weaning was

associated with any evolutionary changes. The mother reported exclusive breastfeeding at four

months, although by six months she reported mixed feeding. The milk and plasma virus were

part of the same population at the month 4 sample, which appeared to have evolved from the

earlier plasma virus rather than from either the milk virus or another tissue. Therefore, it seems

that the plasma virus infected the milk before the onset of weaning, which is contrary to the

initial expectation that weaning would precede panmixia. The opening of the paracellular

pathway at the onset of weaning (<2 feedings per day) is associated with increased levels of

plasma derived minerals and nutrients, and would provide an opportunity for the plasma virus to

migrate to the milk. It is interesting that the infant was infected at 4 months as well, even though

the mother was reportedly exclusively breast feeding. Unfortunately, the precise timing of the

events around four months cannot be ascertained, so an exact order of weaning and HIV

infection cannot be determined.

A third question was whether the breastmilk virus compartmentalized over time. The

earliest virus is clearly derived from a different population than the plasma virus, but at month 4

the two tissues are experiencing nearly complete migration. By month 12, the milk virus appears

again compartmentalized with respect to the plasma virus, and appears to derive from the month

4 milk virus. Overall, this does suggest compartmentalization of the virus, and the analysis for

migration indicated much greater compartmentalization than expected by chance. By month 12

and 18, the mother was most likely breastfeeding much less frequently than during the few

months after birth, and therefore milk production would have been lower and fewer opportunities

may have existed for the plasma virus to enter the milk. In addition, the initial source of infection

does not appear to have re-infected the milk, possibly again due to lowered milk production.

These results may also explain the discrepancies reported earlier in the literature concerning

whether the breastmilk virus compartmentalized, since those studies considered very different

timepoint. This study demonstrates the importance in choosing sampling times carefully before

making generalizations about the origin and evolution of breastmilk virus.

Finally, I examined the selective pressures that were acting on the virus over time. My

analysis indicated that an episode of negative selection had occurred during the evolution of the

right breastmilk virus at week 1, as well as the breastmilk and plasma at month 4. During the

later stages of evolution, several independent episodes of positive selection occurred on branches

leading to the month 12 and 24 viruses in both tissues. This could suggest a relaxation of host

pressure allowing the viral population to acquire new beneficial mutations. This is also consistent

with the longer terminal branch lengths in the later samples, which is indicative of exponential

growth. . In another study, I showed that the diversity and effective population size of the virus

increases as host selective pressure relaxes and the immune system begins to fail (as measured

by decline of CD4 cells) (Gray et al, in prep). No clinical information was available for the

patient in the current study so this hypothesis could not be tested. However, this is an avenue of

investigation for future studies.

The conclusions drawn from this study have implications for the management of

breastfeeding by HIV-infected mothers. First, because the milk virus is distinct from the plasma

virus until at least month 1 and because it originated in tissues with potentially different selective

pressures, the initial milk virus may have different pathogenicity and/or transmissibility than the

plasma virus. If so, this could modulate the risk of MTCT transmission during the first few

months and impact the recommendations currently given to women. Second, the population

dynamics of the virus clearly changes by month 4. Although the mother reported exclusive

breastfeeding, she did initiate weaning sometime within the weeks after this sample was

collected. Therefore, although it appears that migration of the virus between tissues occurred

prior to weaning, it is difficult to confidently conclude that the two are unlinked. Third, this

study conclusively demonstrates that the milk virus is compartmentalized throughout much of its

production. The again suggests that the milk virus may evolved different characteristics that

modulate transmission. Further, if the infection in the milk maintains a constant population size

as suggested by the phylogeny, and is protected from the more rapidly growing plasma virus,

fewer infected cells are available to transmit the virus. Determining exactly why the virus is or is

not compartmentalized in the milk may aid our understanding of how to lower the risk of MTCT.

Lastly, this study demonstrates that the evolution of the virus in the milk is a dynamic process. I

am currently studying two additional patients for whom samples are available over a one-year

period to determine whether the same dynamics occur in other patients as well. Understanding

the complexity of the evolutionary process is of great importance in understanding how

exclusive breastfeeding attenuates the risk of transmission on a molecular level, so that more

precise recommendations for avoiding MTCT can be made to HIV-1 infected women worldwide.

This study was conducted within an evolutionary anthropological framework on several

levels. Traditional analytical techniques used for population genetics were implemented here to

investigate the evolution of a human pathogen, although on a much shorter timescale than

traditionally considered. Because the rate of evolution in HIV-1 is many orders of magnitude

faster than in other pathogens, the population dynamics of infection, including response to

selective pressures, changes in population size, and migration between compartments, can all be

measured within one individual, rather than within and between populations of humans as is

usually the case in anthropological genetics. This difference highlights the fact that pathogen

dynamics are similar in a micro and a macro environment, and lessons from each level can

inform the other. In addition, women in third-world countries who do not enjoy the benefits in

developed countries cannot safely choose to formula feed their infants to minimize risk of

transmission. However, comparatively little research has been conducted on the molecular

characteristics of the virus in milk as compared to other tissues. The recent observations that

exclusive breastfeeding may attenuate the risk of transmission are extremely important in both a

clinical and an anthropological context. The management of breastfeeding practices by the

women themselves in developing countries represents an economical, culturally appropriate, and

non-invasive method to control transmission to their infants, without intervention by medical

staff, drug companies, or governments. The uniquely interdisciplinary approach of anthropology

promotes the importance in providing marginalized women with the tools to manage their

infant’s health, as well as the analytical framework to investigate and understand the molecular

mechanisms underlying the recommendations.

Table 5-1. Number of sequences generated for each tissue.

Tissue Timepoint Number of Clones

Part of initial dataset

PL 1W 23 Yes PL 1W 13a No PL 4M 12 Yes PL 12M 37a Yes PL 12M 22 No PL 24M 28a Yes PL 24M 22 No BML 1W 15 Yes BMR 1W 31 Yes BML 1M 28b No BMR 1M 13b No BML 4M 11 Yes BMR 4M 9 Yes BMR 12M 14 Yes BML 18M 4 Yes

aThese sequences were generated using less starting template in the PCR reaction and were not included in any of the phylogenetic analyses. bThese sequences were included in a subset of analyses. Table 5-2. Sequence characteristics of V1 and V2. V1 V2 Tissue/Timepoint Haplotypes #GlySites Length Haplotypes #GlySites LPL1W A(.03) C(.11) D(.86) 1-4 24-37 A(.06) B(.94) 2-3 4PL4M A(.08) C(.16) D(.75) 2-4 33-37 B(.84) C(.08) E(.08) 2-3 4

PL12M A(.03) B(.05) C(.31) D(.61) 2-4 25-37 A(.05) B(.46) C(.05) D(.41) E(.3) 2-3 4

PL24M A(.04) B(.48) C(.1) D(.22) E(.16) 2-5 22-44 A(.38) B(.32) C(.16) D(.12) 1-3 4BML1W A(1) 1 23 E(1) 3 5BMR1W A(.29) B(.32) C(.39) 1-3 23-36 A(.42) C(.29) E.29) 2-3 4BML1M B(.96) F(.04) 1-4 27-46 A(.25) C(.75) 1-2 4BMR1M A(.08) B(.23) D(.08) F(.54) 1-4 24-46 A(.15) B(.08) C(.77) 1-2 4BML/R4M A(.05) D(.95) 2-4 29-37 B(.8) C(.1) E(.1) 2-3 4BMR12M D(1) 4 37 D(1) 3 4BML18M D(1) 4 37 D(1) 3 4

Haplotypes were assigned based on aa motif, length, and glycosylation sites. The frequency of each haplotype is given in parentheses.

Table 5-3. Combination of V1 and V2 haplotypes. V1 HAP A A A B B B C C D D D D D E FV2 HAP A C E A B C A B A B C D E C CPLA1W 1 4 1 30 PLA4M 1 2 7 1 1 PLA12M 2 3 18 9 3 24 PLA24M 2 15 9 1 4 1 3 1 6 8 BML1W 15 BMR1W 9 1 9 12 BML1M 7 20 1BMR1M 2 3 1 7BML4M 1 9 1 1 BMR4M 7 1 BML12M 14 BMR18M 4

The number of clones displaying each combination of V1 and V2 haplotypes is given for each tissue/timepoint. Haplotypes were defined by length, aa motif, and number of glycosylation sites. Table 5-4. Hudson test for population structure. Comparison Group 1 Group 2 p-value intra-timepoint tissues BML 1W BMR 1W p<0.001intra-timepoint tissues BML 4M BMR 4M p=0.935intra-timepoint tissues BML1W PLA1W p<0.001intra-timepoint tissues BMLR 1W PLA1W p<0.001intra-timepoint tissues BML/R 4M PLA 4M p=0.428intra-timepoint tissues BMR 12M PLA 12M p<0.001inter-timepoint, same tissue PLA 1W PLA 4M p<0.001inter-timepoint, same tissue PLA 1W PLA 12M p<0.001inter-timepoint, same tissue PLA 1W PLA 24M p<0.001inter-timepoint, same tissue PLA 4M PLA 12M p<0.001inter-timepoint, same tissue PLA 4M PLA 24M p<0.001inter-timepoint, same tissue PLA 12M PLA 24M p<0.001inter-timepoint, same tissue BML 1W BML/R 4M p<0.001inter-timepoint, same tissue BML 1W BMR 12M p<0.001inter-timepoint, same tissue BMR 1W BML/R 4M p<0.001inter-timepoint, same tissue BMR 1W BMR 12M p<0.001inter-timepoint, same tissue BML/R 4M BMR 12M p<0.001inter-timepoint, same tissue BMR 12M BML 18M p<0.001

Table 5-5. Number of putative recombinant clones. V1V5 v1v3 v1v2 c2v3 C2V5 (A) C2V5 (B) C2V5 (A) c C2V5 (A)d

Initial p-value p<10

-19p<10

-19p=8.43x10-7 p=0.191 p=0.002 p=0.002 p=0.027 p=0.006

PL 1wk UNDETb UNDET 3 0 2 2 2 3 PL 1wk a UNDET UNDET * * * * * 0 PL 4m UNDET UNDET 1 0 6 6 6 6

PL 12m UNDET UNDET 0 0 4 5 4 6 PL 12m a UNDET UNDET * * * * * 0 PL 24m UNDET UNDET 10 0 0 2 0 5 PL 24m a UNDET UNDET * * * * * 9 BML 1w UNDET UNDET 0 0 2 15 2 2 BMR 1w UNDET UNDET 12 0 14 0 14 14 BMl 4m UNDET UNDET 1 0 2 2 2 2 BMR 4m UNDET UNDET 0 0 2 2 2 2

BMR 12m UNDET UNDET 0 0 0 0 0 0 BML 18m UNDET UNDET 0 0 0 0 0 0 The number of identified clones in each alignment is given for each tissue/timepoint. aThese sequences were generated using 1/10 of the amount of starting template and were only used in this analysis. bFor these alignments, a dataset that did not include recombinants could not be obtained. cThis alignment included the additional breastmilk sequences from month 1 in addition to the non-recombinant C2V5 (A) dataset. dThis alignment included the additional plasma sequences in addition to the non-recombinant C2V5 (A) dataset. Table 5-6. Marginal likelihoods for models used in the Bayesian analysis. Model Outgroup Marg. Lik. SRD6, SC none -4362.9985 SRD6, RC none -4321.3865 GTR2, RC none -4278.6483 GTR3, RC none -4270.3296 SRD6, RC BML 1W (Group 1) -4321.3865 SRD6, RC BMR 1W (Group 2) -4322.9025 SRD6, RC BMR 1W (Group 3) -4322.7647 SRD6, RC PLA 1W (4 seq.) -4323.1764 SRD6, RC PLA 1W (2 seq.) -4322.8887

The definition of each outgroup is given in the text. SRD6 = Shapiro, Rambaut, and Drummond (2006) model. GTR = general time reversible with two or three partitions of the data based on codon position. SC = strict clock assumption. RC = relaxed clock assumption.

Figure 5-1. Sampling times and tissues.

Sampling times are on the top of the graph. W=week, M=month. Tissues which were sampled from each time are on the bottom: PL=plasma, BML=left breastmilk, BMR=right breastmilk.

Three sequences from week 1 are in red. Bootstrap values based on 1,000 neighbor-joining replicates are indicated above each major branch. Branch lengths are in units of substitutions/site.

Figure 5-2. Neighbor-joining phylogeny of all subtypes in group M plus this patient.

A C T N L A N D T A R R I I A K S M E E E V K N C 24 1 27A C T E I R N I T G N N R T I D F S M K G E V Q N C 25 2 2A C T E I R N I T D G G N R T I D F S M R E E I K N C 26 2 2A C T E I R N I T G G G T D N N R T I D F S M K G E V K N C 29 2 1A C T E I Y N S T S G N S T D G G K D N N R N I D L S M Q G E V K N C 34 2 1B C T N L N R T F V N Y T S S M K E E I K N C 22 2 2B C T N L N R T I A N D T S G I S M Q E E I K N C 24 2 3B C T N L N R T I G N D T S D A N G M K E E I K N C 25 2 1B C T N L K N T I V N D T S G S D T S S M K E E I K N C 27 1 7B C T N L N R T I V N D T S G S D T S S M K E E I K N C 27 2 54C C T N L K N T T V N G T S G N G A R T I D S N M K G E V K N C 31 2 1C C T N L T N T T V N G T G S G N D T K R T V D S S M E G E V K N C 33 3 6C C T N L K N T T V N G T S D T S G G N N N N R T I D N S M K E E I K N C 36 3 33D C T N L K N T T V N G T V G N G T G G G R T I D S N M K G E V K N C 34 3 2D C T N L K N T T V N S T S G T S N G N D K K R T I D S S M K G E V K N C 36 2 3D C T N L K N T T V N G T S G T S G G T D N N R T I D F S M K G E V K N C 36 3 1D C T N L K N I T V N G T S G N S T S G G N E N R T I D S S M K G E V K N C 37 4 44D C T N L K N I T V N G I S G N S T G G G T A I N R T I D F S M K E E I K N C 38 3 6D C T N L K N T T V N G T S G N S T G G G N G N N R T I D L S M K G E V K N C 38 4 61D C T N L K N I T V N G T S S N S T G D G N D T N R T I D S S M T E E V K N C 38 5 2D C T N L Q N I T V N G T S S N S T S G N S T G G G T D Y N R T I D F S M K G E V K N C 43 5 3E C E E L N G T I V N D T Y S N D A T F N N N T S G G I K R N K T I V R S M E G E V K N C 44 4 8F C T K P S G T I V N D T A G N G T M N D T A G - N G T M N G G D R K I E I S M E E E V K N C 46 5 8

Figure 5-3. Haplotype analysis of V1.

Haplotypes were assigned based on amino acid (aa) motif, length, and number of glycosylation sites (highlighted in green). Representative sequences for each haplotype are shown. Sequences within a haplotype that differed in aa motif but shared the same length and number of glycosylation sites are not shown.

A C S F N T T T E I H D K Q Q K V H A L F Y R L D I A Q L D K D R N D Y R L I N C 40 1 1A C S F N T T T E I H D K Q Q K V H A L F Y R L D I A Q L D N N N E T Y R L I N C 40 2 20A C S F N T T T E I H D K Q Q K V H A L F Y R L D I A Q L D N D S R T Y R L I N C 40 2 25B C S F K T T T E I R D R E Q K V H A L F Y R L D I Q P L G N E T E E G G T Y R L I N C 43 1 3B C S F N T T T E I N D R K Q K V H A L F Y R L D I Q P L G N E T E E G G T Y R L I N C 43 2 9B C S F N T T T E I R D R E Q K V H A L F Y R L D I Q P L G N E T E E N S T Y R L I N C 43 3 71B C S F N T T T E V N D R K Q K V H A L F Y R L D I Q P L G N E T K E G N G T Y R L I N C 44 3 13B C S F N T T T E I N D R Q Q K V R A L F Y R L D I Q P L G N E T N K E G N G T Y R L I N C 45 3 2B C S F N T T T E I N D R K Q K V H A L F Y R L D I Q P L G N E T T E G N G T Y T Y R L I N C 46 3 4C C S F R A N T E K D R K Q N V T A L F Y R L D I A P L D K G N K T Y R L I N C 40 1 8C C S F N T T T E I H D R Q Q K V H A L F Y R L D I Q Q L D K E R N D T Y R L I N C 41 2 3C C S F N T T T E I Q D K Q Q K V H A L F Y R L D I Q Q L D K E G D D T S E T Y R L I N C 44 1 2C C S F N T T T E I Q D R Q Q K V H A L F Y R L D I Q P L D K E G N D T F E T Y T L I N C 44 2 43D C S F N T T T E I Q D R Q Q K V H A L F Y R L D I Q P L G N K T T K E G N D T F E T Y T L I N C 48 3 48E C S F N T T T E I H D R Q Q K V H A L F Y R L D I Q P L E E G N K G G N D T T E R E N G T Y T L I N C 51 3 29

Haplotypes were assigned based on amino acid (aa) motif, length, and number of glycosylation sites (highlighted in green). Representative sequences for each haplotype are shown. Sequences within a haplotype that differed in aa motif but shared the same length and number of glycosylation sites are not shown.

Figure 5-4. Haplotype analysis of V3.

V V C V3 C3 V C V5

131 157 196 296 331 385 418 460 471

V V C V3 C V C V5

131 157 196 296 331 385 418 460 471

V1VV1V

V1V2 C2V5

Figure 5-5. Recombination alignments.

Red lines represent alignments for which a non-recombinant datatset could not be obtained. Green lines represent alignments for which non-recombinant datasets were obtained. .

Figure 5-6. Network of breast milk sequences from week 1.

Breastmilk sequences from week 1 are categorized into four groups basaed on clustering in the network. Branch lengths are in substitutions/site.

Figure 5-7. Bayesian consensus phylogeny for C2V5 (A).

Branches are color coded according to tissue as follows: blue = right breast, red = left breast, green = plasma. Posterior probabilities are given above each major branch. The timepoint at which sequences were sampled is indicated by a symbol (see legend). The three groups of breastmilk sequences at week 1 are labeled as categorized in Figure 5-6.

Figure 5-8. Bayesian consensus phylogeny for C2V5 (B).

Figure 5-9. Best-rooted maximum likelihood phylogeny.

Branches are color coded according to tissue as follows: blue = right breast, red = left breast, green = plasma. Bootstrap values >50 based on 1,000 replicates are given above each major branch. The timepoint at which sequences were sampled is indicated by a symbol (see legend). The three groups of breastmilk sequences at week 1 are labeled as categorized in Figure 5-6.

Figure 5-10. Bayesian consensus phylogeny for the C2V5 (A) dataset with branches under

significant selection.

Branches are color coded according to tissue as follows: blue = right breast, red = left breast, green = plasma. The timepoint at which sequences were sampled is indicated by a symbol (see legend). The three groups of breastmilk sequences at week 1 are labeled as categorized in Figure 5-6. Monophyletic sequences from the same tissue are collapsed. Thick black branches indicate significant negative selection, and thick red branches indicate significant positive selection.

Figure 5-11. Bayesian consensus phylogeny with breast milk month 1 sequences.

Figure. 5-12. Migration analysis for two tissues.

(a) The migration matrix calculated from the posterior distribution of trees estimated using the C2V5 (A) dataset. Circles are proportional to the minimum (dark blue), average (light blue) and maximum (white) number of events for each category. (b) The migration matrix calculated from 10,000 random trees.

Figure 5-13. Migration analysis for three tissues.

(a) The migration matrix calculated from the posterior distribution of trees estimated using the C2V5 (A) dataset. Circles are proportional to the minimum (dark blue), average (light blue) and maximum (white) number of events for each category. (b) The migration matrix calculated from 10,000 random trees.

CHAPTER 6 CONCLUSION

The impact of disease on the human population is of immediate and practical concern.

Emerging infectious diseases, defined as "infections that have newly appeared in a population or

have existed previously but are rapidly increasing in incidence or geographic range" (Morse

1995) have been significantly increasing over the past 50 years (Jones et al. 2008). Emergence is

largely due to increasing urbanization, wars, environmental degradation, and global inter-

connectivity (Fauci 1998; Stephens et al. 1998a; Desselberger 2000; Pollard and Dobson 2000;

Feldmann et al. 2002; Fauci, Touchette, and Folkers 2005). Tuberculosis, malaria, and HIV-1, as

well as emerging threats such as SARS, West Nile Virus, and influenza, are major threats to

international public health (Fauci 1998; Morens, Folkers, and Fauci 2004; Fauci, Touchette, and

Folkers 2005). Infectious diseases are the second leading cause of death in the pre-industrialized

world (Fauci, Touchette, and Folkers 2005) and account for >25% of all deaths worldwide

(Morse 1995). In addition, complex diseases including cancer, diabetes, and heart disease are

responsible for 87% of deaths in high income countries and 43% of deaths in low income

countries, of which 40-80% of these deaths could be avoided with lifestyle changes (WHO

2008a; WHO 2008b) though myriad studies suggest have these diseases have a genetic

component as well. By 2030, heart disease and HIV/AIDS are predicted to be the largest

components of the burden of human disease (Mathers and Loncar 2006)

Due to the increasing global impact and complicated manner of transmission and

inheritance of infectious and complex diseases, a comprehensive approach is essential to

diagnose, treat and eradicate human diseases. My dissertation demonstrates how genetic

anthropology can be used to address both anthropological and clinical concerns from three

perspectives. As anthropological geneticists, we are uniquely positioned to use the analytical

tools of evolutionary genetics to study the biological mechanisms of diseases, while maintaining

a holistic approach that considers the cultural, historical, and demographic factors which

influence etiology. We can also use our expertise to influence US and international policy by

advocating for culturally sensitive implementation of scientific findings. I believe that lack of

dialogue between fields, even those that use similar molecular and bioinformatics techniques,

can substantially impede progress towards shared goals of treating and eradicating human

disease. However, my study also demonstrates that a conscientious treatment of the clinical and

policy implications does not preclude simultaneously addressing more traditional biological

anthropological questions involving human evolution and migration.

Both anthropological and clinical implications were addressed for the four projects

included in this dissertation. First, with regard to the evolution of treponemal diseases, I found

that the genetic variation present within and between treponemal subspecies was largely the

result of a particular type of homologous recombination called gene conversion. Furthermore, the

largest number of gene conversion events took place in the venereal syphilis genome, which is

largely regarded as the most recently evolved of the three subspecies. These recombinant events

distort the phylogenetic and molecular relationships among the subspecies, and therefore I

conclude that relying on single nucleotide polymorphisms without considering the contextual

sequence information are insufficient to discern evolutionary relationships among the

treponemes. Second, my molecular data suggest that the three subspecies are molecularly distinct

based on high bootstrap values for the phylogenetic branches separating the subspecies, as well

as a significant amount of among-subspecies variation. However, this could be the result of

geographic structure and does not necessarily support a classification of three diseases. Third, the

molecular data do not appear to support a dramatically older origin of yaws relative to venereal

syphilis but instead are consistent with a relatively coincident evolution of the three human

treponemal subspecies. Moreover, the venereal syphilis sequences harbor more variation than

would be expected under the modified Columbian hypothesis of evolution of venereal syphilis

within the past 500 years (Baker and Armelagos 1988). This study was able to compare support

for the leading anthropological hypotheses for the evolution of syphilis, as well as provide new

genomic regions suitable for diagnosing between the three diseases.

In the second project, I investigated the association between genotypic data from the ADH

and ALDH alcohol metabolism genes and both dichotomous and continuous substance abuse

phenotypes in a Plains population of Native Americans. In the third project, I genotyped and

analyzed genotype data from the SNCA gene in ~1000 individuals from the same Plains

population and a second Native American population from the Southwest United States. Despite

the extensive genetic data, I found no correlation between any of the ADH, ALDH, or SNCA

markers and the substance abuse phenotypes. For the SNCA gene, this may suggest that

unassayed promoter polymorphisms are affecting the expression of the gene and risk of

substance abuse rather than variants within the gene itself, since excessive mRNA and protein

levels have been associated with alcohol use disorders. Thus, future studies should investigate

the genetic variation upstream of the SNCA for an association with alcohol use. Alternatively, the

evolutionary history of Native Americans could explain the lack of association between the

assayed genetic markers and substance abuse. Native Americans experienced a severe population

bottleneck during migration from Asia to the New World, in which much of the ancestral genetic

variability was lost (Mulligan et al. 2004; Ramachandran et al. 2005). It was known from

previous research that the ADH and ALDH alleles already identified as protective against

alcoholism in Asians were not present in the tested Native American populations although the

fact that the genes were implicated in the risk of alcoholism motivated my analysis of additional

ADH and ALDH alleles. My research demonstrated that the SNCA upstream polymorphism

previously associated with alcoholism was not present in the tested populations, consistent with a

severe population bottleneck during Native American evolutionary history. Thus, these results

represent a fairly comprehensive investigation of the major candidate genes and alleles for

association with alcoholism in Native Americans and lack of association with such alleles

suggests that a non-genetic approach may represent the best strategy for treatment of alcoholism

in these populations. I believe that additional resources should be dedicated to substance abuse

intervention efforts that target societal and cultural causes, such as poverty, lack of health care

and unemployment.

In the final project, I conducted the first longitudinal study of the evolution of HIV-1 in the

breast milk and blood plasma of a HIV-positive mother over a two year period. The goal of the

study was to elucidate the molecular mechanisms responsible for the observation that exclusive

breastfeeding reduces the risk of transmission over mixed feeding (Coutsoudis et al. 1999;

Coutsoudis 2000; Coutsoudis et al. 2001; Coutsoudis et al. 2002; Iliff et al. 2005; Coovadia et al.

2007), as well as to understand the general evolutionary patterns of the virus in different tissues

within a single patient over time. I determined that the virus in the breastmilk post-partum is

distinct from the virus contemporaneously circulating in the plasma, which suggests a tissue

other than the plasma is seeding the milk infection. Because the initial milk virus may have

different pathogenicity and/or transmissibility than the plasma virus, future studies should

investigate the phenotypic characteristics of the early milk virus. Intriguingly, the population

dynamics of the infection in this patient clearly changes by month 4, coinciding with the

transmission of the virus to the infant in this case. The mother also reported ceasing exclusive

breastfeeding shortly thereafter. However, the exact timing of these events is unclear and

therefore conclusions about causality between changes in the breastmilk virus population and

weaning with HIV transmission to the baby are uncertain. This study also demonstrates

conclusively that the milk virus is compartmentalized throughout much of its production, and

that the evolution of the virus in the milk is a dynamic process. Understanding the complexity of

the evolutionary process is of great importance in determining the relationship between feeding

practices and transmission on a molecular level, so that women can be empowered to effectively

manage their breastfeeding practices to minimize the risk of transmission. Thus, the uniquely

interdisciplinary approach of anthropology promotes the importance in providing marginalized

women with the tools to manage their infant’s health, as well as the analytical framework to

investigate and understand the molecular mechanisms underlying the recommendations.

My dissertation also has broader implications for the field of genetic anthropology. First, I

use a model that incorporates three distinct perspectives from which human disease can be

approached. Each perspective is temporally and philosophically distinct and includes different

levels of human variation. While anthropologists traditionally employ the evolutionary or

population perspective, we have the tools and the expertise to inform studies from the clinical

perspective as well. Second, I have demonstrated that incorporating studies of pathogen genetics

is valuable in understanding the interaction between humans and disease. Pathogens can be

studied from a clinical perspective, such as HIV-1 in an individual, as well as an evolutionary

perspective, such as the movement of treponemes across continents. Pathogen evolutionary

dynamics appear to be similar across temporally disparate perspectives, for example, the high

degree of gene conversion/recombination that appears to be characteristic of both T. pallidum

and HIV-1. Uncovering uniform processes recurring in the evolution of infectious diseases will

allow us to better understand human-pathogen interactions, and may aid in developing strategies

to eradicate infectious diseases. Lastly, because both infectious and complex diseases are

increasing concerns to global health, genetic anthropologists should widen their focus to address

clinical and policy implications implicit in their studies that may primarily address more

evolutionary questions. Genetic anthropologists are equipped with the analytical tools to study

the biological mechanisms of diseases and incorporate information about the underlying

population structure and evolutionary history. This unique perspective allows genetic

anthropologists to provide comprehensive clinical and policy recommendations based on a

comprehensive evaluation of genetic data. Finally, the multi-discinplinary approach employed by

anthropologists can be valuable in ensuring that resulting applications of the data are culturally

appropriate and provide maximum health benefits to communities in need.

APPENDIX A. LIST OF QUESTIONS FOR SUBSTANCE ABUSE CATEGORIZATION

1. Was there ever a time when, because of your drinking, you often missed work, had trouble on

the job, or were unable to take care of household responsibilities?

2. Did you ever lose a job because of your drinking?

3. Did you often have difficulties with your family, friends, or acquaintances because of your

drinking?

4. Was there ever a period in your life when you drank too much?

5. Has anyone in your family - or anyone else- ever objected to your drinking?

6. Was there ever a time when you often couldn't stop drinking when you wanted to?

7. Have you ever had traffic difficulties because of your drinking - like reckless driving,

accidents, or speeding?

8. Have you often had tremors that were most likely due to drinking? (i.e. when you cut down or

stopped or cut down? After not drinking for a few hours or more, did you often drink to keep

yourself from getting the shakes or becoming sick?)

9. Was there ever a time when you frequently had a drink before breakfast?

10. Were you ever divorced or separated primarily because of your drinking?

11. Have you ever gone on a bender? (Definition: drinking steadily for 3 or more days, more

than a fifth of whiskey daily/24 bottles of beer/3 bottles of wine. Must have occurred 3 or more

times.)

12. Have you ever been physically violent while drinking? (Must have occurred on at least 2

occasions.)

13. Have you ever been picked up by the police because of how you were acting while you were

drinking (e.g. disturbing the peace, fighting, public intoxication. Do not include traffic

difficulties.)

14. Have you ever had blackouts? (Definition: memory loss for events that occurred while

conscious during a drinking episode.)

15. Have you ever had the DT's? (Definition: confused state following stopping drinking that

includes disorientation and illusions or hallucinations.)

16. Did you ever hear voices or see things that weren't really there, soon after you stopped

drinking (Hallucinations - must have occurred on at least two separate occasions)

17. Have you ever had a seizure or fit (non-epileptic) after you stopped drinking?

18. Did a doctor ever tell you that you had developed a physical complication of alcoholism, like

gastritis, pancreatitis, cirrhosis, or neuritis? (Include good evidence of Korsakoff's Syndrome -

chronic brain syndrome with anterograde amnesia as the predominant feature.

LIST OF REFERENCES

Abbate, I., G. Cappiello, R. Longo, A. Ursitti, A. Spano, S. Calcaterra, F. Dianzani, A. Antinori,

and M. R. Capobianchi. 2005. Cell membrane proteins and quasispecies compartmentalization of CSF and plasma HIV-1 from aids patients with neurological disorders. Infect Genet Evol 5:247-253.

Achaz, G., S. Palmer, M. Kearney, F. Maldarelli, J. W. Mellors, J. M. Coffin, and J. Wakeley. 2004. A robust measure of HIV-1 population turnover within chronically infected individuals. Mol Biol Evol 21:1902-1912.

Allen, S. J., A. O'Donnell, N. D. Alexander, M. P. Alpers, T. E. Peto, J. B. Clegg, and D. J. Weatherall. 1997. alpha+-Thalassemia protects children against disease caused by other infections as well as malaria. Proc Natl Acad Sci U S A 94:14736-14741.

Amos, A. F., D. J. McCarty, and P. Zimmet. 1997. The rising global burden of diabetes and its complications: estimates and projections to the year 2010. Diabet Med 14 Suppl 5:S1-85.

Anderson, D. G., and J. C. Gillam. 2000. Paleoindian colonization of the Americas: implications from an examination of physiography, demography, and artifact distribution. Am Antiquity 65:43-66.

Armelagos, G. J., and J. Dewey. 1975. Evolutionary response to human infectious disease. Bioscience 20:271-275.

Armelagos, G. J., K. N. Harper, and P. S. Ocampo. 2005. On the Trail of the Twisted Treponeme: Searching for the Origins of Syphilis. Evolutionary Anthropology: Issues, News, and Reviews 14:240-242.

Arthos, J., C. Cicala, E. Martinelli, K. Macleod, D. Van Ryk, D. Wei, Z. Xiao, T. D. Veenstra, T. P. Conrad, R. A. Lempicki, S. McLaughlin, M. Pascuccio, R. Gopaul, J. McNally, C. C. Cruz, N. Censoplano, E. Chung, K. N. Reitano, S. Kottilil, D. J. Goode, and A. S. Fauci. 2008. HIV-1 envelope protein binds to and signals through integrin alpha(4)beta(7), the gut mucosal homing receptor for peripheral T cells. Nat Immunol.

Arthur, P. G., M. Smith, and P. E. Hartmann. 1989. Milk lactose, citrate, and glucose as markers of lactogenesis in normal and diabetic women. J Pediatr Gastroenterol Nutr 9:488-496.

Baker, B. J., and G. J. Armelagos. 1988. The origin and antiquity of syphilis: paleopathological diagnosis and interpretation. Curr Anthropol 29:703-738.

Baldo, L., S. Bordenstein, J. J. Wernegreen, and J. H. Werren. 2006. Widespread recombination throughout Wolbachia genomes. Mol Biol Evol 23:437-449.

Barker, D. J., C. N. Hales, C. H. Fall, C. Osmond, K. Phipps, and P. M. Clark. 1993. Type 2 (non-insulin-dependent) diabetes mellitus, hypertension and hyperlipidaemia (syndrome X): relation to reduced fetal growth. Diabetologia 36:62-67.

Barrett, J. C., B. Fry, J. Maller, and M. J. Daly. 2005. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21:263-265.

Barrett, R., C. W. Kuzawa, T. McDade, and G. J. Armelagos. 1998. Emerging and re-emerging infectious diseases: The third epidemiological transition. Annual Review of Anthropology 27:247-271.

Becquart, P., N. Chomont, P. Roques, A. Ayouba, M. D. Kazatchkine, L. Belec, and H. Hocini. 2002. Compartmentalization of HIV-1 between breast milk and blood of HIV-infected mothers. Virology 300:109-117.

Becquart, P., H. Hocini, B. Garin, A. Sepou, M. D. Kazatchkine, and L. Belec. 1999. Compartmentalization of the IgG immune response to HIV-1 in breast milk. Aids 13:1323-1331.

Becquart, P., H. Hocini, M. Levy, A. Sepou, M. D. Kazatchkine, and L. Belec. 2000. Secretory anti-human immunodeficiency virus (HIV) antibodies in colostrum and breast milk are not a major determinant of the protection of early postnatal transmission of HIV. J Infect Dis 181:532-539.

Belfer, I., H. Hipp, C. McKnight, C. Evans, B. Buzas, A. Bollettino, B. Albaugh, M. Virkkunen, Q. Yuan, M. Max, D. Goldman, and M. Enoch. 2006. Association of galanin haplotypes with alcoholism and anxiety in two ethnically distinct populations. Molecular Psychiatry 11:301-311.

Benyshek, D. C., and J. T. Watson. 2006. Exploring the thrifty genotype's food-shortage assumptions: a cross-cultural comparison of ethnographic accounts of food security among foraging and agricultural societies. Am J Phys Anthropol 131:120-126.

Berger, E. A., R. W. Doms, E. M. Fenyo, B. T. Korber, D. R. Littman, J. P. Moore, Q. J. Sattentau, H. Schuitemaker, J. Sodroski, and R. A. Weiss. 1998. A new classification for HIV-1. Nature 391:240.

Bertotto, A., G. Castellucci, G. Fabietti, F. Scalise, and R. Vaccaro. 1990a. Lymphocytes bearing the T cell receptor gamma delta in human breast milk. Arch Dis Child 65:1274-1275.

Bertotto, A., R. Gerli, G. Fabietti, S. Crupi, C. Arcangeli, F. Scalise, and R. Vaccaro. 1990b. Human breast milk T lymphocytes display the phenotype and functional characteristics of memory T cells. Eur J Immunol 20:1877-1880.

Bibbins-Domingo, K., and A. Fernandez. 2007. BiDil for heart failure in black patients: implications of the U.S. Food and Drug Administration approval. Ann Intern Med 146:52-56.

Bjorndal, A., H. Deng, M. Jansson, J. R. Fiore, C. Colognesi, A. Karlsson, J. Albert, G. Scarlatti, D. R. Littman, and E. M. Fenyo. 1997. Coreceptor usage of primary human immunodeficiency virus type 1 isolates varies according to biological phenotype. J Virol 71:7478-7487.

Blanco-Gelaz, M. A., A. Lopez-Vazquez, S. Garcia-Fernandez, J. Martinez-Borra, S. Gonzalez, and C. Lopez-Larrea. 2001. Genetic variability, molecular evolution, and geographic diversity of HLA-B27. Hum Immunol 62:1042-1050.

Bonsch, D., V. Greifenberg, K. Bayerlein, T. Biermann, U. Reulbach, T. Hillemacher, J. Kornhuber, and S. Bleich. 2005a. Alpha-synuclein protein levels are increased in alcoholic patients and are linked to craving. Alcohol Clin Exp Res 29:763-765.

Bonsch, D., T. Lederer, U. Reulbach, T. Hothorn, J. Kornhuber, and S. Bleich. 2005b. Joint analysis of the NACP-REP1 marker within the alpha synuclein gene concludes association with alcohol dependence. Hum Mol Genet 14:967-971.

Bonsch, D., B. Lenz, J. Kornhuber, and S. Bleich. 2005c. DNA hypermethylation of the alpha synuclein promoter in patients with alcoholism. Neuroreport 16:167-170.

Bonsch, D., U. Reulbach, K. Bayerlein, T. Hillemacher, J. Kornhuber, and S. Bleich. 2004. Elevated alpha synuclein mRNA levels are associated with craving in patients with alcoholism. Biol Psychiatry 56:984-986.

Borras, E., C. Coutelle, A. Rosell, F. Fernandez-Muixi, M. Broch, B. Crosas, L. Hjelmqvist, A. Lorenzo, C. Gutierrez, M. Santos, M. Szczepanek, M. Heilig, P. Quattrocchi, J. Farres, F. Vidal, C. Richart, T. Mach, J. Bogdal, H. Jornvall, H. K. Seitz, P. Couzigou, and X.

Pares. 2000. Genetic polymorphism of alcohol dehydrogenase in europeans: the ADH2*2 allele decreases the risk for alcoholism and is associated with ADH3*1. Hepatology 31:984-989.

Brass, A. L., D. M. Dykxhoorn, Y. Benita, N. Yan, A. Engelman, R. J. Xavier, J. Lieberman, and S. J. Elledge. 2008. Identification of host proteins required for HIV infection through a functional genomic screen. Science 319:921-926.

Briggs, D. R., D. L. Tuttle, J. W. Sleasman, and M. M. Goodenow. 2000. Envelope V3 amino acid sequence predicts HIV-1 phenotype (co-receptor usage and tropism for macrophages). Aids 14:2937-2939.

Broder, C. C., and R. G. Collman. 1997. Chemokine receptors and HIV. J Leukoc Biol 62:20-29. Bruen, T. C., H. Philippe, and D. Bryant. 2006. A simple and robust statistical test for detecting

the presence of recombination. Genetics 172:2665-2681. Burkala, E. J., J. He, J. T. West, C. Wood, and C. K. Petito. 2005. Compartmentalization of HIV-

1 in the central nervous system: role of the choroid plexus. Aids 19:675-684. Cann, R. L., M. Stoneking, and A. C. Wilson. 1987. Mitochondrial DNA and human evolution.

Nature 325:31-36. Cao, K., A. M. Moormann, K. E. Lyke, C. Masaberg, O. P. Sumba, O. K. Doumbo, D. Koech, A.

Lancaster, M. Nelson, D. Meyer, R. Single, R. J. Hartzman, C. V. Plowe, J. Kazura, D. L. Mann, M. B. Sztein, G. Thomson, and M. A. Fernandez-Vina. 2004. Differentiation between African populations is evidenced by the diversity of alleles and haplotypes of HLA class I loci. Tissue Antigens 63:293-325.

Caramelli, D., C. Lalueza-Fox, S. Condemi, L. Longo, L. Milani, A. Manfredini, M. de Saint Pierre, F. Adoni, M. Lari, P. Giunti, S. Ricci, A. Casoli, F. Calafell, F. Mallegni, J. Bertranpetit, R. Stanyon, G. Bertorelle, and G. Barbujani. 2006. A highly divergent mtDNA sequence in a Neandertal individual from Italy. Curr Biol 16:R630-632.

Carmody, M. S., and J. R. Anderson. 2007. BiDil (isosorbide dinitrate and hydralazine): a new fixed-dose combination of two older medications for the treatment of heart failure in black patients. Cardiol Rev 15:46-53.

Carrillo, A., and L. Ratner. 1996. Cooperative effects of the human immunodeficiency virus type 1 envelope variable loops V1 and V3 in mediating infectivity for T cells. J Virol 70:1310-1316.

Cavalli-Sforza, L. L., P. Menozzi, and A. Piazza. 1994. The History and Geography of Human Genes. Princeton University Press, Princeton.

CDC. 2006a. Health, United States, 2006: with Chartbook on Trends in the Health of Americans. CDC. 2006b. Mother to Child (Perinatal) HIV Transmission and Infection. Department of Health

and Human Services. CDC. 2007. What Women Can Do. Department of Health and Human Services. Centurion-Lara, A., C. Castro, L. Barrett, C. Cameron, M. Mostowfi, W. C. Van Voorhis, and S.

A. Lukehart. 1999. Treponema pallidum major sheath protein homologue TprK is a target of opsonic antibody and the protective immune response. J Exp Med 189:647-656.

Centurion-Lara, A., C. Godornes, C. Castro, W. C. Van Voorhis, and S. A. Lukehart. 2000a. The tprK gene is heterogeneous among Treponema pallidum strains and has multiple alleles. Infect Immun 68:824-831.

Centurion-Lara, A., R. E. LaFond, K. Hevner, C. Godornes, B. J. Molini, W. C. Van Voorhis, and S. A. Lukehart. 2004. Gene conversion: a mechanism for generation of heterogeneity in the tprK gene of Treponema pallidum during infection. Mol Microbiol 52:1579-1596.

Centurion-Lara, A., B. J. Molini, C. Godornes, E. Sun, K. Hevner, W. C. Van Voorhis, and S. A. Lukehart. 2006. Molecular differentiation of Treponema pallidum subspecies. J Clin Microbiol 44:3377-3380.

Centurion-Lara, A., E. S. Sun, L. K. Barrett, C. Castro, S. A. Lukehart, and W. C. Van Voorhis. 2000b. Multiple alleles of Treponema pallidum repeat gene D in Treponema pallidum isolates. J Bacteriol 182:2332-2335.

Chao, Y. C., M. F. Wang, H. S. Tang, C. T. Hsu, and S. J. Yin. 1994. Genotyping of alcohol dehydrogenase at the ADH2 and ADH3 loci by using a polymerase chain reaction and restriction-fragment-length polymorphism in Chinese alcoholic cirrhotics and non-alcoholics. Proc Natl Sci Counc Repub China B 18:101-106.

Chen, C. C., R. B. Lu, Y. C. Chen, M. F. Wang, Y. C. Chang, T. K. Li, and S. J. Yin. 1999. Interaction between the functional polymorphisms of the alcohol-metabolism genes in protection against alcoholism. Am J Hum Genet 65:795-807.

Chen, W. J., E. W. Loh, Y. P. Hsu, C. C. Chen, J. M. Yu, and A. T. Cheng. 1996. Alcohol-metabolising genes and alcoholism among Taiwanese Han men: independent effect of ADH2, ADH3 and ALDH2. Br J Psychiatry 168:762-767.

Chen, X., H. A. de Silva, M. J. Pettenati, P. N. Rao, P. St George-Hyslop, A. D. Roses, Y. Xia, K. Horsburgh, K. Ueda, and T. Saitoh. 1995. The human NACP/alpha-synuclein gene: chromosome assignment to 4q21.3-q22 and TaqI RFLP analysis. Genomics 26:425-427.

Chiba-Falek, O., J. W. Touchman, and R. L. Nussbaum. 2003. Functional analysis of intra-allelic variation at NACP-Rep1 in the alpha-synuclein gene. Hum Genet 113:426-431.

Chisenga, M., L. Kasonka, M. Makasa, M. Sinkala, C. Chintu, C. Kaseba, F. Kasolo, A. Tomkins, S. Murray, and S. Filteau. 2005. Factors affecting the duration of exclusive breastfeeding among HIV-infected and -uninfected women in Lusaka, Zambia. J Hum Lact 21:266-275.

Clarimon, J., R. R. Gray, L. N. Williams, M. A. Enoch, R. W. Robin, B. Albaugh, A. Singleton, D. Goldman, and C. J. Mulligan. 2007. Linkage disequilibrium and association analysis of alpha-synuclein and alcohol and drug dependence in two American Indian populations. Alcohol Clin Exp Res 31:546-554.

Cockburn, J. 1971. Infectious disease in ancient populations. Current Anthropology 12:45-62. Cohen, M. N., and G. J. Armelagos. 1984. Paleopathology at the Origins of Agriculture.

Academic Press, Orlando. Connor, R. I., K. E. Sheridan, D. Ceradini, S. Choe, and N. R. Landau. 1997. Change in

coreceptor use coreceptor use correlates with disease progression in HIV-1--infected individuals. J Exp Med 185:621-628.

Coovadia, H. M., N. C. Rollins, R. M. Bland, K. Little, A. Coutsoudis, M. L. Bennish, and M. L. Newell. 2007. Mother-to-child transmission of HIV-1 infection during exclusive breastfeeding in the first 6 months of life: an intervention cohort study. Lancet 369:1107-1116.

Coutsoudis, A. 2000. Influence of infant feeding patterns on early mother-to-child transmission of HIV-1 in Durban, South Africa. Ann N Y Acad Sci 918:136-144.

Coutsoudis, A., F. Dabis, W. Fawzi, P. Gaillard, G. Haverkamp, D. R. Harris, J. B. Jackson, V. Leroy, N. Meda, P. Msellati, M. L. Newell, R. Nsuati, J. S. Read, and S. Wiktor. 2004. Late postnatal transmission of HIV-1 in breast-fed children: an individual patient data meta-analysis. J Infect Dis 189:2154-2166.

Coutsoudis, A., L. Kuhn, K. Pillay, and H. M. Coovadia. 2002. Exclusive breast-feeding and HIV transmission. Aids 16:498-499.

Coutsoudis, A., K. Pillay, L. Kuhn, E. Spooner, W. Y. Tsai, and H. M. Coovadia. 2001. Method of feeding and transmission of HIV-1 from mothers to children by 15 months of age: prospective cohort study from Durban, South Africa. Aids 15:379-387.

Coutsoudis, A., K. Pillay, E. Spooner, L. Kuhn, and H. M. Coovadia. 1999. Influence of infant-feeding patterns on early mother-to-child transmission of HIV-1 in Durban, South Africa: a prospective cohort study. South African Vitamin A Study Group. Lancet 354:471-476.

Crago, S. S., S. J. Prince, T. G. Pretlow, J. R. McGhee, and J. Mestecky. 1979. Human colostral cells. I. Separation and characterization. Clin Exp Immunol 38:585-597.

Crosby, A. 1969. The Early History of Syphilis: A Reappraisal. American Anthropologist 71:218-227.

Dabis, F., P. Msellati, N. Meda, C. Welffens-Ekra, B. You, O. Manigart, V. Leroy, A. Simonon, M. Cartoux, P. Combe, A. Ouangre, R. Ramon, O. Ky-Zerbo, C. Montcho, R. Salamon, C. Rouzioux, P. Van de Perre, and L. Mandelbrot. 1999. 6-month efficacy, tolerance, and acceptability of a short regimen of oral zidovudine to reduce vertical transmission of HIV in breastfed children in Cote d'Ivoire and Burkina Faso: a double-blind placebo-controlled multicentre trial. DITRAME Study Group. DIminution de la Transmission Mere-Enfant. Lancet 353:786-792.

De Pasquale, M. P., A. J. Leigh Brown, S. C. Uvin, J. Allega-Ingersoll, A. M. Caliendo, L. Sutton, S. Donahue, and R. T. D'Aquila. 2003. Differences in HIV-1 pol sequences from female genital tract and blood during antiretroviral therapy. J Acquir Immune Defic Syndr 34:37-44.

Desselberger, U. 2000. Emerging and re-emerging infectious diseases. J Infect 40:3-15. Diaz, G. A., B. D. Gelb, N. Risch, T. G. Nygaard, A. Frisch, I. J. Cohen, C. S. Miranda, O.

Amaral, I. Maire, L. Poenaru, C. Caillaud, M. Weizberg, P. Mistry, and R. J. Desnick. 2000. Gaucher disease: the origins of the Ashkenazi Jewish N370S and 84GG acid beta-glucosidase mutations. Am J Hum Genet 66:1821-1832.

Dirks, R. 1993. Starvation and Famine: cross-cultural codes and some hypothesis tests. Cross Cult Res 27:28-69.

Dixon, E. J. 2001. Human colonization of the Americas: timing, technology and process. Quat Sci Rev 20:277-299.

Doeblin, T. D., K. Evans, and G. B. Ingall. 1969. Diabetes and hyperglycemia in Seneca Indians. Human Hered 19:613-627.

Donaldson, Y. K., J. E. Bell, J. W. Ironside, R. P. Brettle, J. R. Robertson, A. Busuttil, and P. Simmonds. 1994. Redistribution of HIV outside the lymphoid system with onset of AIDS. Lancet 343:383-385.

Douek, D. 2007a. HIV Disease Progression. Top HIV Med 15:114-117. Douek, D. 2007b. HIV disease progression: immune activation, microbes, and a leaky gut. Top

HIV Med 15:114-117. Dover, G. 2002. Molecular Drive. Trends Genet 18:587-589. Drouin, G., F. Prat, M. Ell, and G. D. Clarke. 1999. Detecting and characterizing gene

conversions between multigene family members. Mol Biol Evol 16:1369-1390. Drummond, A. J., S. Y. Ho, M. J. Phillips, and A. Rambaut. 2006. Relaxed phylogenetics and

dating with confidence. PLoS Biol 4:e88.

Drummond, A. J., and A. Rambaut. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7:214.

Drummond, A. J., A. Rambaut, B. Shapiro, and O. G. Pybus. 2005. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol 22:1185-1192.

Dudbridge, F. 2003. Pedigree disequilibrium tests for multilocus haplotypes. Genet Epidemiol 25:115-121.

Duster, T. 2007. Medicalisation of race. Lancet 369:702-704. Ehlers, C. L., C. Garcia-Andrade, T. L. Wall, D. F. Sobel, and E. Phillips. 1998. Determinants of

P3 amplitude and response to alcohol in Native American Mission Indians. Neuropsychopharmacology 18:282-292.

Ehlers, C. L., D. A. Gilder, L. Harris, and L. Carr. 2001. Association of the ADH2*3 allele with a negative family history of alcoholism in African American young adults. Alcohol Clin Exp Res 25:1773-1777.

Ehlers, C. L., D. A. Gilder, T. L. Wall, E. Phillips, H. Feiler, and K. C. Wilhelmsen. 2004a. Genomic screen for loci associated with alcohol dependence in Mission Indians. Am J Med Genet B Neuropsychiatr Genet 129:110-115.

Ehlers, C. L., T. L. Wall, M. Betancourt, and D. A. Gilder. 2004b. The clinical course of alcoholism in 243 Mission Indians. Am J Psychiatry 161:1204-1210.

Embree, J. E., S. Njenga, P. Datta, N. J. Nagelkerke, J. O. Ndinya-Achola, Z. Mohammed, S. Ramdahin, J. J. Bwayo, and F. A. Plummer. 2000. Risk factors for postnatal mother-child transmission of HIV-1. Aids 14:2535-2541.

Endicott, J., and R. L. Spitzer. 1978. A diagnostic interview: the schedule for affective disorders and schizophrenia. Arch Gen Psychiatry 35:837-844.

Excoffier, L., G. Laval, and S. Schneider. 2005. Arlequin ver. 3.0: An integrated software package for population genetics data analysis. Evolutionary Bioinformatics Online 1:47-50.

Eyre-Walker, A. 1993. Recombination and mammalian genome evolution. Proc Biol Sci 252:237-243.

Fauci, A. S. 1998. New and reemerging diseases: the importance of biomedical research. Emerg Infect Dis 4:374-378.

Fauci, A. S., N. A. Touchette, and G. K. Folkers. 2005. Emerging infectious diseases: a 10-year perspective from the National Institute of Allergy and Infectious Diseases. Emerg Infect Dis 11:519-525.

Fawzi, W., G. Msamanga, D. Spiegelman, B. Renjifo, H. Bang, S. Kapiga, J. Coley, E. Hertzmark, M. Essex, and D. Hunter. 2002a. Transmission of HIV-1 through breastfeeding among women in Dar es Salaam, Tanzania. J Acquir Immune Defic Syndr 31:331-338.

Fawzi, W. W., G. I. Msamanga, D. Hunter, B. Renjifo, G. Antelman, H. Bang, K. Manji, S. Kapiga, D. Mwakagile, M. Essex, and D. Spiegelman. 2002b. Randomized trial of vitamin supplements in relation to transmission of HIV-1 through breastfeeding and early child mortality. Aids 16:1935-1944.

Feavers, I. M., A. B. Heath, J. A. Bygraves, and M. C. Maiden. 1992. Role of horizontal genetic exchange in the antigenic variation of the class 1 outer membrane protein of Neisseria meningitidis. Mol Microbiol 6:489-495.

Fee, M. 2006. Racializing narratives: Obesity, diabetes, and the "aboriginal" thrifty genotype. Social Science and Medicine 62:2988-2997.

Feil, E. J., E. C. Holmes, D. E. Bessen, M. S. Chan, N. P. Day, M. C. Enright, R. Goldstein, D. W. Hood, A. Kalia, C. E. Moore, J. Zhou, and B. G. Spratt. 2001. Recombination within natural populations of pathogenic bacteria: short-term empirical estimates and long-term phylogenetic consequences. Proc Natl Acad Sci U S A 98:182-187.

Feil, E. J., M. C. Maiden, M. Achtman, and B. G. Spratt. 1999. The relative contributions of recombination and mutation to the divergence of clones of Neisseria meningitidis. Mol Biol Evol 16:1496-1502.

Feil, E. J., and B. G. Spratt. 2001. Recombination and the population structures of bacterial pathogens. Annu Rev Microbiol 55:561-590.

Feldmann, H., M. Czub, S. Jones, D. Dick, M. Garbutt, A. Grolla, and H. Artsob. 2002. Emerging and re-emerging infectious diseases. Med Microbiol Immunol 191:63-74.

Fellay, J., K. V. Shianna, D. Ge, S. Colombo, B. Ledergerber, M. Weale, K. Zhang, C. Gumbs, A. Castagna, A. Cossarizza, A. Cozzi-Lepri, A. De Luca, P. Easterbrook, P. Francioli, S. Mallal, J. Martinez-Picado, J. M. Miro, N. Obel, J. P. Smith, J. Wyniger, P. Descombes, S. E. Antonarakis, N. L. Letvin, A. J. McMichael, B. F. Haynes, A. Telenti, and D. B. Goldstein. 2007. A whole-genome association study of major determinants for host control of HIV-1. Science 317:944-947.

Fix, A. G. 2002. Colonization models and initial genetic diversity in the Americas. Hum Biol 74:1-10.

Fraser, C. M., S. J. Norris, G. M. Weinstock, O. White, G. G. Sutton, R. Dodson, M. Gwinn, E. K. Hickey, R. Clayton, K. A. Ketchum, E. Sodergren, J. M. Hardham, M. P. McLeod, S. Salzberg, J. Peterson, H. Khalak, D. Richardson, J. K. Howell, M. Chidambaram, T. Utterback, L. McDonald, P. Artiach, C. Bowman, M. D. Cotton, C. Fujii, S. Garland, B. Hatch, K. Horst, K. Roberts, M. Sandusky, J. Weidman, H. O. Smith, and J. C. Venter. 1998. Complete genome sequence of Treponema pallidum, the syphilis spirochete. Science 281:375-388.

Fribourg-Blanc, A., H. H. Mollaret, and G. Niel. 1966. [Serologic and microscopic confirmation of treponemosis in Guinea baboons]. Bull Soc Pathol Exot Filiales 59:54-59.

Fuchs, A. R. 1991. Physiology and endocrinology of lactation. in S. G. Gabbe, ed. Obstetrics - Normal and Problem Pregnancies. Churchill Livingstone, New York.

Fujimoto, W. Y. 1996. Overview of non-insulin-dependent diabetes mellitus (NIDDM) in different population groups. Diabet Med 13:S7-10.

Gabriel, S. B., S. F. Schaffner, H. Nguyen, J. M. Moore, J. Roy, B. Blumenstiel, J. Higgins, M. DeFelice, A. Lochner, M. Faggart, S. N. Liu-Cordero, C. Rotimi, A. Adeyemo, R. Cooper, R. Ward, E. S. Lander, M. J. Daly, and D. Altshuler. 2002. The structure of haplotype blocks in the human genome. Science 296:2225-2229.

Galtier, N. 2003. Gene conversion drives GC content evolution in mammalian histones. Trends Genet 19:65-68.

Galtier, N., G. Piganeau, D. Mouchiroud, and L. Duret. 2001. GC-content evolution in mammalian genomes: the biased gene conversion hypothesis. Genetics 159:907-911.

Galvani, A. P., and J. Novembre. 2005. The evolutionary history of the CCR5-Delta32 HIV-resistance mutation. Microbes Infect 7:302-309.

Galvani, A. P., and M. Slatkin. 2003. Evaluating plague and smallpox as historical selective pressures for the CCR5-Delta 32 HIV-resistance allele. Proc Natl Acad Sci U S A 100:15276-15279.

Gao, F., E. Bailes, D. L. Robertson, Y. Chen, C. M. Rodenburg, S. F. Michael, L. B. Cummins, L. O. Arthur, M. Peeters, G. M. Shaw, P. M. Sharp, and B. H. Hahn. 1999. Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes. Nature 397:436-441.

Garzino-Demo, A., A. L. DeVico, K. E. Conant, and R. C. Gallo. 2000. The role of chemokines in human immunodeficiency virus infection. Immunol Rev 177:79-87.

Gatanaga, H., S. Oka, S. Ida, T. Wakabayashi, T. Shioda, and A. Iwamoto. 1999. Active HIV-1 redistribution and replication in the brain with HIV encephalitis. Arch Virol 144:29-43.

Georgeson, J. C., and S. M. Filteau. 2000. Physiology, immunology, and disease transmission in human breast milk. AIDS Patient Care STDS 14:533-539.

Giacani, L., K. Hevner, and A. Centurion-Lara. 2005. Gene organization and transcriptional analysis of the tprJ, tprI, tprG, and tprF loci in Treponema pallidum strains Nichols and Sea 81-4. J Bacteriol 187:6084-6093.

Giacani, L., E. S. Sun, K. Hevner, B. J. Molini, W. C. Van Voorhis, S. A. Lukehart, and A. Centurion-Lara. 2004. Tpr homologs in Treponema paraluiscuniculi Cuniculi A strain. Infect Immun 72:6561-6576.

Gilder, D. A., T. L. Wall, and C. L. Ehlers. 2004. Comorbidity of select anxiety and affective disorders with alcohol dependence in southwest California Indians. Alcohol Clin Exp Res 28:1805-1813.

Goedde, H. W., D. P. Agarwal, G. Fritze, D. Meier-Tackmann, S. Singh, G. Beckmann, K. Bhatia, L. Z. Chen, B. Fang, R. Lisker, and et al. 1992. Distribution of ADH2 and ALDH2 genotypes in different populations. Hum Genet 88:344-346.

Gogarten, J. P., and L. Olendzenski. 1999. Orthologs, paralogs and genome comparisons. Curr Opin Genet Dev 9:630-636.

Goldman, A. S. 1993. The immune system of human milk: antimicrobial, antiinflammatory and immunomodulating properties. Pediatr Infect Dis J 12:664-671.

Goldman, A. S., S. Chheda, and R. Garofalo. 1998. Evolution of immunologic functions of the mammary gland and the postnatal development of immunity. Pediatr Res 43:155-162.

Goodenow, M. M., S. L. Rose, D. L. Tuttle, and J. W. Sleasman. 2003. HIV-1 fitness and macrophages. J Leukoc Biol 74:657-666.

Gray, R. R., C. J. Mulligan, B. J. Molini, E. S. Sun, L. Giacani, C. Godornes, A. Kitchen, S. A. Lukehart, and A. Centurion-Lara. 2006. Molecular evolution of the tprC, D, I, K, G, and J genes in the pathogenic genus Treponema. Mol Biol Evol 23:2220-2233.

Grenfell, B. T., O. G. Pybus, J. R. Gog, J. L. Wood, J. M. Daly, J. A. Mumford, and E. C. Holmes. 2004. Unifying the epidemiological and evolutionary dynamics of pathogens. Science 303:327-332.

Gurtler, L. G., P. H. Hauser, J. Eberle, A. von Brunn, S. Knapp, L. Zekeng, J. M. Tsague, and L. Kaptue. 1994. A new subtype of human immunodeficiency virus type 1 (MVP-5180) from Cameroon. J Virol 68:1581-1585.

Haase, A. T., K. Henry, M. Zupancic, G. Sedgewick, R. A. Faust, H. Melroe, W. Cavert, K. Gebhard, K. Staskus, Z. Q. Zhang, P. J. Dailey, H. H. Balfour, Jr., A. Erice, and A. S. Perelson. 1996. Quantitative image analysis of HIV-1 infection in lymphoid tissue. Science 274:985-989.

Hackett, C. J. 1963. On the origin of the human treponematoses (pinta, yaws, endemic syphilis and venereal syphilis). . Bulletin of the World Health Organization 29:7-41.

Hales, C. N., and D. J. Barker. 1992. Type 2 (non-insulin-dependent) diabetes mellitus: the thrifty phenotype hypothesis. Diabetologia 35:595-601.

Hales, C. N., and D. J. Barker. 2001. The thrifty phenotype hypothesis. Br Med Bull 60:5-20. Hara, K., O. Terasaki, and Y. Okubo. 2000. Dipole estimation of alpha EEG during alcohol

ingestion in males genotypes for ALDH2. Life Sci 67:1163-1173. Harada, S., D. P. Agarwal, and H. W. Goedde. 1982. Mechanism of alcohol sensitivity and

disulfiram-ethanol reaction. Subst Alcohol Actions Misuse 3:107-115. Harpending, H., and A. Rogers. 2000. Genetic perspectives on human origins and differentiation.

Annu Rev Genomics Hum Genet 1:361-385. Harpending, H., S. T. Sherry, A. R. Rogers, and M. Stoneking. 1993. The genetic structure of

ancient human populations. Curr Anthropol 34:483-496. Harpending, H. C. 1994. Signature of ancient population growth in a low-resolution

mitochondrial DNA mismatch distribution. Hum Biol 66:591-600. Harpending, H. C., M. A. Batzer, M. Gurven, L. B. Jorde, A. R. Rogers, and S. T. Sherry. 1998.

Genetic traces of ancient demography. Proc Natl Acad Sci U S A 95:1961-1967. Harper, K. N., P. S. Ocampo, B. M. Steiner, R. W. George, M. S. Silverman, S. Bolotin, A.

Pillay, N. J. Saunders, and G. J. Armelagos. 2008. On the origin of the treponematoses: a phylogenetic approach. PLoS Negl Trop Dis 2:e148.

Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160-174.

Helzer, J. E., K. K. Bucholz, L. J. Bierut, D. A. Regier, M. A. Schuckit, and S. E. Guth. 2006. Should DSM-V include dimensional diagnostic criteria for alcohol use disorders? Alcohol Clin Exp Res 30:303-310.

Henderson, G. J., N. G. Hoffman, L. H. Ping, S. A. Fiscus, I. F. Hoffman, K. M. Kitrinos, T. Banda, F. E. Martinson, P. N. Kazembe, D. A. Chilongozi, M. S. Cohen, and R. Swanstrom. 2004. HIV-1 populations in blood and breast milk are similar. Virology 330:295-303.

Ho, F. C., R. L. Wong, and J. W. Lawton. 1979. Human colostral and breast milk cells. A light and electron microscopic study. Acta Paediatr Scand 68:389-396.

Holmes, E. C., R. Urwin, and M. C. Maiden. 1999. The influence of recombination on the population structure and evolution of the human pathogen Neisseria meningitidis. Mol Biol Evol 16:741-749.

Hoover, E. L. 2007. There is no scientific rationale for race-based research. J Natl Med Assoc 99:690-692.

Howell-Adams, B., and H. S. Seifert. 2000. Molecular models accounting for the gene conversion reactions mediating gonococcal pilin antigenic variation. Mol Microbiol 37:1146-1158.

Hudson, E. H. 1965. Treponematosis and man's social evolution. American Anthropologist 67:885-901.

Hudson, R. R., M. Slatkin, and W. P. Maddison. 1992. Estimation of levels of gene flow from DNA sequence data. Genetics 132:583-589.

Hughes, D. 2000. Co-evolution of the tuf genes links gene conversion with the generation of chromosomal inversions. J Mol Biol 297:355-364.

Hughes, E. S., J. E. Bell, and P. Simmonds. 1997a. Investigation of population diversity of human immunodeficiency virus type 1 in vivo by nucleotide sequencing and length polymorphism analysis of the V1/V2 hypervariable region of env. J Gen Virol 78 ( Pt 11):2871-2882.

Hughes, E. S., J. E. Bell, and P. Simmonds. 1997b. Investigation of the dynamics of the spread of human immunodeficiency virus to brain and other tissues by evolutionary analysis of sequences from the p17gag and env genes. J Virol 71:1272-1280.

Hugot, J. P., M. Chamaillard, H. Zouali, S. Lesage, J. P. Cezard, J. Belaiche, S. Almer, C. Tysk, C. A. O'Morain, M. Gassull, V. Binder, Y. Finkel, A. Cortot, R. Modigliani, P. Laurent-Puig, C. Gower-Rousseau, J. Macry, J. F. Colombel, M. Sahbatou, and G. Thomas. 2001. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn's disease. Nature 411:599-603.

Hung, C. S., N. Vander Heyden, and L. Ratner. 1999. Analysis of the critical domain in the V3 loop of human immunodeficiency virus type 1 gp120 involved in CCR5 utilization. J Virol 73:8216-8226.

Ichikawa, M., M. Sugita, M. Takahashi, M. Satomi, T. Takeshita, T. Araki, and H. Takahashi. 2003. Breast milk macrophages spontaneously produce granulocyte-macrophage colony-stimulating factor and differentiate into dendritic cells in the presence of exogenous interleukin-4 alone. Immunology 108:189-195.

IHS. 2006. Facts on Indian Health Disparities. Indian Health Services. Iliff, P. J., E. G. Piwoz, N. V. Tavengwa, C. D. Zunguza, E. T. Marinda, K. J. Nathoo, L. H.

Moulton, B. J. Ward, and J. H. Humphrey. 2005. Early exclusive breastfeeding reduces the risk of postnatal HIV-1 transmission and increases HIV-free survival. Aids 19:699-708.

Iwahashi, K., Y. Matsuo, H. Suwaki, K. Nakamura, and Y. Ichikawa. 1995. CYP2E1 and ALDH2 genotypes and alcohol dependence in Japanese. Alcohol Clin Exp Res 19:564-566.

John, G. C., R. W. Nduati, D. A. Mbori-Ngacha, B. A. Richardson, D. Panteleeff, A. Mwatha, J. Overbaugh, J. Bwayo, J. O. Ndinya-Achola, and J. K. Kreiss. 2001. Correlates of mother-to-child human immunodeficiency virus type 1 (HIV-1) transmission: association with maternal plasma HIV-1 RNA load, genital HIV-1 DNA shedding, and breast infections. J Infect Dis 183:206-212.

Johnson, J. E., and C. W. McNutt. 1964. Diabetes mellitus in an American Indian population isolate. Tex Rep Biol Med 22:110-125.

Jones, K. E., N. G. Patel, M. A. Levy, A. Storeygard, D. Balk, J. L. Gittleman, and P. Daszak. 2008. Global trends in emerging infectious diseases. Nature 451:990-993.

Kemal, K. S., B. Foley, H. Burger, K. Anastos, H. Minkoff, C. Kitchen, S. M. Philpott, W. Gao, E. Robison, S. Holman, C. Dehner, S. Beck, W. A. Meyer, 3rd, A. Landay, A. Kovacs, J. Bremer, and B. Weiser. 2003. HIV-1 in genital tract and plasma of women: compartmentalization of viral sequences, coreceptor usage, and glycosylation. Proc Natl Acad Sci U S A 100:12972-12977.

Kim, D.-J., I.-G. Choi, B. L. Park, B.-C. Lee, B.-J. Ham, S. Yoon, J. S. Bae, H. S. Cheong, and H. D. Shin. 2008. Major genetic components underlying alcoholism in Korean population. Hum Mol Genet 17:854-858.

Kimura, M., and J. L. King. 1979. Fixation of a deleterious allele at one of two "duplicate" loci by mutation pressure and random drift. Proc Natl Acad Sci U S A 76:2858-2861.

King, R. C., and W. D. Stansfield. 1997. A Dictionary of Genetics. Oxford University Press, New York.

Kinzie, J. D., P. K. Leung, J. Boehnlein, D. Matsunaga, R. Johnson, S. Manson, J. H. Shore, J. Heinz, and M. Williams. 1992. Psychiatric epidemiology of an Indian village. A 19-year replication study. J Nerv Ment Dis 180:33-39.

Kitrinos, K. M., N. G. Hoffman, J. A. Nelson, and R. Swanstrom. 2003. Turnover of env variable region 1 and 2 genotypes in subjects with late-stage human immunodeficiency virus type 1 infection. J Virol 77:6811-6822.

Klevytska, A. M., M. R. Mracna, L. Guay, G. Becker-Pergola, M. Furtado, L. Zhang, J. B. Jackson, and S. H. Eshleman. 2002. Analysis of length variation in the V1-V2 region of env in nonsubtype B HIV type 1 from Uganda. AIDS Res Hum Retroviruses 18:791-796.

Knowler, W. C., D. J. Pettitt, M. F. Saad, and P. H. Bennett. 1990. Diabetes mellitus in the Pima Indians: incidence, risk factors and pathogenesis. Diabetes Metab Rev 6:1-27.

Kobayashi, H., S. Ide, J. Hasegawa, H. Ujike, Y. Sekine, N. Ozaki, T. Inada, M. Harano, T. Komiyama, M. Yamada, M. Iyo, H. W. Shen, K. Ikeda, and I. Sora. 2004. Study of association between alpha-synuclein gene polymorphism and methamphetamine psychosis/dependence. Ann N Y Acad Sci 1025:325-334.

Kolman, C. J., and E. Bermingham. 1997. Mitochondrial and nuclear DNA diversity in the Choco and Chibcha Amerinds of Panama. Genetics 147:1289-1302.

Kolman, C. J., E. Bermingham, R. Cooke, R. H. Ward, T. D. Arias, and F. Guionneau-Sinclair. 1995. Reduced mtDNA diversity in the Ngobe Amerinds of Panama. Genetics 140:275-283.

Kolman, C. J., N. Sambuughin, and E. Bermingham. 1996. Mitochondrial DNA analysis of Mongolian populations and implications for the origin of New World founders. Genetics 142:1321-1334.

Kondrashov, F. A., I. B. Rogozin, Y. I. Wolf, and E. V. Koonin. 2002. Selection in the evolution of gene duplications. Genome Biol 3:RESEARCH0008.

Konishi, T., M. Calvillo, A. S. Leng, J. Feng, T. Lee, H. Lee, J. L. Smith, S. H. Sial, N. Berman, S. French, V. Eysselein, K. M. Lin, and Y. J. Wan. 2003. The ADH3*2 and CYP2E1 c2 alleles increase the risk of alcoholism in Mexican American men. Exp Mol Pathol 74:183-189.

Korber, B. T., K. J. Kunstman, B. K. Patterson, M. Furtado, M. M. McEvilly, R. Levy, and S. M. Wolinsky. 1994. Genetic differences between blood- and brain-derived viral sequences from human immunodeficiency virus type 1-infected patients: evidence of conserved elements in the V3 region of the envelope protein of brain-derived sequences. J Virol 68:7467-7481.

Koulinska, I. N., E. Villamor, B. Chaplin, G. Msamanga, W. Fawzi, B. Renjifo, and M. Essex. 2006. Transmission of cell-free and cell-associated HIV-1 through breast-feeding. J Acquir Immune Defic Syndr 41:93-99.

Kourtis, A. P., S. Butera, C. Ibegbu, L. Beled, and A. Duerr. 2003. Breast milk and HIV-1: vector of transmission or vehicle of protection? Lancet Infect Dis 3:786-793.

Krause, J., C. Lalueza-Fox, L. Orlando, W. Enard, R. E. Green, H. A. Burbano, J. J. Hublin, C. Hanni, J. Fortea, M. de la Rasilla, J. Bertranpetit, A. Rosas, and S. Paabo. 2007. The derived FOXP2 variant of modern humans was shared with Neandertals. Curr Biol 17:1908-1912.

Krings, M., C. Capelli, F. Tschentscher, H. Geisert, S. Meyer, A. von Haeseler, K. Grossschmidt, G. Possnert, M. Paunovic, and S. Paabo. 2000. A view of Neandertal genetic diversity. Nat Genet 26:144-146.

Krings, M., H. Geisert, R. W. Schmitz, H. Krainitzki, and S. Paabo. 1999. DNA sequence of the mitochondrial hypervariable region II from the neandertal type specimen. Proc Natl Acad Sci U S A 96:5581-5585.

Krings, M., A. Stone, R. W. Schmitz, H. Krainitzki, M. Stoneking, and S. Paabo. 1997. Neandertal DNA sequences and the origin of modern humans. Cell 90:19-30.

Kruger, R., W. Kuhn, T. Muller, D. Woitalla, M. Graeber, S. Kosel, H. Przuntek, J. T. Epplen, L. Schols, and O. Riess. 1998. Ala30Pro mutation in the gene encoding alpha-synuclein in Parkinson's disease. Nat Genet 18:106-108.

Kuhn, L., M. Sinkala, C. Kankasa, K. Semrau, P. Kasonde, N. Scott, M. Mwiya, C. Vwalika, J. Walter, W. Y. Tsai, G. M. Aldrovandi, and D. M. Thea. 2007. High Uptake of Exclusive Breastfeeding and Reduced Early Post-Natal HIV Transmission. PLoS ONE 2:e1363.

Kunitz, S. J., K. R. Gabriel, J. E. Levy, E. Henderson, K. Lampert, J. McCloskey, G. Quintero, S. Russell, and A. Vince. 1999. Alcohol dependence and conduct disorder among Navajo Indians. J Stud Alcohol 60:159-167.

LaFond, R. E., A. Centurion-Lara, C. Godornes, A. M. Rompalo, W. C. Van Voorhis, and S. A. Lukehart. 2003. Sequence diversity of Treponema pallidum subsp. pallidum tprK in human syphilis lesions and rabbit-propagated isolates. J Bacteriol 185:6262-6268.

Lalueza-Fox, C., M. L. Sampietro, D. Caramelli, Y. Puder, M. Lari, F. Calafell, C. Martinez-Maza, M. Bastir, J. Fortea, M. de la Rasilla, J. Bertranpetit, and A. Rosas. 2005. Neandertal evolutionary genetics: mitochondrial DNA data from the iberian peninsula. Mol Biol Evol 22:1077-1081.

Lamers, S. L., J. W. Sleasman, J. X. She, K. A. Barrie, S. M. Pomeroy, D. J. Barrett, and M. M. Goodenow. 1993. Independent variation and positive selection in env V1 and V2 domains within maternal-infant strains of human immunodeficiency virus type 1 in vivo. J Virol 67:3951-3960.

Langford, D., A. Grigorian, R. Hurford, A. Adame, R. J. Ellis, L. Hansen, and E. Masliah. 2004. Altered P-glycoprotein expression in AIDS patients with HIV encephalitis. J Neuropathol Exp Neurol 63:1038-1047.

Lathe, W. C., 3rd, and P. Bork. 2001. Evolution of tuf genes: ancient duplication, differential loss and gene conversion. FEBS Lett 502:113-116.

Leitner, T., B. Korber, M. Daniels, C. Calef, and B. Foley. 2005. HIV-1 Subtype and Circulating Recombinant Form (CRF) Reference Sequences, 2005. The Human Retroviruses and AIDS 2005 Compendium. Theoretical Biology and Biophysics, Los Alamos.

Leroy, V., J. M. Karon, A. Alioum, E. R. Ekpini, P. van de Perre, A. E. Greenberg, P. Msellati, M. Hudgens, F. Dabis, and S. Z. Wiktor. 2003. Postnatal transmission of HIV-1 after a maternal short-course zidovudine peripartum regimen in West Africa. Aids 17:1493-1501.

Lewis, P., R. Nduati, J. K. Kreiss, G. C. John, B. A. Richardson, D. Mbori-Ngacha, J. Ndinya-Achola, and J. Overbaugh. 1998. Cell-free human immunodeficiency virus type 1 in breast milk. J Infect Dis 177:34-39.

Liang, T., J. Spence, L. Liu, W. N. Strother, H. W. Chang, J. A. Ellison, L. Lumeng, T. K. Li, T. Foroud, and L. G. Carr. 2003. alpha-Synuclein maps to a quantitative trait locus for

alcohol preference and is differentially expressed in alcohol-preferring and -nonpreferring rats. Proc Natl Acad Sci U S A 100:4690-4695.

Liao, D. 2000. Gene conversion drives within genic sequences: concerted evolution of ribosomal RNA genes in bacteria and archaea. J Mol Evol 51:305-317.

Libert, F., P. Cochaux, G. Beckman, M. Samson, M. Aksenova, A. Cao, A. Czeizel, M. Claustres, C. de la Rua, M. Ferrari, C. Ferrec, G. Glover, B. Grinde, S. Guran, V. Kucinskas, J. Lavinha, B. Mercier, G. Ogur, L. Peltonen, C. Rosatelli, M. Schwartz, V. Spitsyn, L. Timar, L. Beckman, M. Parmentier, and G. Vassart. 1998. The deltaccr5 mutation conferring protection against HIV-1 in Caucasian populations has a single and recent origin in Northeastern Europe. Hum Mol Genet 7:399-406.

Lindsay, R. S., and P. H. Bennett. 2001. Type 2 diabetes, the thrifty phenotype - an overview. Br Med Bull 60:21-32.

Long, J. C., W. C. Knowler, R. L. Hanson, R. W. Robin, M. Urbanek, E. Moore, P. H. Bennett, and D. Goldman. 1998. Evidence for genetic linkage to alcohol dependence on chromosomes 4 and 11 from an autosome-wide scan in an American Indian population. Am J Med Genet 81:216-221.

Loussert-Ajaka, I., M. L. Chaix, B. Korber, F. Letourneur, E. Gomas, E. Allen, T. D. Ly, F. Brun-Vezinet, F. Simon, and S. Saragosti. 1995. Variability of human immunodeficiency virus type 1 group O strains isolated from Cameroonian patients living in France. J Virol 69:5640-5649.

Lucotte, G. 2001. Distribution of the CCR5 gene 32-basepair deletion in West Europe. A hypothesis about the possible dispersion of the mutation by the Vikings in historical times. Hum Immunol 62:933-936.

Lukehart, S. A., S. A. Baker-Zander, R. M. Lloyd, and S. Sell. 1980. Characterization of lymphocyte responsiveness in early experimental syphilis. II. Nature of cellular infiltration and Treponema pallidum distribution in testicular lesions. J Immunol 124:461-467.

Luo, X., H. R. Kranzler, L. Zuo, S. Wang, N. J. Schork, and J. Gelernter. 2007. Multiple ADH genes modulate risk for drug dependence in both African- and European-Americans. Hum Mol Genet 16:380-390.

Lynch, M., and J. S. Conery. 2000. The evolutionary fate and consequences of duplicate genes. Science 290:1151-1155.

Lynch, M., and A. Force. 2000. The probability of duplicate gene preservation by subfunctionalization. Genetics 154:459-473.

Macaulay, V., C. Hill, A. Achilli, C. Rengo, D. Clarke, W. Meehan, J. Blackburn, O. Semino, R. Scozzari, F. Cruciani, A. Taha, N. K. Shaari, J. M. Raja, P. Ismail, Z. Zainuddin, W. Goodwin, D. Bulbeck, H. J. Bandelt, S. Oppenheimer, A. Torroni, and M. Richards. 2005. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science 308:1034-1036.

Maddison, W. P., and D. R. Maddison. 1989. Interactive analysis of phylogeny and character evolution using the computer program MacClade. Folia Primatol (Basel) 53:190-202.

Maddon, P. J., A. G. Dalgleish, J. S. McDougal, P. R. Clapham, R. A. Weiss, and R. Axel. 1986. The T4 gene encodes the AIDS virus receptor and is expressed in the immune system and the brain. Cell 47:333-348.

Manning, L. S., and M. J. Parmely. 1980. Cellular determinants of mammary cell-mediated immunity in the rat. I. The migration of radioisotopically labeled T lymphocytes. J Immunol 125:2508-2514.

Martin, D. P., C. Williamson, and D. Posada. 2005. RDP2: recombination detection and analysis from sequence alignments. Bioinformatics 21:260-262.

Mash, D. C., Q. Ouyang, J. Pablo, M. Basile, S. Izenwasser, A. Lieberman, and R. J. Perrin. 2003. Cocaine abusers have an overexpression of alpha-synuclein in dopamine neurons. J Neurosci 23:2564-2571.

Mathers, C. D., and D. Loncar. 2006. Projections of global mortality and burden of disease from 2002 to 2030. PLoS Med 3:e442.

McCarthy, D. J., and P. Zimmet. 2001. Pacific Island Populations. Pp. 239-245 in J. M. Ekoe, P. Zimmet, and R. WIlliams, eds. The epidemiology of diabetes mellitus. An international perspective. Wiley, Chichester.

McCarthy, D. M., T. L. Wall, S. A. Brown, and L. G. Carr. 2000. Integrating biological and behavioral factors in alcohol use risk: the role of ALDH2 status and alcohol expectancies in a sample of Asian Americans. Exp Clin Psychopharmacol 8:168-175.

Meltzer, D. J. 1993. Pleistocene peopling of the Americas. Evol Anthropol 1:157-169. Merriwether, D. A., F. Rothhammer, and R. E. Ferrell. 1995. Distribution of the four founding

lineage haplotypes in Native Americans suggests a single wave of migration for the New World. Am J Phys Anthropol 98:411-430.

Monsalve, M. V., A. Helgason, and D. V. Devine. 1999. Languages, geography and HLA haplotypes in native American and Asian populations. Proc Biol Sci 266:2209-2216.

Moore, C. B., M. John, I. R. James, F. T. Christiansen, C. S. Witt, and S. A. Mallal. 2002. Evidence of HIV-1 adaptation to HLA-restricted immune responses at a population level. Science 296:1439-1443.

Morens, D. M., G. K. Folkers, and A. S. Fauci. 2004. The challenge of emerging and re-emerging infectious diseases. Nature 430:242-249.

Morris, A., M. Marsden, K. Halcrow, E. S. Hughes, R. P. Brettle, J. E. Bell, and P. Simmonds. 1999. Mosaic structure of the human immunodeficiency virus type 1 genome infecting lymphoid cells and the brain: evidence for frequent in vivo recombination events in the evolution of regional populations. J Virol 73:8720-8731.

Morse, S. S. 1995. Factors in the emergence of infectious diseases. Emerg Infect Dis 1:7-15. Mulligan, C. J., K. Hunley, S. Cole, and J. C. Long. 2004. Population genetics, history, and

health patterns in native americans. Annu Rev Genomics Hum Genet 5:295-315. Mulligan, C. J., S. J. Norris, and S. A. Lukehart. 2008. Molecular Studies in Treponema pallidum

Evolution: Towards Clarity? PLoS Neg Trop Dis 2:e184. Mulligan, C. J., R. W. Robin, M. V. Osier, N. Sambuughin, L. G. Goldfarb, R. A. Kittles, D.

Hesselbrock, D. Goldman, and J. C. Long. 2003. Allelic variation at alcohol metabolism genes ( ADH1B, ADH1C, ALDH2) and alcohol dependence in an American Indian population. Hum Genet 113:325-336.

Nabatov, A. A., G. Pollakis, T. Linnemann, A. Kliphius, M. I. Chalaby, and W. A. Paxton. 2004. Intrapatient alterations in the human immunodeficiency virus type 1 gp120 V1V2 and V3 regions differentially modulate coreceptor usage, virus inhibition by CC/CXC chemokines, soluble CD4, and the b12 and 2G12 monoclonal antibodies. J Virol 78:524-530.

Nakamura, K., K. Iwahashi, Y. Matsuo, R. Miyatake, Y. Ichikawa, and H. Suwaki. 1996. Characteristics of Japanese alcoholics with the atypical aldehyde dehydrogenase 2*2. I. A comparison of the genotypes of ALDH2, ADH2, ADH3, and cytochrome P-4502E1 between alcoholics and nonalcoholics. Alcohol Clin Exp Res 20:52-55.

Nduati, R., G. John, D. Mbori-Ngacha, B. Richardson, J. Overbaugh, A. Mwatha, J. Ndinya-Achola, J. Bwayo, F. E. Onyango, J. Hughes, and J. Kreiss. 2000. Effect of Breastfeeding and Formula Feeding on Transmission of HIV-1: A Randomized Clinical Trial. Pp. 1167-1174.

Neel, J. V. 1962. Diabetes mellitus: a "thrifty" genotype rendered detrimental by "progress"? Am J Hum Genet 14:353-362.

Neel, J. V. 1999. The "thrifty genotype" in 1998. Nutr Rev 57:S2-9. Neel, J. V. 1982. The thrifty genotype revisited. in J. Kobberling, and J. Tattersall, eds. The

genetics of diabetes mellitus. Academic Press, New York. Neumark, Y. D., Y. Friedlander, H. R. Thomasson, and T. K. Li. 1998. Association of the

ADH2*2 allele with reduced ethanol consumption in Jewish men in Israel: a pilot study. J Stud Alcohol 59:133-139.

Neville, M. C., J. C. Allen, P. C. Archer, C. E. Casey, J. Seacat, R. P. Keller, V. Lutes, J. Rasbach, and M. Neifert. 1991. Studies in human lactation: milk volume and nutrient composition during weaning and lactogenesis. Am J Clin Nutr 54:81-92.

Neville, M. C., and M. R. Neifert. 1983. Lactation: Physiology, Nutrition, and Breastfeeding. Plenum Press, New York.

Nickle, D. C., M. A. Jensen, D. Shriner, S. J. Brodie, L. M. Frenkel, J. E. Mittler, and J. I. Mullins. 2003. Evolutionary indicators of human immunodeficiency virus type 1 reservoirs and compartments. J Virol 77:5540-5546.

Nielsen, R., I. Hellmann, M. Hubisz, C. Bustamante, and A. G. Clark. 2007. Recent and ongoing selection in the human genome. Nat Rev Genet 8:857-868.

Noonan, J. P., J. Grimwood, J. Schmutz, M. Dickson, and R. M. Myers. 2004. Gene conversion and the evolution of protocadherin gene cluster diversity. Genome Res 14:354-366.

Novoradovsky, A. G., J. Kidd, K. Kidd, and D. Goldman. 1995. Apparent monomorphism of ALDH2 in seven American Indian populations. Alcohol 12:163-167.

O'Brien, S. J., and G. W. Nelson. 2004. Human genes that limit AIDS. Nat Genet 36:565-574. Ochman, H., J. G. Lawrence, and E. A. Groisman. 2000. Lateral gene transfer and the nature of

bacterial innovation. Nature 405:299-304. Ohagen, A., A. Devitt, K. J. Kunstman, P. R. Gorry, P. P. Rose, B. Korber, J. Taylor, R. Levy, R.

L. Murphy, S. M. Wolinsky, and D. Gabuzda. 2003. Genetic and functional analysis of full-length human immunodeficiency virus type 1 env genes derived from brain and blood of patients with AIDS. J Virol 77:12336-12345.

Ohta, T. 1992. The Nearly Neutral Theory of Molecular Evolution. Annual Review of Ecology and Systematics 23:263-286.

Omran, A. R. 1971. The epidemiologic transition. A theory of the epidemiology of population change. Milbank Mem Fund Q 49:509-538.

Ordovas, J., A. Pittas, and A. S. Greenberg. 2003. Might the diabetic environment in utero lead to type 2 diabetes? Lancet 361:1839-1840.

Osier, M., A. J. Pakstis, J. R. Kidd, J. F. Lee, S. J. Yin, H. C. Ko, H. J. Edenberg, R. B. Lu, and K. K. Kidd. 1999. Linkage disequilibrium at the ADH2 and ADH3 loci and risk of alcoholism. Am J Hum Genet 64:1147-1157.

Osier, M. V., A. J. Pakstis, D. Goldman, H. J. Edenberg, J. R. Kidd, and K. K. Kidd. 2002a. A proline-threonine substitution in codon 351 of ADH1C is common in Native Americans. Alcohol Clin Exp Res 26:1759-1763.

Osier, M. V., A. J. Pakstis, H. Soodyall, D. Comas, D. Goldman, A. Odunsi, F. Okonofua, J. Parnas, L. O. Schulz, J. Bertranpetit, B. Bonne-Tamir, R. B. Lu, J. R. Kidd, and K. K. Kidd. 2002b. A global perspective on genetic variation at the ADH genes reveals unusual patterns of linkage disequilibrium and diversity. Am J Hum Genet 71:84-99.

Padidam, M., S. Sawyer, and C. M. Fauquet. 1999. Possible emergence of new geminiviruses by frequent recombination. Virology 265:218-225.

Paradies, Y. C., M. J. Montoya, and S. M. Fullerton. 2007. Racialized genetics and the study of complex diseases: the thrifty genotype revisited. Perspect Biol Med 50:203-227.

Pastore, C., R. Nedellec, A. Ramos, S. Pontow, L. Ratner, and D. E. Mosier. 2006. Human immunodeficiency virus type 1 coreceptor switching: V1/V2 gain-of-fitness mutations compensate for V3 loss-of-fitness mutations. J Virol 80:750-758.

Perrin, L., L. Kaiser, and S. Yerly. 2003. Travel and the spread of HIV-1 genetic variants. Lancet Infect Dis 3:22-27.

Peterson, L. E., J. S. Barnholtz, G. P. Page, T. M. King, M. de Andrade, and C. I. Amos. 1999. A genome-wide search for susceptibility genes linked to alcohol dependence. Genet Epidemiol 17 Suppl 1:S295-300.

Petitjean, G., P. Becquart, E. Tuaillon, Y. Al Tabaa, D. Valea, M. F. Huguet, N. Meda, P. Van de Perre, and J. P. Vendrell. 2007. Isolation and characterization of HIV-1-infected resting CD4+ T lymphocytes in breast milk. J Clin Virol 39:1-8.

Petito, C. K. 2004. Human immunodeficiency virus type 1 compartmentalization in the central nervous system. J Neurovirol 10 Suppl 1:21-24.

Philpott, S., H. Burger, C. Tsoukas, B. Foley, K. Anastos, C. Kitchen, and B. Weiser. 2005. Human immunodeficiency virus type 1 genomic RNA sequences in the female genital tract and blood: compartmentalization and intrapatient recombination. J Virol 79:353-363.

Pilcher, C. D., D. C. Shugars, S. A. Fiscus, W. C. Miller, P. Menezes, J. Giner, B. Dean, K. Robertson, C. E. Hart, J. L. Lennox, J. J. Eron, Jr., and C. B. Hicks. 2001. HIV in body fluids during primary HIV infection: implications for pathogenesis, treatment and public health. Aids 15:837-845.

Pillai, S. K., B. Good, S. K. Pond, J. K. Wong, M. C. Strain, D. D. Richman, and D. M. Smith. 2005. Semen-specific genetic characteristics of human immunodeficiency virus type 1 env. J Virol 79:1734-1742.

Pillai, S. K., S. L. Pond, Y. Liu, B. M. Good, M. C. Strain, R. J. Ellis, S. Letendre, D. M. Smith, H. F. Gunthard, I. Grant, T. D. Marcotte, J. A. McCutchan, D. D. Richman, and J. K. Wong. 2006. Genetic attributes of cerebrospinal fluid-derived HIV-1 env. Brain 129:1872-1883.

Pillay, K., A. Coutsoudis, D. York, L. Kuhn, and H. M. Coovadia. 2000. Cell-free virus in breast milk of HIV-1-seropositive women. J Acquir Immune Defic Syndr 24:330-336.

Ping, L. H., M. S. Cohen, I. Hoffman, P. Vernazza, F. Seillier-Moiseiwitsch, H. Chakraborty, P. Kazembe, D. Zimba, M. Maida, S. A. Fiscus, J. J. Eron, R. Swanstrom, and J. A. Nelson. 2000. Effects of genital tract inflammation on human immunodeficiency virus type 1 V3 populations in blood and semen. J Virol 74:8946-8952.

Pitt, J. 1979. The milk mononuclear phagocyte. Pediatrics 64:745-749.

Plagnol, V., and J. D. Wall. 2006. Possible ancestral structure in human populations. PLoS Genet 2:e105.

Pollard, A. J., and S. R. Dobson. 2000. Emerging infectious diseases in the 21st century. Curr Opin Infect Dis 13:265-275.

Polymeropoulos, M. H., C. Lavedan, E. Leroy, S. E. Ide, A. Dehejia, A. Dutra, B. Pike, H. Root, J. Rubenstein, R. Boyer, E. S. Stenroos, S. Chandrasekharappa, A. Athanassiadou, T. Papapetropoulos, W. G. Johnson, A. M. Lazzarini, R. C. Duvoisin, G. Di Iorio, L. I. Golbe, and R. L. Nussbaum. 1997. Mutation in the alpha-synuclein gene identified in families with Parkinson's disease. Science 276:2045-2047.

Pond, S. L., S. D. Frost, and S. V. Muse. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21:676-679.

Posada, D., and K. A. Crandall. 2001. Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc Natl Acad Sci U S A 98:13757-13762.

Posada, D., and K. A. Crandall. 1998. MODELTEST: testing the model of DNA substitution. Bioinformatics 14:817-818.

Posada, D., K. A. Crandall, and E. C. Holmes. 2002. Recombination in evolutionary genomics. Annu Rev Genet 36:75-97.

Poss, M., A. G. Rodrigo, J. J. Gosink, G. H. Learn, D. de Vange Panteleeff, H. L. Martin, Jr., J. Bwayo, J. K. Kreiss, and J. Overbaugh. 1998. Evolution of envelope sequences from the genital tract and peripheral blood of women infected with clade A human immunodeficiency virus type 1. J Virol 72:8240-8251.

Powell, M. L., and D. C. Cook. 2005. Treponematosis: Inquires into the Nature of Protean Disease in M. L. Powell, and D. C. Cook, eds. The Myth of Syphilis: the Natural History of Treponematosis in North America. University of Florida Press, Gainesville

Prugnolle, F., A. Manica, M. Charpentier, J. F. Guegan, V. Guernier, and F. Balloux. 2005. Pathogen-driven selection and worldwide HLA class I diversity. Curr Biol 15:1022-1027.

Quintana-Murci, L., A. Alcais, L. Abel, and J. L. Casanova. 2007. Immunology in natura: clinical, epidemiological and evolutionary genetics of infectious diseases. Nat Immunol 8:1165-1171.

Quintana-Murci, L., O. Semino, H. J. Bandelt, G. Passarino, K. McElreavey, and A. S. Santachiara-Benerecetti. 1999. Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet 23:437-441.

Ramachandran, S., O. Deshpande, C. C. Roseman, N. A. Rosenberg, M. W. Feldman, and L. L. Cavalli-Sforza. 2005. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Natl Acad Sci U S A 102:15942-15947.

Reich, T., H. J. Edenberg, A. Goate, J. T. Williams, J. P. Rice, P. Van Eerdewegh, T. Foroud, V. Hesselbrock, M. A. Schuckit, K. Bucholz, B. Porjesz, T. K. Li, P. M. Conneally, J. I. Nurnberger, Jr., J. A. Tischfield, R. R. Crowe, C. R. Cloninger, W. Wu, S. Shears, K. Carr, C. Crose, C. Willig, and H. Begleiter. 1998. Genome-wide search for genes affecting the risk for alcohol dependence. Am J Med Genet 81:207-215.

Relethford, J. H. 2001. Absence of regional affinities of Neandertal DNA with living humans does not reject multiregional evolution. Am J Phys Anthropol 115:95-98.

Richardson, B. A., G. C. John-Stewart, J. P. Hughes, R. Nduati, D. Mbori-Ngacha, J. Overbaugh, and J. K. Kreiss. 2003. Breast-milk infectivity in human immunodeficiency virus type 1-infected mothers. J Infect Dis 187:736-740.

Ritola, K., C. D. Pilcher, S. A. Fiscus, N. G. Hoffman, J. A. Nelson, K. M. Kitrinos, C. B. Hicks, J. J. Eron, Jr., and R. Swanstrom. 2004. Multiple V1/V2 env variants are frequently present during primary infection with human immunodeficiency virus type 1. J Virol 78:11208-11218.

Ritola, K., K. Robertson, S. A. Fiscus, C. Hall, and R. Swanstrom. 2005. Increased human immunodeficiency virus type 1 (HIV-1) env compartmentalization in the presence of HIV-1-associated dementia. J Virol 79:10830-10834.

Rivas, R. A., A. A. el-Mohandes, and I. M. Katona. 1994. Mononuclear phagocytic cells in human milk: HLA-DR and Fc gamma R ligand expression. Biol Neonate 66:195-204.

Robin, R. W., J. C. Long, J. K. Rasmussen, B. Albaugh, and D. Goldman. 1998. Relationship of binge drinking to alcohol dependence, other psychiatric disorders, and behavioral problems in an American Indian tribe. Alcohol Clin Exp Res 22:518-523.

Rollins, N. C., S. M. Filteau, A. Coutsoudis, and A. M. Tomkins. 2001. Feeding mode, intestinal permeability, and neopterin excretion: a longitudinal study in infants of HIV-infected South African women. J Acquir Immune Defic Syndr 28:132-139.

Rothschild, B. 2003. Infectious processes around the dawn of civilization in C. L. Greenblatt, and M. Spigelman, eds. Emerging Pathogens: The Archaeology, Ecology, and Evolution of Infectious Disease. Oxford University Press, London.

Rousseau, C. M., R. W. Nduati, B. A. Richardson, G. C. John-Stewart, D. A. Mbori-Ngacha, J. K. Kreiss, and J. Overbaugh. 2004. Association of levels of HIV-1-infected breast milk cells and risk of mother-to-child transmission. J Infect Dis 190:1880-1888.

Rousseau, C. M., R. W. Nduati, B. A. Richardson, M. S. Steele, G. C. John-Stewart, D. A. Mbori-Ngacha, J. K. Kreiss, and J. Overbaugh. 2003. Longitudinal analysis of human immunodeficiency virus type 1 RNA in breast milk and of its relationship to infant infection and maternal disease. J Infect Dis 187:741-747.

Royce, R. A., A. Sena, W. Cates, Jr., and M. S. Cohen. 1997. Sexual transmission of HIV. N Engl J Med 336:1072-1078.

Rozas, J., J. C. Sanchez-DelBarrio, X. Messeguer, and R. Rozas. 2003. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496-2497.

Saag, M. S., B. H. Hahn, J. Gibbons, Y. Li, E. S. Parks, W. P. Parks, and G. M. Shaw. 1988. Extensive variation of human immunodeficiency virus type-1 in vivo. Nature 334:440-444.

Saccone, N. L., J. M. Kwon, J. Corbett, A. Goate, N. Rochberg, H. J. Edenberg, T. Foroud, T. K. Li, H. Begleiter, T. Reich, and J. P. Rice. 2000. A genome screen of maximum number of drinks as an alcoholism phenotype. Am J Med Genet 96:632-637.

Sagar, M., X. Wu, S. Lee, and J. Overbaugh. 2006. Human immunodeficiency virus type 1 V1-V2 envelope loop sequences expand and add glycosylation sites over the course of infection, and these modifications affect antibody neutralization sensitivity. J Virol 80:9586-9598.

Salemi, M., B. R. Burkhardt, R. R. Gray, G. Ghaffari, J. W. Sleasman, and M. M. Goodenow. 2007. Phylodynamics of HIV-1 in Lymphoid and Non-Lymphoid Tissues Reveals a Central Role for the Thymus in Emergence of CXCR4-Using Quasispecies. PLoS ONE 2:e950.

Salemi, M., S. L. Lamers, S. Yu, T. de Oliveira, W. M. Fitch, and M. S. McGrath. 2005. Phylodynamic analysis of human immunodeficiency virus type 1 in distinct brain

compartments provides a model for the neuropathogenesis of AIDS. J Virol 79:11343-11352.

Samson, M., F. Libert, B. J. Doranz, J. Rucker, C. Liesnard, C. M. Farber, S. Saragosti, C. Lapoumeroulie, J. Cognaux, C. Forceille, G. Muyldermans, C. Verhofstede, G. Burtonboy, M. Georges, T. Imai, S. Rana, Y. Yi, R. J. Smyth, R. G. Collman, R. W. Doms, G. Vassart, and M. Parmentier. 1996. Resistance to HIV-1 infection in caucasian individuals bearing mutant alleles of the CCR-5 chemokine receptor gene. Nature 382:722-725.

Santoyo, G., and D. Romero. 2005. Gene conversion and concerted evolution in bacterial genomes. FEMS Microbiol Rev 29:169-183.

Schaid, D. J. 2004. Evaluating associations of haplotypes with traits. Genet Epidemiol 27:348-364.

Schimenti, J. C. 1994. Gene conversion and the evolution of gene families in mammals. Soc Gen Physiol Ser 49:85-91.

Schneider, S., D. Roessli, and L. Excoffier. 2000. Arlequin: A software for population genetics data analysis. Genetics and Biometry Laboratory, Department of Anthropology, University of Geneva.

Schork, N. J. 1997. Genetics of complex disease: approaches, problems, and solutions. Am J Respir Crit Care Med 156:S103-109.

Schroeder, S. A., D. M. Gaughan, and M. Swift. 1995. Protection against bronchial asthma by CFTR delta F508 mutation: a heterozygote advantage in cystic fibrosis. Nat Med 1:703-705.

Schwartz, K., L. Carrier, P. Guicheney, and M. Komajda. 1995. Molecular basis of familial cardiomyopathies. Circulation 91:532-540.

Seguin, B., B. Hardy, P. A. Singer, and A. S. Daar. 2008. Bidil: recontextualizing the race debate. Pharmacogenomics J.

Self, D. W., A. W. McClenahan, D. Beitner-Johnson, R. Z. Terwilliger, and E. J. Nestler. 1995. Biochemical adaptations in the mesolimbic dopamine system in response to heroin self-administration. Synapse 21:312-318.

Semba, R. D., N. Kumwenda, D. R. Hoover, T. E. Taha, T. C. Quinn, L. Mtimavalye, R. J. Biggar, R. Broadhead, P. G. Miotti, L. J. Sokoll, L. van der Hoeven, and J. D. Chiphangwi. 1999a. Human immunodeficiency virus load in breast milk, mastitis, and mother-to-child transmission of human immunodeficiency virus type 1. J Infect Dis 180:93-98.

Semba, R. D., N. Kumwenda, T. E. Taha, D. R. Hoover, Y. Lan, W. Eisinger, L. Mtimavalye, R. Broadhead, P. G. Miotti, L. Van Der Hoeven, and J. D. Chiphangwi. 1999b. Mastitis and immunological factors in breast milk of lactating women in Malawi. Clin Diagn Lab Immunol 6:671-674.

Seshadri, R., G. S. Myers, H. Tettelin, J. A. Eisen, J. F. Heidelberg, R. J. Dodson, T. M. Davidsen, R. T. DeBoy, D. E. Fouts, D. H. Haft, J. Selengut, Q. Ren, L. M. Brinkac, R. Madupu, J. Kolonay, S. A. Durkin, S. C. Daugherty, J. Shetty, A. Shvartsbeyn, E. Gebregeorgis, K. Geer, G. Tsegaye, J. Malek, B. Ayodeji, S. Shatsman, M. P. McLeod, D. Smajs, J. K. Howell, S. Pal, A. Amin, P. Vashisth, T. Z. McNeill, Q. Xiang, E. Sodergren, E. Baca, G. M. Weinstock, S. J. Norris, C. M. Fraser, and I. T. Paulsen. 2004. Comparison of the genome of the oral pathogen Treponema denticola with other spirochete genomes. Proc Natl Acad Sci U S A 101:5646-5651.

Sham, P. C., and D. Curtis. 1995. Monte Carlo tests for associations between disease and alleles at highly polymorphic loci. Ann Hum Genet 59:97-105.

Shapiro, B., A. Rambaut, and A. J. Drummond. 2006. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol Biol Evol 23:7-9.

Shapshak, P., D. M. Segal, K. A. Crandall, R. K. Fujimura, B. T. Zhang, K. Q. Xin, K. Okuda, C. K. Petito, C. Eisdorfer, and K. Goodkin. 1999. Independent evolution of HIV type 1 in different brain regions. AIDS Res Hum Retroviruses 15:811-820.

Sharp, P. M., E. Bailes, R. R. Chaudhuri, C. M. Rodenburg, M. O. Santiago, and B. H. Hahn. 2001. The origins of acquired immune deficiency syndrome viruses: where and when? Philos Trans R Soc Lond B Biol Sci 356:867-876.

Shen, Y. C., J. H. Fan, H. J. Edenberg, T. K. Li, Y. H. Cui, Y. F. Wang, C. H. Tian, C. F. Zhou, R. L. Zhou, J. Wang, Z. L. Zhao, and G. Y. Xia. 1997. Polymorphism of ADH and ALDH genes among four ethnic groups in China and effects upon the risk for alcoholism. Alcohol Clin Exp Res 21:1272-1277.

Sherry, S. T., A. R. Rogers, H. Harpending, H. Soodyall, T. Jenkins, and M. Stoneking. 1994. Mismatch distributions of mtDNA reveal recent human population expansions. Hum Biol 66:761-775.

Shibasaki, Y., D. A. Baillie, D. St Clair, and A. J. Brookes. 1995. High-resolution mapping of SNCA encoding alpha-synuclein, the non-A beta component of Alzheimer's disease amyloid precursor, to human chromosome 4q21.3-->q22 by fluorescence in situ hybridization. Cytogenet Cell Genet 71:54-55.

Shuster, D. E., M. E. Kehrli, Jr., and C. R. Baumrucker. 1995. Relationship of inflammatory cytokines, growth hormone, and insulin-like growth factor-I to reduced performance during infectious disease. Proc Soc Exp Biol Med 210:140-149.

Si-Mohamed, A., M. D. Kazatchkine, I. Heard, C. Goujon, T. Prazuck, G. Aymard, G. Cessot, Y. H. Kuo, M. C. Bernard, B. Diquet, J. E. Malkin, L. Gutmann, and L. Belec. 2000. Selection of drug-resistant variants in the female genital tract of human immunodeficiency virus type 1-infected women receiving antiretroviral therapy. J Infect Dis 182:112-122.

Simon, F., P. Mauclere, P. Roques, I. Loussert-Ajaka, M. C. Muller-Trutwin, S. Saragosti, M. C. Georges-Courbot, F. Barre-Sinoussi, and F. Brun-Vezinet. 1998. Identification of a new human immunodeficiency virus type 1 distinct from group M and group O. Nat Med 4:1032-1037.

Sinkala, M., L. Kuhn, C. Kankasa, P. Kasonde, C. Vwalika, M. Mwiya, N. Scott, K. Semrau, G. Aldrovandi, D. M. Thea, and Z. E. B. S. Group. 2007. No Benefit of Early Cessation of Breastfeeding at 4 Months on HIV-free Survival of Infants Born to HIV-infected Mothers in Zambia: The Zambia Exclusive Breastfeeding Study Conference on Retroviruses and Opportunistic Infections, San Francisco.

Slatkin, M., and W. P. Maddison. 1989. A cladistic measure of gene flow inferred from the phylogenies of alleles. Genetics 123:603-613.

Slightom, J. L., A. E. Blechl, and O. Smithies. 1980. Human fetal G gamma- and A gamma-globin genes: complete nucleotide sequences suggest that DNA can be exchanged between these duplicated genes. Cell 21:627-638.

Smit, T. K., B. J. Brew, W. Tourtellotte, S. Morgello, B. B. Gelman, and N. K. Saksena. 2004. Independent evolution of human immunodeficiency virus (HIV) drug resistance

mutations in diverse areas of the brain in HIV-infected patients, with and without dementia, on antiretroviral treatment. J Virol 78:10133-10148.

Smit, T. K., B. Wang, T. Ng, R. Osborne, B. Brew, and N. K. Saksena. 2001. Varied tropism of HIV-1 isolates derived from different regions of adult brain cortex discriminate between patients with and without AIDS dementia complex (ADC): evidence for neurotropic HIV variants. Virology 279:509-526.

Smith, J. L., N. J. David, S. Indgin, C. W. Israel, B. M. Levine, J. Justice, Jr., J. A. McCrary, 3rd, R. Medina, P. Paez, E. Santana, M. Sarkar, N. J. Schatz, M. L. Spitzer, W. O. Spitzer, and E. K. Walter. 1971. Neuro-ophthalmological study of late yaws and pinta. II. The Caracas project. Br J Vener Dis 47:226-251.

Smith, J. M. 1992. Analyzing the mosaic structure of genes. J Mol Evol 34:126-129. Smith, J. M., C. G. Dowson, and B. G. Spratt. 1991. Localized sex in bacteria. Nature 349:29-31. Smith, M. M., and L. Kuhn. 2000. Exclusive breast-feeding: does it have the potential to reduce

breast-feeding transmission of HIV-1? Nutr Rev 58:333-340. Spence, J., T. Liang, T. Foroud, D. Lo, and L. Carr. 2005. Expression profiling and QTL

analysis: a powerful complementary strategy in drug abuse research. Addict Biol 10:47-51.

Spillantini, M. G., A. Divane, and M. Goedert. 1995. Assignment of human alpha-synuclein (SNCA) and beta-synuclein (SNCB) genes to chromosomes 4q21 and 5q35. Genomics 27:379-381.

Spitzer, R. L., J. Endicott, and E. Robins. 1989. Research Diagnostic Criteria (RDC) for a selected group of psychiatric disorders. . Department of Research Assessment and Training, New York Psychiatric Institute, New York.

Staprans, S., N. Marlowe, D. Glidden, T. Novakovic-Agopian, R. M. Grant, M. Heyes, F. Aweeka, S. Deeks, and R. W. Price. 1999. Time course of cerebrospinal fluid responses to antiretroviral therapy: evidence for variable compartmentalization of infection. Aids 13:1051-1061.

Steffy, K., and F. Wong-Staal. 1991. Genetic regulation of human immunodeficiency virus. Microbiol Rev 55:193-205.

Stephens, D. S., E. R. Moxon, J. Adams, S. Altizer, J. Antonovics, S. Aral, R. Berkelman, E. Bond, J. Bull, G. Cauthen, M. M. Farley, A. Glasgow, J. W. Glasser, H. P. Katner, S. Kelley, J. Mittler, A. J. Nahmias, S. Nichol, V. Perrot, R. W. Pinner, S. Schrag, P. Small, and P. H. Thrall. 1998a. Emerging and reemerging infectious diseases: a multidisciplinary perspective. Am J Med Sci 315:64-75.

Stephens, J. C., D. E. Reich, D. B. Goldstein, H. D. Shin, M. W. Smith, M. Carrington, C. Winkler, G. A. Huttley, R. Allikmets, L. Schriml, B. Gerrard, M. Malasky, M. D. Ramos, S. Morlot, M. Tzetis, C. Oddoux, F. S. di Giovine, G. Nasioulas, D. Chandler, M. Aseev, M. Hanson, L. Kalaydjieva, D. Glavac, P. Gasparini, E. Kanavakis, M. Claustres, M. Kambouris, H. Ostrer, G. Duff, V. Baranov, H. Sibul, A. Metspalu, D. Goldman, N. Martin, D. Duffy, J. Schmidtke, X. Estivill, S. J. O'Brien, and M. Dean. 1998b. Dating the origin of the CCR5-Delta32 AIDS-resistance allele by the coalescence of haplotypes. Am J Hum Genet 62:1507-1515.

Stephens, M., and P. Donnelly. 2003. A comparison of bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 73:1162-1169.

Strain, M. C., S. Letendre, S. K. Pillai, T. Russell, C. C. Ignacio, H. F. Gunthard, B. Good, D. M. Smith, S. M. Wolinsky, M. Furtado, J. Marquie-Beck, J. Durelle, I. Grant, D. D.

Richman, T. Marcotte, J. A. McCutchan, R. J. Ellis, and J. K. Wong. 2005. Genetic composition of human immunodeficiency virus type 1 in cerebrospinal fluid and blood without treatment and during failing antiretroviral therapy. J Virol 79:1772-1788.

Sullivan, S. T., U. Mandava, T. Evans-Strickfaden, J. L. Lennox, T. V. Ellerbrock, and C. E. Hart. 2005. Diversity, divergence, and evolution of cell-free human immunodeficiency virus type 1 in vaginal secretions and blood of chronically infected women: associations with immune status. J Virol 79:9799-9809.

Sun, E. S., B. J. Molini, L. K. Barrett, A. Centurion-Lara, S. A. Lukehart, and W. C. Van Voorhis. 2004. Subfamily I Treponema pallidum repeat protein family: sequence variation and immunity. Microbes Infect 6:725-737.

Surovell, T. A. 2003. Simulating coastal migration in New World colonization. Curr Anthropol 44:580-591.

Swofford, D. L. 2002. PAUP*: Phylogenetic Analysis Using Parsimony (and Other Methods). Sinauer.

Taguchi, Y., T. Koide, T. Shiroishi, and T. Yagi. 2005. Molecular evolution of cadherin-related neuronal receptor/protocadherin(alpha) (CNR/Pcdh(alpha)) gene cluster in Mus musculus subspecies. Mol Biol Evol 22:1433-1443.

Tamura, K., J. Dudley, M. Nei, and S. Kumar. 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24:1596-1599.

Tanaka, F., Y. Shiratori, O. Yokosuka, F. Imazeki, Y. Tsukada, and M. Omata. 1996. High incidence of ADH2*1/ALDH2*1 genes among Japanese alcohol dependents and patients with alcoholic liver disease. Hepatology 23:234-239.

Thea, D. M., G. Aldrovandi, C. Kankasa, P. Kasonde, W. D. Decker, K. Semrau, M. Sinkala, and L. Kuhn. 2006. Post-weaning breast milk HIV-1 viral load, blood prolactin levels and breast milk volume. Aids 20:1539-1547.

Thea, D. M., C. Vwalika, P. Kasonde, C. Kankasa, M. Sinkala, K. Semrau, E. Shutes, C. Ayash, W. Y. Tsai, G. Aldrovandi, and L. Kuhn. 2004. Issues in the design of a clinical trial with a behavioral intervention--the Zambia exclusive breast-feeding study. Control Clin Trials 25:353-365.

Thomasson, H. R., D. W. Crabb, H. J. Edenberg, and T. K. Li. 1993. Alcohol and aldehyde dehydrogenase polymorphisms and alcoholism. Behav Genet 23:131-136.

Thomasson, H. R., D. W. Crabb, H. J. Edenberg, T. K. Li, H. G. Hwu, C. C. Chen, E. K. Yeh, and S. J. Yin. 1994. Low frequency of the ADH2*2 allele among Atayal natives of Taiwan with alcohol use disorders. Alcohol Clin Exp Res 18:640-643.

Thomasson, H. R., H. J. Edenberg, D. W. Crabb, X. L. Mai, R. E. Jerome, T. K. Li, S. P. Wang, Y. T. Lin, R. B. Lu, and S. J. Yin. 1991. Alcohol and aldehyde dehydrogenase genotypes and alcoholism in Chinese men. Am J Hum Genet 48:677-681.

Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25:4876-4882.

Thompson, K. A., M. J. Churchill, P. R. Gorry, J. Sterjovski, R. B. Oelrichs, S. L. Wesselingh, and C. A. McLean. 2004. Astrocyte specific viral strains in HIV dementia. Ann Neurol 56:873-877.

Toniolo, A., C. Serra, P. G. Conaldi, F. Basolo, V. Falcone, and A. Dolei. 1995. Productive HIV-1 infection of normal human mammary epithelial cells. Aids 9:859-866.

Touchman, J. W., A. Dehejia, O. Chiba-Falek, D. E. Cabin, J. R. Schwartz, B. M. Orrison, M. H. Polymeropoulos, and R. L. Nussbaum. 2001. Human and mouse alpha-synuclein genes: comparative genomic sequence analysis and identification of a novel gene regulatory element. Genome Res 11:78-86.

Ueda, K., H. Fukushima, E. Masliah, Y. Xia, A. Iwai, M. Yoshimoto, D. A. Otero, J. Kondo, Y. Ihara, and T. Saitoh. 1993. Molecular cloning of cDNA encoding an unrecognized component of amyloid in Alzheimer disease. Proc Natl Acad Sci U S A 90:11282-11286.

Uhl, G. R., Q. R. Liu, D. Walther, J. Hess, and D. Naiman. 2001. Polysubstance abuse-vulnerability genes: genome scans for association, using 1,004 subjects and 1,494 single-nucleotide polymorphisms. Am J Hum Genet 69:1290-1300.

Ullrich, R., H. L. Schieferdecker, K. Ziegler, E. O. Riecken, and M. Zeitz. 1990. gamma delta T cells in the human intestine express surface markers of activation and are preferentially located in the epithelium. Cell Immunol 128:619-627.

Van de Peere, P., P. Lepage, A. Simonon, C. Desgranges, D. G. Hitimana, A. Bazubagira, C. Van Goethem, A. Kleinschmidt, F. Bex, K. Broliden, and et al. 1992. Biological markers associated with prolonged survival in African children maternally infected by the human immunodeficiency virus type 1. AIDS Res Hum Retroviruses 8:435-442.

Van de Perre, P., A. Simonon, D. G. Hitimana, F. Dabis, P. Msellati, B. Mukamabano, J. B. Butera, C. Van Goethem, E. Karita, and P. Lepage. 1993. Infective and anti-infective properties of breastmilk from HIV-1-infected women. Lancet 341:914-918.

Venturi, G., M. Catucci, L. Romano, P. Corsi, F. Leoncini, P. E. Valensin, and M. Zazzi. 2000. Antiretroviral resistance mutations in human immunodeficiency virus type 1 reverse transcriptase and protease from paired cerebrospinal fluid and plasma samples. J Infect Dis 181:740-745.

Verrelli, B. C., J. H. McDonald, G. Argyropoulos, G. Destro-Bisol, A. Froment, A. Drousiotou, G. Lefranc, A. N. Helal, J. Loiselet, and S. A. Tishkoff. 2002. Evidence for balancing selection from nucleotide sequence analyses of human G6PD. Am J Hum Genet 71:1112-1128.

Wagner, A. 1998. The fate of duplicated genes: loss or new function? Bioessays 20:785-788. Walker, S. J., and K. A. Grant. 2006. Peripheral blood alpha-synuclein mRNA levels are

elevated in cynomolgus monkeys that chronically self-administer alcohol. Alcohol 38:1-4.

Wall, T. L., C. Garcia-Andrade, H. R. Thomasson, L. G. Carr, and C. L. Ehlers. 1997. Alcohol dehydrogenase polymorphisms in Native Americans: identification of the ADH2*3 allele. Alcohol Alcohol 32:129-132.

Walsh, J. B. 1995. How often do duplicated genes evolve new functions? Genetics 139:421-428. Wang, S., C. M. Lewis, M. Jakobsson, S. Ramachandran, N. Ray, G. Bedoya, W. Rojas, M. V.

Parra, J. A. Molina, C. Gallo, G. Mazzotti, G. Poletti, K. Hill, A. M. Hurtado, D. Labuda, W. Klitz, R. Barrantes, M. C. Bortolini, F. M. Salzano, M. L. Petzl-Erler, L. T. Tsuneto, E. Llop, F. Rothhammer, L. Excoffier, M. W. Feldman, N. A. Rosenberg, and A. Ruiz-Linares. 2007. Genetic Variation and Population Structure in Native Americans. PLoS Genet 3:e185.

Wang, T. H., Y. K. Donaldson, R. P. Brettle, J. E. Bell, and P. Simmonds. 2001. Identification of shared populations of human immunodeficiency virus type 1 infecting microglia and tissue macrophages outside the central nervous system. J Virol 75:11686-11699.

Wei, X., J. M. Decker, S. Wang, H. Hui, J. C. Kappes, X. Wu, J. F. Salazar-Gonzalez, M. G. Salazar, J. M. Kilby, M. S. Saag, N. L. Komarova, M. A. Nowak, B. H. Hahn, P. D. Kwong, and G. M. Shaw. 2003. Antibody neutralization and escape by HIV-1. Nature 422:307-312.

West, K. M. 1974. Diabetes in American Indians and other native populations of the New World. Diabetes 23:841-855.

WHO. 2003. HIV and Infant Feeding. World Health Organization. WHO. 2007. Mother-to-child transmission of HIV. WHO. 2008a. Impact of chronic disease by World Bank income groups: High Income Countries.

World Health Organization. WHO. 2008b. Impact of chronic disease by World Bank income groups: Low Income Countries.

World Health Organization. WHO. 2006. Comprehensive HIV Prevention: 2006 Report on the Global AIDS Epidemic in W.

H. Organization, ed. Williams, J. T., H. Begleiter, B. Porjesz, H. J. Edenberg, T. Foroud, T. Reich, A. Goate, P. Van

Eerdewegh, L. Almasy, and J. Blangero. 1999. Joint multipoint linkage analysis of multivariate qualitative and quantitative traits. II. Alcoholism and event-related potentials. Am J Hum Genet 65:1148-1160.

Williams, R. C., W. C. Knowler, D. J. Pettitt, J. C. Long, D. A. Rokala, H. F. Polesky, R. A. Hackenberg, A. G. Steinberg, and P. H. Bennett. 1992. The magnitude and origin of European-American admixture in the Gila River Indian Community of Arizona: a union of genetics and demography. Am J Hum Genet 51:101-110.

Willumsen, J. F., S. M. Filteau, A. Coutsoudis, K. E. Uebel, M. L. Newell, and A. M. Tomkins. 2000. Subclinical mastitis as a risk factor for mother-infant HIV transmission. Adv Exp Med Biol 478:211-223.

Wirt, D. P., L. T. Adkins, K. H. Palkowetz, F. C. Schmalstieg, and A. S. Goldman. 1992. Activated and memory T lymphocytes in human milk. Cytometry 13:282-290.

Wise, P. H. 1976. Diabetes and associated variables in the South Australian Aboriginal. Aust NZ J Med 6:191-196.

Wong, J. K., C. C. Ignacio, F. Torriani, D. Havlir, N. J. Fitch, and D. D. Richman. 1997. In vivo compartmentalization of human immunodeficiency virus: evidence from the examination of pol sequences from autopsy tissues. J Virol 71:2059-2071.

Wooding, S., A. C. Stone, D. M. Dunn, S. Mummidi, L. B. Jorde, R. K. Weiss, S. Ahuja, and M. J. Bamshad. 2005. Contrasting effects of natural selection on human and chimpanzee CC chemokine receptor 5. Am J Hum Genet 76:291-301.

Worobey, M. 2001. A novel approach to detecting and measuring recombination: new insights into evolution in viruses, bacteria, and mitochondria. Mol Biol Evol 18:1425-1434.

Xanthou, M. 1997. Human milk cells. Acta Paediatr 86:1288-1290. Yancy, C. W., J. K. Ghali, V. M. Braman, M. L. Sabolinski, M. Worcel, W. T. Archambault, and

J. A. Franciosa. 2007. Evidence for the continued safety and tolerability of fixed-dose isosorbide dinitrate/hydralazine in patients with chronic heart failure (the extension to African-American Heart Failure Trial). Am J Cardiol 100:684-689.

Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13:555-556.

Young, T. K., J. Reading, B. Elias, and J. D. O'Neil. 2000. Type 2 diabetes mellitus in Canada's first nations: status of an epidemic in progress. Cmaj 163:561-566.

Zhang, J. R., J. M. Hardham, A. G. Barbour, and S. J. Norris. 1997. Antigenic variation in Lyme disease borreliae by promiscuous recombination of VMP-like sequence cassettes. Cell 89:275-285.

Zhang, J. R., and S. J. Norris. 1998. Genetic variation of the Borrelia burgdorferi gene vlsE involves cassette-specific, segmental gene conversion. Infect Immun 66:3698-3704.

Zhang, Q. Y., D. DeRyckere, P. Lauer, and M. Koomey. 1992. Gene conversion in Neisseria gonorrhoeae: evidence for its role in pilus antigenic variation. Proc Natl Acad Sci U S A 89:5366-5370.

Zhu, T., N. Wang, A. Carr, D. S. Nam, R. Moor-Jankowski, D. A. Cooper, and D. D. Ho. 1996. Genetic characterization of human immunodeficiency virus type 1 in blood and genital secretions: evidence for viral compartmentalization and selection during sexual transmission. J Virol 70:3098-3107.

Zinn-Justin, A., and L. Abel. 1999. Genome search for alcohol dependence using the weighted pairwise correlation linkage method: interesting findings on chromosome 4. Genet Epidemiol 17 Suppl 1:S421-426.

BIOGRAPHICAL SKETCH

I graduated from Great Valley High School in Malvern, Pennsylvania in 1997. I attended

Reed College in Portland, Oregon, from August to December 1997. I attended Millersville

University in Millersville, Pennsylvania, from January 1999 to December 2001. I graduated in

December 2001 with a Bachelor of Arts degree in anthropology. I began graduate school at the

University of Florida in August 2002 in anthropology. I received my Master of the Arts degree in

May of 2005 and Ph.D. in May 2008.

anthropological genetic analysis of human disease...

Documents

n6 and n3 fatty acid imbalance zahra farazandeh. genetic,...

chapter 17 cancer as a genetic disease

genetic theory - overviewibg€¦ · types of genetic...

mutation & genetic disease

genetic basis of disease (3)

screenig for genetic disease gene

genetic testing for huntington’s disease · 2 foreword...

7 developmental, genetic, & pediatric disease

genetic...

developmental, genetic, & pediatric disease

cystic fibrosis(genetic disease)

genetic modiﬁers and rare mendelian disease

prenatal screening for genetic disease carriers

fungal disease dynamics, genetic variation and

the genetic basis of human disease

family communication about genetic disease risk

genetic basis of disease (2)

cancer : a genetic disease

genetic epidemiology of cardiovascular disease

genetic engineering for disease resistance in plants ·...