
The role of large sequence polymorphisms in generating genomic diversity in clinical isolates of Mycobacterium tuberculosis and their utility in phylogenetic analysis.

David Alland*1, David W. Lacher2, Manzour Hernando Hazbón1, Alifiya S. Motiwala1, Weihong Qi2, Robert D. Fleischmann3 and Thomas S. Whittam2.

1. Division of Infectious Disease, Department of Medicine and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, New Jersey Medical School, University of Medicine and Dentistry of New Jersey, Newark, New Jersey.

2. National Food Safety and Toxicology Center, Michigan State University, East Lansing, Michigan.

3. The Institute for Genomic Research, Rockville, Maryland.

RUNNING TITLE: LSP distribution in M. tuberculosis.

KEYWORDS: Tuberculosis, SNP, phylogenetic, population, sequence polymorphism, genetic markers.

* Corresponding author: Division of Infectious Disease, University of Medicine and Dentistry of New Jersey, 185 South Orange Avenue, MSB A920C, Newark, NJ 07103. E-mail: [email protected]. Phone: (973) 972-2179. Fax: (973) 972-0713.

ACCEPTED

Copyright © 2006, American Society for Microbiology and/or the Listed Authors/Institutions. All Rights Reserved. J. Clin. Microbiol. doi:10.1128/JCM.02483-05. JCM Accepts, published online ahead of print on 1 November 2006.

Downloaded from http://jcm.asm.org/ on October 14, 2020 by guest.


ABSTRACT

Mycobacterium tuberculosis strains contain different genomic insertions or deletions called large sequence polymorphisms (LSPs). Distinguishing between LSPs that occur once and those that occur repeatedly in a genomic region may provide insights into the biological roles of LSPs and identify useful phylogenetic markers. We analyzed 163 clinical M. tuberculosis isolates for 17 LSPs identified in a genomic comparison of M. tuberculosis strains H37Rv and CDC1551. LSPs were mapped onto a single nucleotide polymorphism (SNP)-based phylogenetic tree created using nine novel SNP markers that were found to reproduce a phylogeny based on 212 SNPs. Four “Group A LSPs” mapped to a single SNP-tree segment. Two “Group B LSPs” and eleven “Group C LSPs” were inferred to have arisen independently in the same genomic region either two times or more than two times, respectively. None of the Group A LSPs, but one Group B LSP and five Group C LSPs, were flanked by IS6110 sequences in the reference strains. PE-PPE genes were present only in Group B or C LSPs. SNP-based and LSP-based phylogenies were also compared. We classified the isolates into 58 “LSP types” using a separate LSP-based phylogenetic analysis and mapped the LSP types onto the SNP tree. LSPs often assigned isolates to the correct phylogenetic lineage; however, significant mistakes occurred for 6/58 (10%) of the LSP types. In conclusion, most LSPs occur in genomic regions that are prone to repeated insertion/deletion events, and they were responsible for an unexpectedly high degree of genomic variation in clinical M. tuberculosis. Group B and C LSPs may represent polymorphisms that occur due to selective pressure and affect the phenotype of the organism, while Group A LSPs are preferable phylogenetic markers.


INTRODUCTION

As pathogenic bacteria adapt to their host environments, virulence properties may change through the insertion or deletion (indel) of chromosomal regions and the gain or loss of genetic material (3, 4, 9, 11, 19, 25, 27, 29, 34). Mycobacterium tuberculosis is a major pathogen of humans, and genomic deletions [also known as large sequence polymorphisms (LSPs) or regions of difference] can also be detected in most clinical isolates of this species (16, 23, 32). Studies examining the biological role of LSPs in M. tuberculosis have been inconclusive (26). LSPs were shown to be unique event polymorphisms (UEPs) in a study of 100 clinical M. tuberculosis isolates (23). It was also possible to perform an informative phylogenetic analysis of a large sample of clinical M. tuberculosis isolates using these LSPs as markers (17). A UEP is a mutation that has occurred only once in the phylogeny of a species (i.e., is “unique”), is irreversible, and does not display homoplasy (23). The observation that most LSPs were UEPs suggested that LSPs were unlikely to have an important role in disease pathogenesis, because mutations that confer an evolutionary advantage on M. tuberculosis should be selected for repeatedly in the evolution of the species (2). Supporting this hypothesis, LSPs were found to have a possible attenuating effect on clinical disease in one retrospective study (26). A study of gene expression in ten clinical M. tuberculosis isolates also demonstrated that LSPs predominantly encoded genes that were variably expressed or not expressed in broth cultures (18). These results suggested that LSPs do not generally involve functionally important proteins. Instead, a number of investigators have assumed that LSPs are selectively neutral and have used them as phylogenetic markers for population and evolutionary investigations (17, 23, 24).
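The UEP definition above can be restated as a check on a tree: if a deletion arose once and is never reversed, every isolate carrying it descends from the branch where it occurred, so the carriers form exactly one clade. The minimal Python sketch below assumes a rooted tree encoded as nested tuples; the tree, the isolate names and the function names are hypothetical illustrations, not data or code from this study.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree as nested tuples; leaves are isolate names (strings).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the names of all isolates below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the isolate set of every node (every clade) in the tree."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found


def consistent_with_single_event(tree: Tree, carriers: Set[str]) -> bool:
    """True if the carriers form exactly one clade, i.e. the pattern can be
    explained by one irreversible event on a single branch."""
    return frozenset(carriers) in clades(tree)


# Hypothetical six-isolate tree, for illustration only.
example_tree: Tree = ((("a", "b"), "c"), (("d", "e"), "f"))
print(consistent_with_single_event(example_tree, {"a", "b", "c"}))  # True
print(consistent_with_single_event(example_tree, {"a", "d"}))  # False
```

A pattern that fails this check cannot be a UEP on that tree; it requires either repeated independent events or reversal.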

Other studies have suggested that LSPs do have a critical role in M. tuberculosis pathogenesis. Clinical M. tuberculosis LSPs had low consistency indices when incorporated into a phylogeny composed of SNPs, LSPs and clinical parameters (16). Furthermore, investigations of clinical strains have detected three apparent genomic “hot spots” for insertion of IS6110 and associated chromosomal deletions (13, 14, 24, 30). Genomic analysis indicates that LSPs almost always include segments of open reading frames (16, 23), although this may be due to a paucity of noncoding regions in the M. tuberculosis genome (7). Finally, Yang et al. (33) recently showed that clinical M. tuberculosis isolates with deletions in the plcD gene (one of the known deletion hot spots) are indeed phenotypically different, exhibiting a two-fold increased risk of causing extrapulmonary tuberculosis. Taken together, these results indicate that some LSPs have evolved repeatedly in the radiation of M. tuberculosis, and suggest that LSP-associated indels provide a selective advantage to certain M. tuberculosis strains.
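The consistency index (CI) referred to above measures homoplasy for one character: the minimum number of state changes the character could require, divided by the number it actually requires on a given tree, so CI = 1 indicates no homoplasy and lower values indicate repeated change. The Python sketch below scores a binary LSP character (1 = present, 0 = absent) with Fitch parsimony on a rooted binary tree. It is an illustration with a hypothetical tree and hypothetical calls, not necessarily how the indices in reference 16 were computed.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A rooted, fully bifurcating tree as nested pairs; leaves are isolate names.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[FrozenSet[int], int]:
    """Fitch small-parsimony pass: return the candidate state set at this node
    and the minimum number of state changes required in the subtree below it."""
    if isinstance(tree, str):
        return frozenset([states[tree]]), 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: take the union and charge one change.
    return left_set | right_set, left_cost + right_cost + 1


def consistency_index(tree: Tree, states: Dict[str, int]) -> float:
    """CI = minimum conceivable changes / changes required on this tree.

    For a character with k observed states the minimum is k - 1, so a
    binary character that varies among the isolates has CI = 1 / length."""
    minimum = len(set(states.values())) - 1
    _, length = fitch(tree, states)
    return minimum / length if length else 1.0


# Hypothetical tree and LSP calls (1 = present, 0 = absent).
example_tree: Tree = ((("a", "b"), "c"), (("d", "e"), "f"))
one_clade = {"a": 1, "b": 1, "c": 0, "d": 0, "e": 0, "f": 0}
two_lineages = {"a": 1, "b": 0, "c": 0, "d": 1, "e": 0, "f": 0}
print(consistency_index(example_tree, one_clade))  # 1.0
print(consistency_index(example_tree, two_lineages))  # 0.5
```

On this toy tree, an LSP confined to one clade needs a single change (CI = 1.0), whereas the same LSP found in two unrelated isolates needs two changes (CI = 0.5).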

Unfortunately, the rates of the indels underlying M. tuberculosis LSPs cannot be conveniently measured in the laboratory. This makes it difficult to differentiate experimentally between mutations that are UEPs and mutations that tend to occur repeatedly. Phylogenetic analysis offers an alternative approach. Clinical strains containing a particular indel can be mapped onto a phylogenetic tree and then examined to determine whether or not they can be traced to a single ancestral event. Indels that have arisen independently multiple times in the population may have significant biological roles, whereas mutations that appear to have a single origin are more likely to represent UEPs that are evolutionarily neutral (2). A variation of this approach was undertaken in prior LSP studies (16, 23). However, these previous studies also used the LSPs themselves as markers in the phylogenetic tree construction. Furthermore, the largest of these studies defined distinct LSPs quite strictly, analyzing LSPs separately if they had different deletion sites, even if the deletions mapped to identical or overlapping genes. In investigating the biology of LSPs, we propose that it is more important to categorize LSPs according to the gene or genes that are deleted (or inserted) than by the exact location of the indel site, because the effect of an LSP on microbial phenotype is more likely to depend on the genes that are disrupted or otherwise affected by the LSP than on the exact indel sites. Therefore, we have favored a less restrictive definition of LSP that is based on the presence or absence of a gene region rather than on the presence or absence of a specific deletion.
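The difference between the two definitions can be sketched in a few lines of Python. The sketch assumes deletions are given as genomic intervals and that a gene is affected if it overlaps the interval; the gene names and coordinates are hypothetical placeholders, not M. tuberculosis annotations, and grouping by the exact set of overlapped genes is a simplification of the gene-region definition described above.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical gene coordinates (start, end), 1-based and inclusive.
GENES: Dict[str, Tuple[int, int]] = {
    "geneA": (1_000, 1_900),
    "geneB": (2_000, 2_800),
    "geneC": (3_000, 3_600),
}


def genes_hit(deletion: Tuple[int, int]) -> FrozenSet[str]:
    """Genes whose coordinates overlap a deletion interval (start, end)."""
    start, end = deletion
    return frozenset(
        name for name, (g_start, g_end) in GENES.items()
        if start <= g_end and g_start <= end
    )


def group_by_site(deletions: List[Tuple[int, int]]) -> Dict[Tuple[int, int], List[int]]:
    """Strict definition: deletions are one LSP only if their endpoints match."""
    groups: Dict[Tuple[int, int], List[int]] = {}
    for i, deletion in enumerate(deletions):
        groups.setdefault(deletion, []).append(i)
    return groups


def group_by_genes(deletions: List[Tuple[int, int]]) -> Dict[FrozenSet[str], List[int]]:
    """Gene-based definition: deletions removing the same gene(s) are one LSP."""
    groups: Dict[FrozenSet[str], List[int]] = {}
    for i, deletion in enumerate(deletions):
        groups.setdefault(genes_hit(deletion), []).append(i)
    return groups


# Three hypothetical deletions with different endpoints, all removing geneB.
observed = [(1_950, 2_850), (1_990, 2_900), (1_920, 2_810)]
print(len(group_by_site(observed)))  # 3
print(len(group_by_genes(observed)))  # 1
```

Three deletions with different endpoints that remove the same gene count as three LSPs under the strict, site-based definition but as one LSP under the gene-based definition.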

In this report, we present a phylogenetic analysis of gene deletions found in M. tuberculosis LSPs, using an “unequivocal” phylogenetic tree constructed with synonymous SNP markers. We present phylogenetic evidence that many of the gene regions contained within LSPs have been deleted (or possibly inserted) multiple times as separate events in the history of M. tuberculosis divergence, and we identify several possible mechanisms for these genomic changes. Our results suggest that LSPs represent an important mechanism of genetic variation in M. tuberculosis and indicate that further investigations into the functional relevance of LSPs may provide insights into M. tuberculosis pathogenesis and immunity. LSPs (as defined in our study) that recur independently at high frequency may be unsuitable as phylogenetic markers.


MATERIALS AND METHODS

Study population. The study population has been described previously (16); it consisted of consecutive patients with positive cultures for M. tuberculosis identified at Montefiore Medical Center in the Bronx, N.Y., between 1989 and 1996. All isolates had been typed with IS6110-based restriction fragment length polymorphism (RFLP) analysis, and with a secondary typing procedure if necessary (1). Of the 319 available cultures from that period, 169 samples plus the M. tuberculosis reference strains H37Rv and CDC1551 were selected at random for SNP and LSP analysis. Six clinical samples gave indeterminate SNP or LSP results, leaving 163 clinical isolates plus H37Rv and CDC1551 for inclusion in the present study. The demographic and clinical characteristics of this subset were similar to those of the overall study population and were generally reflective of the diverse nature of New York City residents (1). This subset included M. tuberculosis isolates from patients of a broad range of ethnicities and from at least 19 different known countries of origin.

LSP identification. Eighty-six LSPs larger than ten base pairs were identified by comparing the genomes of M. tuberculosis strains CDC1551 and H37Rv in a previous investigation (16). Seventeen LSPs were studied further: LSPs 1 through 12 were selected from sequences that were present in CDC1551 but absent from H37Rv, and LSPs 13 through 17 were selected from sequences that were present in H37Rv but absent from CDC1551. DNA probes were then prepared by PCR for one gene in each LSP (16); the coordinates of each probe and primer are described in that work. We limited our study to 17 LSPs because of the technical complexity of studying each LSP in large numbers of M. tuberculosis samples. Approximately 2 µg of genomic DNA from each clinical M. tuberculosis isolate or from CDC1551 and H37Rv was suspended in 2X SSC at a final volume of 200 µl. Each sample was boiled for 5 min and then cooled on ice. A multi-slot hybridization apparatus (immunoblotter; Immunetics, MA) was assembled according to the manufacturer's recommendations, with the modification that the cushion was replaced with five pieces of dry 3 mm Whatman paper underneath one piece of 1 mm Whatman paper soaked in 2X SSC. A pre-wetted Biotrans Plus nylon membrane (ICN Pharmaceuticals, CA) was placed on top of the thin Whatman paper. The apparatus was assembled, and the cooled genomic DNA was bound in longitudinal strips onto the membrane by rapidly loading the DNA mixture into the apparatus. Bubbles inside the apparatus were avoided by loading a slight excess volume of DNA solution. The apparatus was then disassembled, and the membrane was removed, rinsed in 2X SSC and cross-linked with ultraviolet light. To identify the LSPs present in each DNA sample, the membrane was prehybridized for one hour in Rapid Hyb buffer (Amersham, CT) at 69°C in a hybridization oven. The still-wet membrane was then reinserted into the multi-slot hybridization apparatus, rotated 90° from its previous orientation, using the manufacturer's cushion instead of the Whatman paper described above to seal the apparatus. Each slot was then loaded with approximately 200 µl of boiled and rapidly ice-cooled hybridization buffer containing γ-³²P-labeled probes for the 17 LSPs. The openings of the apparatus were sealed with Parafilm, and the apparatus was incubated at 69°C with occasional gentle rocking for 2 hours. The Parafilm was carefully removed, unhybridized probe was aspirated from each hybridization well using a vacuum attached to the wash device supplied by the manufacturer, and each slot was washed (again using the vacuum-wash device) with 2X SSC. The apparatus was then disassembled; the membrane was washed once more in 2X SSC and three times in 0.1X SSC at 69°C, and then exposed to film. Using this protocol, 44 different genomic DNA samples could be slotted in an array of 44 lines extending across the membrane. Hybridizing the probe for each LSP at 90° to this array brought every probe into contact with every genomic DNA sample. The presence of a particular LSP in a DNA sample was determined by examining the developed autoradiogram for dark spots. An example of an LSP blot has been shown previously (16).
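Because the DNA lanes and the probe slots cross at right angles, the autoradiogram can be read as a samples-by-probes grid of presence/absence calls. The Python sketch below illustrates that bookkeeping and checks the two reference strains against the patterns stated in this paper (LSPs 1 through 12 present in CDC1551, LSPs 13 through 17 present in H37Rv, with LSP 6 treated as variable in H37Rv). Spot detection, done here by inspecting film, is not modeled; the lane order, spot coordinates and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

N_LANES = 44  # DNA samples per membrane in the protocol above
N_LSPS = 17   # one probe slot per LSP

# Expected reference-strain patterns stated in the text.
CDC1551_PRESENT = set(range(1, 13))  # LSPs 1-12: present in CDC1551, absent from H37Rv
H37RV_PRESENT = set(range(13, 18))   # LSPs 13-17: present in H37Rv, absent from CDC1551
H37RV_VARIABLE = {6}                 # LSP 6 appears to be present in some H37Rv stocks


def score_membrane(lanes: List[str], spots: Set[Tuple[int, int]]) -> Dict[str, List[int]]:
    """Convert dark spots at (lane index, LSP number) crossings into 0/1 calls.

    `lanes[i]` names the sample bound in horizontal lane i; a spot means the
    probed LSP sequence is present in that sample."""
    if len(lanes) > N_LANES:
        raise ValueError(f"at most {N_LANES} samples fit on one membrane")
    return {
        sample: [1 if (i, lsp) in spots else 0 for lsp in range(1, N_LSPS + 1)]
        for i, sample in enumerate(lanes)
    }


def controls_ok(calls: Dict[str, List[int]]) -> bool:
    """Check the CDC1551 and H37Rv lanes against their expected patterns."""
    for lsp in range(1, N_LSPS + 1):
        if calls["CDC1551"][lsp - 1] != int(lsp in CDC1551_PRESENT):
            return False
        if lsp in H37RV_VARIABLE:
            continue
        if calls["H37Rv"][lsp - 1] != int(lsp in H37RV_PRESENT):
            return False
    return True


# Hypothetical membrane: two control lanes plus one clinical isolate.
lanes = ["CDC1551", "H37Rv", "isolate_001"]
spots = {(0, lsp) for lsp in range(1, 13)} | {(1, lsp) for lsp in range(13, 18)} | {(2, 1), (2, 9)}
calls = score_membrane(lanes, spots)
print(controls_ok(calls))  # True
print(calls["isolate_001"])
```

Keeping the reference strains on every membrane gives an internal check that each probe hybridized as expected before the clinical lanes are scored.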

SNP identification. We had previously identified six SNP markers that were sufficient to classify a global M. tuberculosis collection into seven phylogenetically distinct “SNP cluster groups” (SCGs) (15). For the current study, we selected a different set of nine SNP markers that enabled us to further subdivide the SCGs into subgroups (SC-subgroups), for a total of seven SCGs and five SC-subgroups (Table 1). All of the study samples were then tested at the nine SNP loci using hairpin primer assays as described previously (22) (Table 2), and the alleles were determined.

Phylogenetic analysis. Each isolate was assigned to an SCG or SC-subgroup according to its allele pattern at the nine SNP loci (Table 1) and plotted on a neighbor-joining phylogenetic tree previously created by analyzing a global M. tuberculosis collection with 212 SNP markers (15) (Fig. 1). The presence or absence of each LSP was scored as a binary character, and the isolates were also classified into 58 LSP types (LSP-Ts) defined by the distinct patterns of present and absent LSPs in each isolate.
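The two classification steps in this paragraph amount to lookups on each isolate's marker calls. The Python sketch below assumes alleles are written as a nine-character string and LSP calls as a 17-element tuple of 0/1 values. The allele patterns and group labels in it are placeholders, not the definitions in Table 1, and LSP-T numbers are assigned in order of first appearance, which need not match the numbering in Fig. 4.

```python
from typing import Dict, Tuple

# Placeholder nine-locus allele patterns; NOT the Table 1 definitions.
SCG_PATTERNS: Dict[str, str] = {
    "CCGTACGTA": "SCG-X",  # hypothetical
    "CCGTACGTG": "SCG-Y",  # hypothetical
}


def assign_scg(alleles: str) -> str:
    """Look up an isolate's allele string (one base per SNP locus)."""
    if len(alleles) != 9:
        raise ValueError("expected alleles at exactly nine SNP loci")
    return SCG_PATTERNS.get(alleles, "unassigned")


def lsp_types(profiles: Dict[str, Tuple[int, ...]]) -> Dict[str, int]:
    """Number each distinct presence/absence pattern (1, 2, ...) in order of
    first appearance and return the LSP-T of every isolate."""
    type_of_pattern: Dict[Tuple[int, ...], int] = {}
    assignment: Dict[str, int] = {}
    for isolate, pattern in profiles.items():
        if pattern not in type_of_pattern:
            type_of_pattern[pattern] = len(type_of_pattern) + 1
        assignment[isolate] = type_of_pattern[pattern]
    return assignment


# Hypothetical calls for LSPs 1-17 (1 = present, 0 = absent).
profiles = {
    "iso1": (1,) * 12 + (0,) * 5,
    "iso2": (1,) * 12 + (0,) * 5,
    "iso3": (0,) * 17,
}
print(assign_scg("CCGTACGTA"))  # SCG-X
print(lsp_types(profiles))  # {'iso1': 1, 'iso2': 1, 'iso3': 2}
```

The number of LSP-Ts in a collection is simply the number of distinct patterns, `len(set(lsp_types(profiles).values()))`.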


RESULTS

LSPs occur repeatedly in the M. tuberculosis genome. To perform a phylogenetic analysis of the distribution of M. tuberculosis LSPs, it was first necessary to establish unambiguously the phylogeny of the 163 M. tuberculosis study isolates plus H37Rv and CDC1551. Each isolate was tested at the nine SNP marker loci, and an SCG or SC-subgroup was assigned to each isolate based on the pattern of its SNP alleles. The LSP and SNP alleles for each isolate are shown in Supplementary Table 1. The typed isolates were then plotted onto a previously established phylogenetic tree of M. tuberculosis (15). The study set included members of all SCGs except SCG 7 (which primarily contains Mycobacterium bovis) (Fig. 1). Each M. tuberculosis SCG/SC-subgroup contained an average of 18 isolates (range, 0 to 43) and an average of 13 different strains (range, 0 to 38), as defined by distinct RFLP patterns.

We selected 17 M. tuberculosis LSPs from a larger set of previously identified LSPs (16) to study their distribution on the strain phylogeny (Table 3). The distribution of these LSPs has not previously been examined in a set of phylogenetically characterized clinical strains. Three LSPs (LSPs 10, 11 and 13) were located near two IS1547 elements, which are known to be “hot spots” for IS6110 insertions (17). Each M. tuberculosis isolate was examined for the presence or absence of each of the 17 LSPs by probing for an internal DNA sequence, and all of the LSPs were then mapped onto the phylogenetic tree. The majority of LSPs did not appear to be UEPs. Unlike the selectively neutral SNPs described in a previous report (2), only four of the 17 LSPs studied (LSPs 1, 9, 13 and 16) (Fig. 2A) were situated on the phylogenetic tree such that their presence could be explained by a single event in a common ancestor; we have called these LSPs “Group A LSPs” in subsequent discussions. Two other LSPs (LSPs 12 and 14) appeared to have occurred independently at least two times (Fig. 2B); we have called these LSPs “Group B LSPs.” The remaining 11 LSPs (LSPs 2, 3, 4, 5, 6, 7, 8, 10, 11, 15 and 17) were situated on the phylogenetic tree such that they could not have arisen from a single common ancestor and must have arisen independently multiple times (Fig. 3). These LSPs were designated “Group C LSPs.”
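The counting behind the Group A, B and C assignments can be sketched as follows, assuming (as a simplification) that the derived state of each LSP is known and that, once it arises on a branch, it is inherited by all descendants. Under that assumption the minimum number of independent events equals the number of maximal clades made up entirely of isolates carrying the LSP. The tree, isolate names and helper functions are hypothetical, and the cutoffs follow the abstract's wording (one event, two events, more than two events).

```python
from typing import Set, Tuple, Union

# A rooted tree as nested tuples; leaves are isolate names.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> Set[str]:
    """Return the names of all isolates below this node."""
    if isinstance(tree, str):
        return {tree}
    found: Set[str] = set()
    for child in tree:
        found |= leaves(child)
    return found


def independent_events(tree: Tree, carriers: Set[str]) -> int:
    """Minimum number of separate origins, assuming the LSP state is
    inherited by every descendant once it arises (no reversal).

    A node whose isolates all carry the LSP counts as one event; otherwise
    the count is the sum over its children."""
    if leaves(tree) <= carriers:
        return 1
    if isinstance(tree, str):
        return 0
    return sum(independent_events(child, carriers) for child in tree)


def lsp_group(events: int) -> str:
    """Group A: one event; Group B: two events; Group C: more than two."""
    if events < 1:
        raise ValueError("no isolate carries this LSP")
    if events == 1:
        return "A"
    return "B" if events == 2 else "C"


example_tree: Tree = ((("a", "b"), "c"), (("d", "e"), "f"))
for carriers in ({"a", "b"}, {"a", "d"}, {"a", "d", "f"}):
    n = independent_events(example_tree, carriers)
    print(sorted(carriers), n, lsp_group(n))
```

On the toy tree, carriers {a, b} need one event (Group A), {a, d} need two (Group B) and {a, d, f} need three (Group C). Allowing reversals would change the counts, which is why the irreversibility assumption is stated explicitly.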

The genes that corresponded to the probes for each LSP were then examined (Table 3). We examined all of the genes that were deleted in each LSP as originally defined by the CDC1551-H37Rv genomic comparison, although some clinical strains may have smaller or larger LSPs in each region. None of the Group A LSPs and only one of the Group B LSPs were flanked by IS6110 elements in either CDC1551 or H37Rv, whereas five of the 11 Group C LSPs were flanked by IS6110 elements. These results suggest that recombination between IS6110 elements is one of the mechanisms that generate frequently recurring LSPs. Indeed, we also noted that the IS6110 elements adjacent to four of the Group C LSPs (LSPs 3, 4, 10 and 11) lack the characteristic 3- to 4-bp direct repeats, a feature indicative of recombination between IS6110 elements (16). This adds further support to the hypothesis that IS6110 is an important driving force for large sequence diversity in M. tuberculosis (5, 13, 24, 30, 31). PE, PE_PGRS or PPE genes were not present in any of the Group A LSPs, whereas one of the Group B LSPs and three Group C LSPs contained PPE genes. Recombination and deletion between these genes, which share substantial sequence similarity, might represent a second mechanism of LSP generation. However, the number of LSPs examined was too small to test reasonably for statistical differences in PPE gene frequency among the LSP groups. Despite these two proposed mechanisms for recurrent LSP generation, one of the two Group B LSPs and three of the 11 Group C LSPs were associated with neither flanking IS6110 sequences nor repetitive genes.
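Checking for these direct repeats (the short target-site duplications left by transposition) can be sketched as comparing the few bases immediately outside each end of an annotated IS6110 copy. The Python below assumes the element's coordinates are already known and uses a synthetic sequence in which N's stand in for the element; it is an illustration, not the procedure used in reference 16. A 3- or 4-bp match can also arise by chance (roughly 1 in 256 for 4 bp if all bases were equally frequent), so a single match is suggestive rather than conclusive.

```python
from typing import Optional, Sequence


def flanking_direct_repeat(
    seq: str, is_start: int, is_end: int, lengths: Sequence[int] = (4, 3)
) -> Optional[str]:
    """Return the direct repeat found just outside both ends of an inserted
    element, or None if there is none.

    The element occupies seq[is_start:is_end] (0-based, end-exclusive).
    Longer repeat lengths are tried first."""
    for n in lengths:
        if is_start - n < 0 or is_end + n > len(seq):
            continue  # not enough flanking sequence to compare
        left = seq[is_start - n:is_start].upper()
        right = seq[is_end:is_end + n].upper()
        if left == right:
            return left
    return None


# Synthetic example: 20 N's stand in for the IS element.
with_repeat = "AAAC" + "GTCA" + "N" * 20 + "GTCA" + "TTTG"
without_repeat = "AAACGTCA" + "N" * 20 + "CCGTTTTG"
print(flanking_direct_repeat(with_repeat, 8, 28))  # GTCA
print(flanking_direct_repeat(without_repeat, 8, 28))  # None
```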


The LSPs associated with IS6110 in the reference strains did not occur at a higher frequency than other LSPs. The phylogenetic analysis of each LSP (Figs. 2 and 3) suggested that there were 65 independent LSP events in the 165 M. tuberculosis isolates (Table 3). (The population contained many more LSPs, but a group of phylogenetically related isolates with the same LSP was considered to constitute one LSP event.) Approximately one-third (6/17) of the LSPs studied were associated with IS6110, and these LSPs accounted for 27/65 (42%) of the independent LSP events in the population. The remaining approximately two-thirds (11/17) of the LSPs studied, which were not associated with IS6110 in the reference strains, accounted for at least 38/65 (58%) of the independent LSP events, so the frequency of independent events did not differ significantly between the two groups of LSPs.
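The comparison in this paragraph can be made concrete with an exact binomial test, under the simplifying assumption that each of the 65 inferred events is attributed to an IS6110-associated LSP with probability equal to that group's share of the LSPs studied (6/17). This is one reasonable way to check the statement, not necessarily the test used in this study, and it treats the counts as exact even though the 38 events are given as a lower bound. Only the Python standard library is used.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed count."""
    observed = binom_pmf(k, n, p)
    total = sum(
        prob for prob in (binom_pmf(i, n, p) for i in range(n + 1))
        if prob <= observed * (1 + 1e-7)  # tolerance for floating-point ties
    )
    return min(1.0, total)


# Counts from the text: 6 of 17 LSPs were IS6110-associated, and these LSPs
# accounted for 27 of the 65 inferred independent LSP events.
events_is6110, events_total = 27, 65
expected_share = 6 / 17
p_value = binom_test_two_sided(events_is6110, events_total, expected_share)
print(f"observed share {events_is6110 / events_total:.2f}, "
      f"expected {expected_share:.2f}, p = {p_value:.2f}")
```

The observed share of events (0.42) is close to the share expected from the number of LSPs (0.35), and the resulting p-value is well above conventional significance thresholds, consistent with the statement in the text.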

Twelve of the 17 LSPs in this study represent sequences that are absent in H37Rv but present in CDC1551 [although LSP 6 appears to be present in some H37Rv isolates and must, therefore, have been deleted recently in a subset of H37Rv isolates in experimental use (16)]. Each of these LSPs was also found to be missing in at least one clinical isolate, demonstrating that the H37Rv LSPs did not include unique deletion events that might have occurred as a consequence of prolonged in vitro culture.

Confirmation of LSP identification and variability within IS6110-defined clusters. It was important to ensure that the results of this study were not due to artifacts of the LSP identification process, because inconsistencies in detecting LSPs could make it falsely appear as if LSPs were occurring repeatedly as independent events. Repeated probing of the same strain gave identical LSP results, suggesting that the LSP identification process was sound. We also examined strains that were identical by IS6110 RFLP analysis to determine whether these closely related strains contained the same LSPs. We found only six instances, in 17 clusters involving 66 isolates, where two isolates within a cluster did not have exactly the same LSP pattern. In each of these cases, only one of the 17 LSPs was discordant between the isolates. Furthermore, all six of the mismatched LSPs were Group C LSPs (four were LSP 6, one was LSP 2 and one was LSP 10). These results suggest that the small variation in LSP patterns that we observed among isolates within a cluster is due to the propensity of M. tuberculosis to develop independent deletions in these regions. The exact time frame of LSP generation cannot be deduced from this study because the epidemiological connections among the clustered isolates were not well characterized in our data set. Prior reports suggest that differences in LSP patterns are not observed among RFLP-identical isolates with known epidemiological links (16). However, these results do strongly suggest that different LSPs are generated at different rates.
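The within-cluster comparison can be sketched as a pairwise scan of LSP profiles inside each IS6110 RFLP cluster. In the Python below, the cluster and isolate names and the profiles are hypothetical; the Group C membership is copied from the list of Group C LSPs given in the text. Note that a pairwise scan reports one discordant isolate once for every other member of its cluster, so counts of pairs and counts of discordant isolates differ.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]  # calls for LSPs 1-17, 1 = present, 0 = absent
GROUP_C = {2, 3, 4, 5, 6, 7, 8, 10, 11, 15, 17}  # Group C LSPs as listed in the text


def discordant_lsps(a: Profile, b: Profile) -> List[int]:
    """LSP numbers (1-based) whose calls differ between two isolates."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


def cluster_discordance(
    clusters: Dict[str, List[str]], profiles: Dict[str, Profile]
) -> List[Tuple[str, str, str, List[int]]]:
    """For every pair of isolates in the same RFLP cluster, report the LSPs
    that differ; fully concordant pairs are omitted."""
    report = []
    for cluster_id, members in clusters.items():
        for x, y in combinations(members, 2):
            diff = discordant_lsps(profiles[x], profiles[y])
            if diff:
                report.append((cluster_id, x, y, diff))
    return report


# Hypothetical cluster of three RFLP-identical isolates; iso3 lacks LSP 6.
base = (1,) * 12 + (0,) * 5
variant = tuple(0 if i == 5 else call for i, call in enumerate(base))
report = cluster_discordance(
    {"RFLP-1": ["iso1", "iso2", "iso3"]},
    {"iso1": base, "iso2": base, "iso3": variant},
)
for row in report:
    print(row)
print(all(lsp in GROUP_C for _, _, _, diff in report for lsp in diff))  # True
```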

Phylogenetic analysis of M. tuberculosis populations using LSP markers. LSPs appear to be useful phylogenetic markers for studies of M. tuberculosis (20, 23, 28), especially when the specific identity of each LSP can be confirmed by sequencing the ends of each deletion (23). End-sequencing makes it possible to identify which deletions within a similar genomic region are, in fact, independent deletions. However, large-scale sequencing of deletion sites is not practical, and even PCR-based identification of specific deletion sites may be difficult if LSPs of similar sizes occur near the same genomic locus. We therefore studied the ability of the LSPs identified in this study to describe phylogenetic relationships among M. tuberculosis isolates accurately. Each of the 163 clinical isolates (H37Rv and CDC1551 were not included in this analysis) was classified into one of 58 LSP types (LSP-Ts) based on the pattern of LSPs that were present (Supplementary Table 1). Each LSP-T was then located on the SNP tree, and the proximity of all of the isolates with the same LSP-T was examined. LSP-Ts that placed M. tuberculosis isolates together in a manner consistent with the SNP tree were considered good phylogenetic assignments, and LSP-Ts that conflicted with the SNP tree were considered inaccurate assignments. LSP-Ts placed most of the M. tuberculosis isolates on the same or an adjoining branch of the SNP tree (Fig. 4). However, six of the LSP-Ts incorrectly grouped together isolates that were more distantly related according to the SNP tree (Fig. 4, LSP-Ts 3, 5, 7, 9, 14 and 15). Many of the LSP-Ts contained only a single M. tuberculosis isolate, so we performed a secondary analysis restricted to commonly occurring LSP-Ts by eliminating LSP-Ts that contained fewer than two isolates. This analysis reduced the study to 31 LSP-Ts and 137 isolates. We found that 6/31 (19%) of the LSP-Ts that contained two or more isolates continued to produce important conflicts with the SNP tree. These results confirm our findings with the total study sample.
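The consistency check described in this paragraph can be sketched as follows, assuming each isolate has already been placed in an SNP-tree group and that the pairs of adjoining groups are supplied by hand. An LSP-T is flagged when it contains isolates from two groups that are neither the same nor adjoining; setting `min_isolates=2` mirrors the secondary analysis that excluded single-isolate LSP-Ts. Group labels, isolate names and the adjacency list are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def conflicting_types(
    lsp_type: Dict[str, int],
    snp_group: Dict[str, str],
    adjacent: Set[FrozenSet[str]],
    min_isolates: int = 1,
) -> Tuple[List[int], int]:
    """Flag LSP-Ts that group isolates from SNP-tree groups that are neither
    the same nor adjoining.

    Returns (sorted conflicting LSP-Ts, number of LSP-Ts examined). LSP-Ts
    with fewer than `min_isolates` isolates are skipped."""
    sizes = Counter(lsp_type.values())
    groups: Dict[int, Set[str]] = defaultdict(set)
    for isolate, t in lsp_type.items():
        groups[t].add(snp_group[isolate])

    examined = 0
    conflicts = []
    for t, found in groups.items():
        if sizes[t] < min_isolates:
            continue
        examined += 1
        # Every pair of distinct SNP groups within this LSP-T must adjoin.
        consistent = all(
            frozenset((a, b)) in adjacent
            for a, b in combinations(sorted(found), 2)
        )
        if not consistent:
            conflicts.append(t)
    return sorted(conflicts), examined


lsp_type = {"i1": 1, "i2": 1, "i3": 2, "i4": 2, "i5": 3}
snp_group = {"i1": "SCG-X", "i2": "SCG-Y", "i3": "SCG-X", "i4": "SCG-Z", "i5": "SCG-Z"}
adjacent = {frozenset(("SCG-X", "SCG-Y"))}
print(conflicting_types(lsp_type, snp_group, adjacent))  # ([2], 3)
print(conflicting_types(lsp_type, snp_group, adjacent, min_isolates=2))  # ([2], 2)
```

In the toy data, one of three LSP-Ts conflicts overall and one of two conflicts once the single-isolate type is dropped: the numerator stays the same while the denominator shrinks, as in the change from 6/58 to 6/31 reported above.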


DISCUSSION

This study suggests that LSPs are a substantial source of diversity within the M. tuberculosis genome. While some LSPs appeared to represent rare events in the population, the majority of LSPs appeared to have been generated multiple times in the divergence of M. tuberculosis strains. The low frequency of Group A and B LSP events suggests that these LSPs arose from random genomic events and have become associated with a particular phylogenetic lineage. These LSPs may have occurred in the absence of any special mechanism for generating genomic change at high frequency. We suspect that these LSPs are unlikely to confer a selective advantage on the organism; however, this is very difficult to test without additional data.

Group C LSPs are much more variable and appear to have been generated by at least two mechanisms. Forty-five percent of the Group C LSPs were flanked by IS6110 transposable elements on at least one side in at least one of the reference strains. The presence of IS6110 near LSP regions that are absent (and likely deleted) in other isolates suggests that recombination between nearby IS6110 elements produced a deletion, creating the LSP. IS6110 transposition events may be advantageous, neutral or detrimental to the bacterial cell depending on the genes involved. Yang et al. (33) have shown that plcD deletions (LSP 4, a Group C LSP flanked by IS6110 in our study) do indeed affect bacterial phenotype, in this case showing a strong association with extrapulmonary tuberculosis. This work supports the hypothesis that the variation associated with Group C LSPs affects bacterial phenotype (although it is unclear whether an extrapulmonary phenotype should be considered selectively advantageous); it also provides further evidence that IS6110 is a contributing force driving genetic diversity in the M. tuberculosis complex. Indeed, because IS6110 may also be present in clinical isolates at sites where H37Rv and CDC1551 do not contain IS6110, this element may play an even more pivotal role. We speculate that the Group C LSPs arose under positive selective pressure and that these deletions (LSPs) enhance transmission and other virulence features of M. tuberculosis. Maurelli and colleagues have demonstrated similar events in Shigella, where parallel loss of the cadA locus in different Shigella lineages was found to be pathoadaptive (8). An alternative hypothesis is that Group C LSPs represent highly unstable genomic regions that are repeatedly deleted because the genes they encompass are nonfunctional. Under these circumstances, the repeated loss of these genes could reflect a selective advantage for loss of nonfunctional DNA. However, observations in other bacteria suggest that deletion of nonfunctional DNA is a progressive process that begins with mutation of nonfunctional genes into pseudogenes and is only later followed by a series of deletion events (21). In M. tuberculosis, there is no evidence that any of the deleted genes have mutated into pseudogenes. One of the Group B and three of the Group C LSPs were in PPE genes that others have speculated may be involved in immune variation and evasion (6, 10, 12). Their recurrent deletion in different TB lineages is consistent with the hypothesis that these are escape mutants created by silencing these gene products during the course of infection of mammalian hosts.
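The deletion mechanism proposed above, homologous recombination between two nearby, directly repeated IS6110 copies, can be illustrated with a toy string model. This is a minimal sketch under simplifying assumptions: the genome is a plain string, the element is a symbolic marker rather than the real IS6110 sequence, both copies are in the same orientation, and recombination is assumed to excise the intervening DNA together with one element copy. The function name is hypothetical.

```python
def recombine_direct_repeats(genome: str, element: str) -> str:
    """Toy model of recombination between two directly repeated copies of `element`.

    The DNA between the first two copies is excised together with one copy of
    the element, leaving a single copy at the junction. If fewer than two
    copies are present, the genome is returned unchanged.
    """
    first = genome.find(element)
    if first == -1:
        return genome
    second = genome.find(element, first + len(element))
    if second == -1:
        return genome
    # Keep everything before the first copy, then resume at the second copy.
    return genome[:first] + genome[second:]


IS = "<IS6110>"  # symbolic placeholder, not the real insertion-sequence DNA
genome = "upstream" + IS + "geneA-geneB" + IS + "downstream"
deleted = recombine_direct_repeats(genome, IS)
print(deleted)                     # upstream<IS6110>downstream
print(len(genome) - len(deleted))  # 19: "geneA-geneB" (11) plus one element copy (8)
```

In this toy model the region between the two copies is lost while one IS6110 copy remains at the junction, so the resulting polymorphism sits next to an IS6110 element, as observed for many Group C LSPs.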

Our findings do not directly contradict the work of Hirsh et al. (23), which suggested that virtually all LSPs were unique evolutionary events. First, this prior investigation excluded LSPs originating or terminating in PPE genes, whereas these LSPs were included in our study. Second, our investigation included regions that were deleted in H37Rv relative to the genome of CDC1551, whereas Hirsh et al. only examined regions that are missing in clinical isolates relative to the genome of H37Rv. Finally, we used a hybridization-based approach to identify the presence or absence of genomic regions known to be encompassed by LSPs. In contrast, Hirsh et al. sequenced across each end of the LSP, confirming the exact deletion sites and distinguishing among similar deletion events. A reanalysis of this previous work would likely demonstrate that many LSPs overlap, differing only at the specific deletion sites, and would confirm our observation that many genomic regions appear to have been deleted independently.
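The difference between the two approaches can be sketched with simple interval logic. A hybridization probe only reports whether its target region is present, so two deletions with different breakpoints that both remove the probe give the same result, whereas sequencing across the junctions separates them. The sketch assumes that loss of hybridization signal requires the whole probe to be deleted; the probe interval is the LSP 4 probe from Table 3, but the two deletion intervals are invented for illustration and the function name is hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int  # inclusive genome coordinate
    end: int    # inclusive genome coordinate


def probe_signal_lost(probe: Interval, deletion: Interval) -> bool:
    """Assume hybridization fails only when the whole probe lies inside the deletion."""
    return deletion.start <= probe.start and probe.end <= deletion.end


probe = Interval(1978754, 1978931)  # LSP 4 probe coordinates (Table 3)
# Two hypothetical, independently arising deletions with different breakpoints.
deletions = [Interval(1978000, 1980000), Interval(1978500, 1985000)]

by_hybridization = {probe_signal_lost(probe, d) for d in deletions}
by_sequencing = set(deletions)  # exact breakpoints distinguish the events

print(by_hybridization)    # {True}: both isolates simply score as "LSP absent"
print(len(by_sequencing))  # 2 distinct deletion events
```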

Other investigators have suggested that LSPs can provide an accurate genetic marker system for molecular epidemiological and evolutionary studies of M. tuberculosis (20, 28). Our results suggest that LSPs may be informative markers when discrimination of strains is the main objective. However, phylogenetic inference will be complicated by the multiple origins and parallel evolution of many LSPs, which will generate incompatibilities with other phylogenetic markers such as SNP loci. The extent to which this problem can be alleviated by directly sequencing LSP deletion sites requires further study.
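The phylogenetic problem can be made concrete by counting how many independent deletion events a presence/absence pattern implies on a fixed tree. The sketch below assumes that deletions are irreversible and that the common ancestor carried the region, so every maximal clade whose members all lack the region needs one deletion on the branch leading to it; an LSP needing more than one event is homoplastic and conflicts with a single-origin reading of the tree. This is a simplified counting rule, not necessarily the procedure behind the counts in Table 3, and the tree is a hypothetical toy topology, not the study's SNP tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)


def count_independent_deletions(root: Node, deleted: set[str]) -> int:
    """Minimum number of deletion events explaining which leaves lack a region.

    Assumes deletions are irreversible and the root carries the region, so each
    maximal clade whose leaves all lack it needs exactly one deletion event.
    """

    def walk(node: Node) -> tuple[bool, int]:
        # Returns (all leaves below lack the region, events counted inside).
        if not node.children:
            return node.name in deleted, 0
        results = [walk(child) for child in node.children]
        if all(all_del for all_del, _ in results):
            return True, 0  # defer: one event above this clade suffices
        events = sum(n for _, n in results)
        events += sum(1 for all_del, _ in results if all_del)
        return False, events

    all_deleted, events = walk(root)
    return 1 if all_deleted else events


# Hypothetical toy topology ((A,B),(C,(D,E))); not the study's SNP tree.
tree = Node("root", [
    Node("x", [Node("A"), Node("B")]),
    Node("y", [Node("C"), Node("z", [Node("D"), Node("E")])]),
])

print(count_independent_deletions(tree, {"A", "B"}))            # 1
print(count_independent_deletions(tree, {"A", "B", "D", "E"}))  # 2
print(count_independent_deletions(tree, {"A", "D"}))            # 2
```

A pattern such as {"A", "D"} cannot be explained by a single deletion on this tree, which is the kind of incompatibility with SNP-based phylogenies described above.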

In summary, this work demonstrates that LSPs are predominantly genomic deletions that result in an unexpected degree of genomic plasticity in clinical M. tuberculosis isolates. At least one-third of the plasticity in specific genomic regions appears to involve recombination between IS6110 elements in the region. The repeated evolution of some LSPs suggests that these polymorphisms are a critical source of adaptive genetic variation and may underlie variation in virulence among TB strains; however, this is difficult to test and warrants future investigation of pathogenicity and immunity.


ACKNOWLEDGMENTS.

This work was supported by Public Health Service grants AI-46669 and AI-49352 from the National Institutes of Health.


Table 1. SNP set used to assign the SCGs and SC-subgroups.

SNP position in H37Rv (a)
SCG | 1977 | 54394 | 74092 | 105139 | 144390 | 232574 | 311613 | 913274 | 2154724
1 | G | G | C | C | G | G | T | G | A
2 | G | G | C | A | G | G | T | C | A
3a | G | G | C | C | G | G | T | C | A
3b | G | G | C | C | G | G | T | C | C
3c | G | G | C | C | G | T | T | C | C
4 | G | G | C | C | A | T | T | C | C
5 | G | A | C | C | G | G | T | C | C
6a | A | A | C | C | G | G | T | C | C
6b | A | A | C | C | G | G | G | C | C
7 | G | G | T | C | G | G | T | G | A

a. GenBank accession number NC_000962.
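Because each SCG and SC-subgroup in Table 1 corresponds to a unique combination of alleles at the nine loci, assignment reduces to an exact lookup. The sketch below transcribes Table 1 into a dictionary; the function name and the input format (a mapping from H37Rv position to base call) are assumptions made for illustration, and real genotyping data would also need handling of failed or ambiguous calls.

```python
from __future__ import annotations

# Nine SNP positions in H37Rv (GenBank NC_000962), in Table 1 column order.
SNP_POSITIONS = (1977, 54394, 74092, 105139, 144390, 232574, 311613, 913274, 2154724)

# Allele profiles transcribed from Table 1 (one base per position above).
SCG_PROFILES = {
    "1": "GGCCGGTGA",
    "2": "GGCAGGTCA",
    "3a": "GGCCGGTCA",
    "3b": "GGCCGGTCC",
    "3c": "GGCCGTTCC",
    "4": "GGCCATTCC",
    "5": "GACCGGTCC",
    "6a": "AACCGGTCC",
    "6b": "AACCGGGCC",
    "7": "GGTCGGTGA",
}


def assign_scg(calls: dict[int, str]) -> str | None:
    """Return the SCG/SC-subgroup whose profile exactly matches the allele calls.

    `calls` maps each H37Rv position to the observed base. Returns None when a
    position is missing or no profile in Table 1 matches.
    """
    if any(pos not in calls for pos in SNP_POSITIONS):
        return None
    profile = "".join(calls[pos].upper() for pos in SNP_POSITIONS)
    for scg, reference in SCG_PROFILES.items():
        if profile == reference:
            return scg
    return None


# Example: an isolate carrying the SCG6b profile.
isolate = dict(zip(SNP_POSITIONS, "AACCGGGCC"))
print(assign_scg(isolate))  # 6b
```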


Table 2. Genome locations and hairpin assay primers used for the nine-SNP set.

Position in H37Rv | Hairpin-shaped primers (a) | Constant primer
1977 | RHP-1977: GTTCGTC gggactgccaacgacgaac; RHP-G1977A: ATTCGTC gggactgccaacgacgaat | F1977: tacggttgttgttcgactgct
54394 | FHP-54338: CGCCCA gatctggcccgggcg; FHP-G54338A: TGCCCA gatctggcccgggca | R54338: gttgggtcctttggtctgattct
74092 | RHP-74073: CAGTACCGAT gcggtgaactcggtactg; RHP-C74073T: TAGTACCGAT gcggtgaactcggtacta | F74073: cgacggtccgaattgcc
105139 | FHP-105129: GGGCG gcactgTcaaagagcgccc; FHP-C105129A: TGGCG gcactgTcaaagagcgcca | R105129: tcccttgtgtcacttcagtttcac
144390 | RHP-144381: AGATGGG tgtcgtgcgAcccatct; RHP-A144381G: GGATGGG tgtcgtgcgAcccatcc | F144381: cccgggtggtgctgatt
232574 | RHP-232686: TCGGC ccgctgtaggcgccga; RHP-T232686G: GCGGC ccgctgtaggcgccgc | F232686: gattcaaacagatccgtgataccc
311613 | RHP-311729: TACGGC ccgtgCacaccgccgta; RHP-T311729G: GACGGC ccgtgCacaccgccgtc | F311729: cgcccagagccgttcgt
913274 | FHP-0913183 (2): GGAGATTGG ctcgGtggacccaatctcc; FHP-C0913183G (2): CGAGATTGG ctcgGtggacccaatctcg | R0913183: atcaggtcttcgatggccatg
2154724 | FHP-KatG463: CGGATCT agcctttagagccagatccg; FHP-KatGR463L: AGGATCT gagcctttagagccagatcct | RKatG463: gagacagtcaatcccgatgc

a. Sequences of the 5’-end tails added to the hairpin primers and the residues corresponding to the secondary mutations are shown in capital letters.
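Reading Table 2, each wild-type/mutant pair of hairpin primers appears to differ at the 3'-terminal (allele-specific) base, with the 5'-most base of the capitalized tail changed in step so that it stays complementary to that 3' base. The sketch below checks this for three primer pairs copied from the table and counts how many consecutive tail bases pair with the 3' end. The pairing rule is inferred from the table sequences, not taken from the published design protocol (see reference 22), and the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def stem_length(primer: str) -> int:
    """Count consecutive 5'-tail bases that pair with the primer's 3' end.

    `primer` is written as in Table 2: upper-case 5' tail, a space, then the
    target-specific sequence. Case is ignored when comparing bases.
    """
    tail, body = primer.split(" ", 1)
    tail, body = tail.upper(), body.upper()
    n = 0
    while n < len(tail) and n < len(body) and COMPLEMENT[tail[n]] == body[-1 - n]:
        n += 1
    return n


# Wild-type / mutant hairpin primer pairs copied from Table 2.
pairs = {
    "1977": ("GTTCGTC gggactgccaacgacgaac", "ATTCGTC gggactgccaacgacgaat"),
    "54394": ("CGCCCA gatctggcccgggcg", "TGCCCA gatctggcccgggca"),
    "2154724": ("CGGATCT agcctttagagccagatccg", "AGGATCT gagcctttagagccagatcct"),
}

for position, (wild_type, mutant) in pairs.items():
    # The two primers differ at their 3'-terminal (allele-specific) base ...
    assert wild_type[-1] != mutant[-1]
    # ... and the first tail base changes with it, so both can still form a stem.
    print(position, stem_length(wild_type), stem_length(mutant))
```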


Table 3. LSP Groups and their attributes.

Group | LSP (a) | Locus (b) | Coordinates (c) | Gene | Other genes in deleted region in CDC1551 or H37Rv reference strains (d) | Strain(s) containing adjacent IS6110 | Number of independent LSP deletions
A | 1 | MT0676 | 744149-744392 | Alpha-mannosidase | - | - | 1
A | 9 | MT2081/MT2082 | 2268479-2268702 | Hyp (e)/Helicase | MT2080 (Hyp) and MT2080.1 (Hyp) | - | 1
A | 13 | Rv0793/Rv0794c | 886934-887397 | Hyp/dihydrolipoamide dehydrogenase | - | - | 1
A | 16 | Rv3519 | 3955704-3956104 | Hyp (e) | - | - | 1
B | 12 | MT2423 | 2633331-2633746 | PPE | - | H37Rv | 2
B | 14 | Rv2124c | 2381785-2383193 | Methionine synthase | - | - | 2
C | 2 | MT1360 | 1481322-1481551 | Adenylate cyclase | - | - | 7
C | 3 | MT1802 | 198225-1982482 | Transporter | - | H37Rv and CDC1551 | 5
C | 4 | MT1799 | 1978754-1978931 | Phospholipase | MT1800 (glycosyl transferase), MT1801 (molybdopterin oxidoreductase), MT1802 (MmpL family) | H37Rv and CDC1551 | 6
C | 5 | MT1812 | 1994163-1994437 | Hyp (e) | - | H37Rv and CDC1551 | 4
C | 6 | MT2420 | 2630855-2631147 | Hyp (e) | MT2421 (Hyp) | - | 7
C | 7 | MT2619 | 2862884-2863033 | Membrane lipoprotein | - | - | 5
C | 8 | MT3248 | 3526018-3526304 | PPE | - | - | 4
C | 10 | MT3426 | 3705322-3705665 | moaB | MT3427 (moaA), MT3428 (transcriptional regulator), MT3429 (Hyp), MT3430 (IS1547, transposase) | H37Rv | 6
C | 11 | MT3427 | 3707462-3707706 | moaA | MT3426 (pterin dehydratase), MT3428-MT3430 (see above) | H37Rv | 4
C | 15 | Rv3135 | 3501335-3501499 | PPE | - | - | 3
C | 17 | Rv3343c | 3733083-3733353 | PPE | - | - | 6

a. Large sequence polymorphism.
b. The MT prefix indicates LSPs present in CDC1551 and absent in H37Rv; the Rv prefix indicates LSPs present in H37Rv but absent in CDC1551.
c. Coordinates of the LSP probes. Note: probes for LSPs 1 to 12 are CDC1551 coordinates (GenBank accession number NC_002755), and probes for LSPs 13 to 17 are H37Rv coordinates (GenBank accession number NC_000962).
d. Coordinates for all LSPs in the reference strains have been described previously (16).
e. Hypothetical.
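The 45% figure given above for Group C LSPs can be recomputed from this table. The sketch below transcribes the relevant columns of Table 3 and also checks that the groups line up with the number of independent deletions (one for Group A, two for Group B, three or more for Group C). That grouping rule is inferred from the table's values rather than stated in this section, and the class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Lsp:
    number: int
    group: str              # group as listed in Table 3
    is6110_adjacent: bool   # IS6110 adjacent in H37Rv and/or CDC1551
    independent_deletions: int


# Transcribed from Table 3.
TABLE3 = [
    Lsp(1, "A", False, 1), Lsp(9, "A", False, 1), Lsp(13, "A", False, 1),
    Lsp(16, "A", False, 1), Lsp(12, "B", True, 2), Lsp(14, "B", False, 2),
    Lsp(2, "C", False, 7), Lsp(3, "C", True, 5), Lsp(4, "C", True, 6),
    Lsp(5, "C", True, 4), Lsp(6, "C", False, 7), Lsp(7, "C", False, 5),
    Lsp(8, "C", False, 4), Lsp(10, "C", True, 6), Lsp(11, "C", True, 4),
    Lsp(15, "C", False, 3), Lsp(17, "C", False, 6),
]


def group_for(independent_deletions: int) -> str:
    """Hypothetical rule inferred from Table 3: 1 event -> A, 2 -> B, 3 or more -> C."""
    if independent_deletions == 1:
        return "A"
    if independent_deletions == 2:
        return "B"
    return "C"


# The inferred rule reproduces every group assignment in the table.
assert all(group_for(lsp.independent_deletions) == lsp.group for lsp in TABLE3)

group_c = [lsp for lsp in TABLE3 if lsp.group == "C"]
flanked = [lsp for lsp in group_c if lsp.is6110_adjacent]
print(f"{len(flanked)}/{len(group_c)} = {len(flanked) / len(group_c):.0%}")  # 5/11 = 45%
```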


REFERENCES.

1. Alland, D., G. E. Kalkut, A. R. Moss, R. A. McAdam, J. A. Hahn, W. Bosworth, E. Drucker, and B. R. Bloom. 1994. Transmission of tuberculosis in New York City. An analysis by DNA fingerprinting and conventional epidemiologic methods. N Engl J Med 330:1710-6.

2. Alland, D., T. S. Whittam, M. B. Murray, M. D. Cave, M. H. Hazbon, K. Dix, M. Kokoris, A. Duesterhoeft, J. A. Eisen, C. M. Fraser, and R. D. Fleischmann. 2003. Modeling bacterial evolution with comparative-genome-based marker systems: application to Mycobacterium tuberculosis evolution and pathogenesis. J Bacteriol 185:3392-9.

3. Baek, S. H., G. Rajashekara, G. A. Splitter, and J. P. Shapleigh. 2004. Denitrification genes regulate Brucella virulence in mice. J Bacteriol 186:6025-31.

4. Blaser, M. J., and J. C. Atherton. 2004. Helicobacter pylori persistence: biology and disease. J Clin Invest 113:321-33.

5. Brosch, R., W. J. Philipp, E. Stavropoulos, M. J. Colston, S. T. Cole, and S. V. Gordon. 1999. Genomic analysis reveals variation between Mycobacterium tuberculosis H37Rv and the attenuated M. tuberculosis H37Ra strain. Infect Immun 67:5768-74.

6. Choudhary, R. K., R. Pullakhandam, N. Z. Ehtesham, and S. E. Hasnain. 2004. Expression and characterization of Rv2430c, a novel immunodominant antigen of Mycobacterium tuberculosis. Protein Expr Purif 36:249-53.

7. Cole, S. T., R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V. Gordon, K. Eiglmeier, S. Gas, C. E. Barry, 3rd, F. Tekaia, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, A. Krogh, J. McLean, S. Moule, L. Murphy, K. Oliver, J. Osborne, M. A. Quail, M. A. Rajandream, J. Rogers, S. Rutter, K. Seeger, J. Skelton, R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead, and B. G. Barrell. 1998. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393:537-44.

8. Day, W. A., Jr., R. E. Fernandez, and A. T. Maurelli. 2001. Pathoadaptive mutations that enhance virulence: genetic organization of the cadA regions of Shigella spp. Infect Immun 69:7471-80.

9. de Visser, J. A., A. D. Akkermans, R. F. Hoekstra, and W. M. de Vos. 2004. Insertion-sequence-mediated mutations isolated during adaptation to growth and starvation in Lactococcus lactis. Genetics 168:1145-57.

10. Delogu, G., and M. J. Brennan. 2001. Comparative immune response to PE and PE_PGRS antigens of Mycobacterium tuberculosis. Infect Immun 69:5606-11.

11. Ernst, R. K., D. A. D'Argenio, J. K. Ichikawa, M. G. Bangera, S. Selgrade, J. L. Burns, P. Hiatt, K. McCoy, M. Brittnacher, A. Kas, D. H. Spencer, M. V. Olson, B. W. Ramsey, S. Lory, and S. I. Miller. 2003. Genome mosaicism is conserved but not unique in Pseudomonas aeruginosa isolates from the airways of young children with cystic fibrosis. Environ Microbiol 5:1341-9.

12. Espitia, C., J. P. Laclette, M. Mondragon-Palomino, A. Amador, J. Campuzano, A. Martens, M. Singh, R. Cicero, Y. Zhang, and C. Moreno. 1999. The PE-PGRS glycine-rich proteins of Mycobacterium tuberculosis: a new family of fibronectin-binding proteins? Microbiology 145(Pt 12):3487-95.


13. Fang, Z., C. Doig, D. T. Kenna, N. Smittipat, P. Palittapongarnpim, B. Watt, and K. J. Forbes. 1999. IS6110-mediated deletions of wild-type chromosomes of Mycobacterium tuberculosis. J Bacteriol 181:1014-20.

14. Fang, Z., and K. J. Forbes. 1997. A Mycobacterium tuberculosis IS6110 preferential locus (ipl) for insertion into the genome. J Clin Microbiol 35:479-81.

15. Filliol, I., A. S. Motiwala, M. Cavatore, W. Qi, M. H. Hazbon, M. Bobadilla del Valle, J. Fyfe, L. Garcia-Garcia, N. Rastogi, C. Sola, T. Zozio, M. I. Guerrero, C. I. Leon, J. Crabtree, S. Angiuoli, K. D. Eisenach, R. Durmaz, M. L. Joloba, A. Rendon, J. Sifuentes-Osornio, A. Ponce de Leon, M. D. Cave, R. Fleischmann, T. S. Whittam, and D. Alland. 2006. Global phylogeny of Mycobacterium tuberculosis based on single nucleotide polymorphism (SNP) analysis: insights into tuberculosis evolution, phylogenetic accuracy of other DNA fingerprinting systems, and recommendations for a minimal standard SNP set. J Bacteriol 188:759-72.

16. Fleischmann, R. D., D. Alland, J. A. Eisen, L. Carpenter, O. White, J. Peterson, R. DeBoy, R. Dodson, M. Gwinn, D. Haft, E. Hickey, J. F. Kolonay, W. C. Nelson, L. A. Umayam, M. Ermolaeva, S. L. Salzberg, A. Delcher, T. Utterback, J. Weidman, H. Khouri, J. Gill, A. Mikula, W. Bishai, W. R. Jacobs, Jr., J. C. Venter, and C. M. Fraser. 2002. Whole-genome comparison of Mycobacterium tuberculosis clinical and laboratory strains. J Bacteriol 184:5479-90.

17. Gagneux, S., K. DeRiemer, T. Van, M. Kato-Maeda, B. C. de Jong, S. Narayanan, M. Nicol, S. Niemann, K. Kremer, M. C. Gutierrez, M. Hilty, P. C. Hopewell, and P. M. Small. 2006. Variable host-pathogen compatibility in Mycobacterium tuberculosis. Proc Natl Acad Sci U S A 103:2869-73.


18. Gao, Q., K. E. Kripke, A. J. Saldanha, W. Yan, S. Holmes, and P. M. Small. 2005. Gene expression diversity among Mycobacterium tuberculosis clinical isolates. Microbiology 151:5-14.

19. Goerke, C., S. Matias y Papenberg, S. Dasbach, K. Dietz, R. Ziebach, B. C. Kahl, and C. Wolz. 2004. Increased frequency of genomic alterations in Staphylococcus aureus during chronic infection is in part due to phage mobilization. J Infect Dis 189:724-34.

20. Goguet de la Salmoniere, Y. O., C. C. Kim, A. G. Tsolaki, A. S. Pym, M. S. Siegrist, and P. M. Small. 2004. High-throughput method for detecting genomic-deletion polymorphisms. J Clin Microbiol 42:2913-8.

21. Gomez-Valero, L., A. Latorre, and F. J. Silva. 2004. The evolutionary fate of nonfunctional DNA in the bacterial endosymbiont Buchnera aphidicola. Mol Biol Evol 21:2172-81.

22. Hazbon, M. H., and D. Alland. 2004. Hairpin primers for simplified single-nucleotide polymorphism analysis of Mycobacterium tuberculosis and other organisms. J Clin Microbiol 42:1236-42.

23. Hirsh, A. E., A. G. Tsolaki, K. DeRiemer, M. W. Feldman, and P. M. Small. 2004. Stable association between strains of Mycobacterium tuberculosis and their human host populations. Proc Natl Acad Sci U S A 101:4871-6.

24. Ho, T. B., B. D. Robertson, G. M. Taylor, R. J. Shaw, and D. B. Young. 2000. Comparison of Mycobacterium tuberculosis genomes reveals frequent deletions in a 20 kb variable region in clinical isolates. Yeast 17:272-82.

25. Israel, D. A., N. Salama, C. N. Arnold, S. F. Moss, T. Ando, H. P. Wirth, K. T. Tham, M. Camorlinga, M. J. Blaser, S. Falkow, and R. M. Peek, Jr. 2001. Helicobacter pylori strain-specific differences in genetic content, identified by microarray, influence host inflammatory responses. J Clin Invest 107:611-20.

26. Kato-Maeda, M., J. T. Rhee, T. R. Gingeras, H. Salamon, J. Drenkow, N. Smittipat, and P. M. Small. 2001. Comparing genomes within the species Mycobacterium tuberculosis. Genome Res 11:547-54.

27. Kuipers, E. J., D. A. Israel, J. G. Kusters, M. M. Gerrits, J. Weel, A. van Der Ende, R. W. van Der Hulst, H. P. Wirth, J. Hook-Nikanne, S. A. Thompson, and M. J. Blaser. 2000. Quasispecies development of Helicobacter pylori observed in paired isolates obtained years apart from the same host. J Infect Dis 181:273-82.

28. Mostowy, S., D. Cousins, J. Brinkman, A. Aranaz, and M. A. Behr. 2002. Genomic deletions suggest a phylogeny for the Mycobacterium tuberculosis complex. J Infect Dis 186:74-80.

29. Pearson, B. M., C. Pin, J. Wright, K. I'Anson, T. Humphrey, and J. M. Wells. 2003. Comparative genome analysis of Campylobacter jejuni using whole genome DNA microarrays. FEBS Lett 554:224-30.

30. Sampson, S. L., M. Richardson, P. D. Van Helden, and R. M. Warren. 2004. IS6110-mediated deletion polymorphism in isogenic strains of Mycobacterium tuberculosis. J Clin Microbiol 42:895-8.

31. Sreevatsan, S., X. Pan, K. E. Stockbauer, N. D. Connell, B. N. Kreiswirth, T. S. Whittam, and J. M. Musser. 1997. Restricted structural gene polymorphism in the Mycobacterium tuberculosis complex indicates evolutionarily recent global dissemination. Proc Natl Acad Sci U S A 94:9869-74.


32. Tsolaki, A. G., A. E. Hirsh, K. DeRiemer, J. A. Enciso, M. Z. Wong, M. Hannan, Y. O. Goguet de la Salmoniere, K. Aman, M. Kato-Maeda, and P. M. Small. 2004. Functional and evolutionary genomics of Mycobacterium tuberculosis: insights from genomic deletions in 100 strains. Proc Natl Acad Sci U S A 101:4865-70.

33. Yang, Z., D. Yang, Y. Kong, L. Zhang, C. F. Marrs, B. Foxman, J. H. Bates, F. Wilson, and M. D. Cave. 2005. Clinical relevance of Mycobacterium tuberculosis plcD gene mutations. Am J Respir Crit Care Med 171:1436-42.

34. Zhong, S., A. Khodursky, D. E. Dykhuizen, and A. M. Dean. 2004. Evolutionary genomics of ecological specialization. Proc Natl Acad Sci U S A 101:11719-24.


FIGURE LEGENDS.

Figure 1. Phylogeny of the M. tuberculosis study isolates. M. tuberculosis isolates were assigned to each SCG or SC-subgroup based on SNP alleles at nine loci. The SCG and SC-subgroup designations were defined in previous work (15). The number of study strains (as defined by identical RFLP patterns) and the number of clinical isolates are shown for each location on the tree. The locations of the three M. tuberculosis reference strains with sequenced genomes (H37Rv, CDC1551, and strain 210) and of the sequenced M. bovis strain (AF2122/97) are also shown.

Figure 2. Distribution of Group A and Group B LSPs on the SNP tree. M. tuberculosis strains containing each designated LSP are indicated next to each tree branch. Numbers refer to the total number of strains with the indicated LSP / the total number of isolates with the indicated LSP. Thick lines indicate the phylogenetic location of a hypothetical common ancestor in which the LSP first occurred, and its progeny. A: All Group A LSPs in the study. B: All Group B LSPs in the study. The locations of the SCGs and SC-subgroups on these trees, as well as the total numbers of strains and isolates in each SCG and SC-subgroup, can be found in Fig. 1.

Figure 3. Distribution of Group C LSPs on the SNP tree. M. tuberculosis strains containing Group C LSPs in this study are shown. Numbers refer to the total number of strains with the indicated LSP / the total number of isolates with the indicated LSP. Thick lines indicate the phylogenetic location of a hypothetical common ancestor in which the LSP first occurred, and its progeny. The locations of the SCGs and SC-subgroups on this tree, as well as the total numbers of strains and isolates in each SCG and SC-subgroup, can be found in Fig. 1.


Figure 4. Location of LSP-Ts on the SNP tree. The location of clinical M. tuberculosis strains identified by LSP-T is shown relative to the location of each SCG and SC-subgroup on the SNP tree. Colored LSP-Ts and connecting lines indicate LSP-Ts that are present on multiple SNP tree branches. Tree not drawn to scale.


[Figure 1 (SNP tree): SCG1, 9 strains/11 isolates; SCG2 (strain 210), 12 strains/24 isolates; SCG3a, 1 isolate; SCG3b, 27 strains/29 isolates; SCG3c, 8 strains/34 isolates; SCG4 (CDC1551), 6 strains/7 isolates; SCG5, 38 strains/43 isolates; SCG6a, 12 strains/12 isolates; SCG6b (H37Rv), 4 strains/4 isolates; SCG7 (M. bovis), 0 isolates.]


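[Figure 2: SNP trees showing the distribution of Group A LSPs (panel A: LSP1, LSP9, LSP13, LSP16) and Group B LSPs (panel B: LSP12, LSP14), with strain/isolate counts on each branch.]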


[Figure 3: SNP tree showing the distribution of Group C LSPs (LSP2, LSP3, LSP4, LSP5, LSP6, LSP7, LSP8, LSP10, LSP11, LSP15, LSP17), with strain/isolate counts on each branch.]


[Figure 4: LSP-Ts on the SNP tree, by SCG and SC-subgroup]
SCG6a (LSP-T: 46, 47, 48, 49, 50, 51, 52, 53, 54)
SCG6b (LSP-T: 53, 55, 56, 57)
SCG5 (LSP-T: 14, 15, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45)
SCG3c (LSP-T: 5, 6, 7, 8, 9, 10)
SCG4 (LSP-T: 1, 2, 3, 4, 5)
SCG3b (LSP-T: 3, 5, 7, 10, 11, 12, 13, 14, 15, 16)
SCG3a (LSP-T: 20)
SCG2 (LSP-T: 17, 18, 19, 21, 24, 58)
SCG7
SCG1 (LSP-T: 7, 9, 22, 23)
