Comparative analysis of genome methylation in Thermotogae isolates from deep-sea hydrothermal vents


Upload: thomas-haverkamp

Post on 14-Dec-2014


Introduction

The phylum Thermotogae is characterized by extensive horizontal gene transfer (HGT). Highly similar genes are shared between genomes of different Thermotogae genera, other phyla (Firmicutes) and other domains such as the Archaea [1]. Many of these organisms proliferate in hot, extreme environments such as oil fields and hydrothermal vents. How HGT functions in these ecosystems is unclear, but phages might act as transfer agents of genetic material. Thermotogae genomes contain CRISPR repeats, which are part of the defence machinery against phages. Another defence mechanism against phages is the restriction-modification system, and genes related to it are also found in several Thermotogae genomes. The restriction-modification system uses methyltransferase proteins to methylate bases of the host DNA. Under a phage attack, this system detects the non-methylated foreign DNA and uses restriction enzymes to degrade it. With the advancement of single-molecule, real-time (SMRT) sequencing it has become possible to detect N4-methylcytosine (m4C) and N6-methyladenine (m6A) bases in bacterial genomes. Here we use SMRT genome sequencing to compare four Thermotogae isolates from deep-sea hydrothermal vents and their defence system set-up, including CRISPRs and base modifications, in order to understand the probable response to invading DNA.

Thomas Haverkamp 1 ([email protected]), Lossouarn J 2, Geslin C 2, Nesbø CL 1,3

Phylogenetic distance of Thermotogae isolates

Base modifications and DNA motifs

Table 1. Modification and motif analysis for four Thermotogae strains.

Strain (contigs)           Motif              Modification type  Motifs in genome  Fraction methylated motifs  Mean score  Mean IPD ratio#  Mean coverage
Marinitoga sp. 1137 (10)   TANCAY             m6A                9852              0.95                        104.2       4.42             82.4
                           GTNNAC             m6A                3532              0.92                        104.5       4.64             81.8
T. melanesiensis 431 (3)   GATC               m6A                5446              0.99                        72.6        4.60             48.3
                           RTAYNNNNNNTNNCG    m6A                520               0.95                        70.6        5.13             48.0
                           CGNNANNNNNNRTAY    m6A                520               0.94                        66.9        4.75             48.5
                           CCGG               m4C                2968              0.71                        45.8        3.63             49.1
                           CGCC               m4C                2462              0.62                        44.6        3.07             52.3
Thermosipho sp. 1063 (3)$  Not Clustered      -                  3584808           0.09                        37.4        -                81.2
Thermosipho sp. 1070 (1)$  CNNNTNCNNTAANATNG  modified base      72                0.50                        41.3        2.60             39.9

Modification and motif analysis was performed using the RS modification and motif analysis v1 pipeline [2]. This pipeline maps the SMRT subreads to the HGAP-assembled genome and determines the interpulse duration for each base and the likelihood that a specific base is modified. m6A-methylated bases were identified along the entire chromosome sequences of Thermosipho melanesiensis 431 and Marinitoga sp. 1137.

#) IPD: interpulse duration.
$) Non-significant results.
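The "Motifs in genome" and "Fraction methylated motifs" columns can be illustrated with a minimal Python sketch. It expands an IUPAC motif such as TANCAY into a regular expression and counts its occurrences on both strands. The function names are hypothetical, and counting both strands in this way is an assumption rather than a description of how the pipeline in [2] counts sites.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate IUPAC codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Compile a motif to a lookahead regex so overlapping hits are counted."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=" + body + ")")


def count_motif(genome, motif):
    """Count motif sites on both strands (assumed counting convention).

    Sites on the reverse strand are found by searching the forward strand
    for the reverse complement of the motif.
    """
    genome = genome.upper()
    forward = len(motif_to_regex(motif).findall(genome))
    reverse = len(motif_to_regex(reverse_complement(motif)).findall(genome))
    return forward + reverse


def fraction_methylated(modified_sites, total_sites):
    """Fraction of motif sites that were detected as modified."""
    return modified_sites / total_sites if total_sites else 0.0
```

A palindromic motif such as GATC is counted once per strand at every site, because both strands carry a base that can be methylated.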

Identification of prophage sequences

Figure 3. Chromosome maps for Thermosipho melanesiensis 431 and Marinitoga sp. 1137. Rings from inside to outside indicate: 1) chromosome position; 2) GC content; 3) GC skew; 4) rRNA operon; 5) annotated hypothetical genes; 6) prophage regions. Note: the chromosome from Marinitoga sp. 1137 is based on Mauve-ordered and concatenated contigs. Prophages were identified using the PHAST website [6]. rRNA genes, hypothetical genes, and genes found in prophage-assigned genomic regions were extracted using CLC workbench. Each category of genes was then matched using BlastN against the chromosome / contigs and visualized with BRIG. The chromosome sequences of strains 1063 and 1070 were analysed using PHAST and were found not to contain any prophage regions.
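Rings 2 and 3 of such a chromosome map plot GC content and GC skew along the sequence. The sketch below uses the standard definitions, GC content = (G + C) / window length and GC skew = (G - C) / (G + C), computed in fixed windows. The window and step sizes are illustrative assumptions, not the settings used by BRIG for this figure.

```python
def gc_stats(window):
    """Return (GC content, GC skew) for one sequence window."""
    w = window.upper()
    g, c = w.count("G"), w.count("C")
    gc_content = (g + c) / len(w) if w else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return gc_content, gc_skew


def sliding_gc(seq, window=1000, step=1000):
    """Return a list of (start, GC content, GC skew) per window.

    window and step are hypothetical defaults chosen for illustration.
    A sequence shorter than one window is treated as a single window.
    """
    results = []
    last_start = max(len(seq) - window + 1, 1)
    for start in range(0, last_start, step):
        content, skew = gc_stats(seq[start:start + window])
        results.append((start, content, skew))
    return results
```

In many bacterial chromosomes the sign of the GC skew changes near the origin and terminus of replication, which is why it is commonly drawn next to GC content on circular maps.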

Conclusions

The analysis of four Thermotogae genomes and their methylation patterns shows clear differences between the methylated and non-methylated genomes. First, the methylated genomes of T. melanesiensis 431 and Marinitoga sp. 1137 contain prophage elements. Second, the methylated genomes also encode genes of the restriction-modification system, which is known as another phage defence system. Although all four genomes contained CRISPR regions and CRISPR-associated genes, the composition of the CRISPR-associated genes differed between the non-methylated and methylated genomes. At this stage it is unclear how these differences in defence systems affect the populations of these bacteria and whether they promote or suppress HGT.

References

1. Zhaxybayeva et al., 2009. PNAS 106: 5865-5870.
2. Methylome Analysis Technical Note: http://tinyurl.com/mfo74u4
3. Makarova et al., 2011. J. Bacteriology 193: 6039-6056.
4. Grissa et al., 2007. Nucleic Acids Res. 35: W52–W57.
5. Makarova et al., 2011. Nature Rev. Microbiology 9: 467-477.
6. Zhou et al., 2011. Nucleic Acids Res.: 1-6.

Affiliations

1. Centre for Ecological and Evolutionary Synthesis, Dept. of Biosciences, University of Oslo, Oslo, Norway.

2. Laboratory of Microbiology of Extreme Environments (LMEE), UMR 6197/CNRS/UBO IUEM, Plouzane, France

3. Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada

Characteristics of CRISPR elements

Table 2. CRISPR repeat elements and typing of CRISPR-associated genes.

Strain    CRISPR locus              CRISPR type#  Positions (bp)     Spacers (n)  Typical repeat sequence
Tm 431    Crispr-1                  -             137986 - 138686    9            GTTTCTACCTTACCTTGGAGGAATTGAAAC
          Crispr-2                  I-B           360035 - 360533    7            ATTTCAATTCCTCCAAGGTAAGGTAAAAAC
          Crispr-3                  -             754395 - 758275    55           ATTTCTATTCCTCATAGGTAGATTCTAAAC
          Crispr-4                  III-B         1638809 - 1639894  15           GTTTAGAATCTACCTATGAGGAATGGAAAC
          Crispr-5                  III-B         1651157 - 1651956  12           GTTTCCATTCCTCATAGGTAGATTCTAAAC
Ts 1063   Crispr-1                  -             125563 - 126130    8            GTTTCCATTCCTCATAGGTATGTTCTAAAC
          Crispr-2                  -             306496 - 310808    60           GTTAAAAAACCTAATTCCATAAATGGAATTCAAAC
          Crispr-3                  -             624605 - 625388    11           GTTTAGAACATACCTATGAGGAATGGAAAC
          Crispr-4                  -             909396 - 911235    27           GTTTCCATTCCTCATAGGTATGTTCTAAAC
Ts 1070   Crispr-1                  -             125564 - 126342    11           GTTTCCATTCCTCATAGGTATGTTCTAAAC
          Crispr-2                  -             306632 - 310795    57           GTTAAAAAACCTAATTCCATAAAATGGAATTCAAAC
          Crispr-3                  -             625089 - 626267    11           GTTTAGAACATACCTATGAGGAATGGAAAC
          Crispr-4                  -             923880 - 925386    22           GTTTCCATTCCTCATAGGTATGTTCTAAAC
Mar 1137  C23* - Crispr-1           -             336790 - 337823    13           GTTTCTATCTCTTTCAGAGAGCAGTTATATTCGGAT
          C23 - Crispr-2            III-B         349422 - 351458    26           GTTTCTATCTCTTTCAGAGAGTAGTGATATTCGGAT
          C23 - Crispr-3            III-B         367928 - 370483    33           GTTTCTATCTCTTTCAGAGAGTAGTGATATTCGGAT
          C23 - Crispr-4            I-B           471993 - 472356    5            ATTTACATTCCAATATGGATTATTAAAGAC
          C26 - Possible Crispr-5$  -             285006 - 285093    1            TTTGTAATTTTACCTTGGACACTCTGCGAG

CRISPRs were identified using CRISPRfinder [4]. To determine the CRISPR type we first screened the genome region around each CRISPR for the presence of CRISPR-associated genes and compared the gene order with the CRISPR types proposed in [5]. We only found evidence for the presence of CRISPR types I-B and III-B in two of the isolates, Tm 431 and Mar 1137. For the genomes of Ts 1063 and Ts 1070 we could not conclusively identify the CRISPR system based on the classification scheme used [5], indicating that the CRISPR operating mechanism could be different.

#: CRISPR type was identified using the classification scheme proposed by [5]. A dash means that a classification could not be established.
*: Contig ID for the Marinitoga 1137 genome.
$: This repeat sequence was identified by CRISPRfinder and indicated as a possible CRISPR repeat.
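When reading Table 2, note that a repeat can be reported on either strand, so two loci may share a repeat even though the listed sequences look different. The sketch below is a reading aid and not part of the poster's method: it compares two repeats directly and as reverse complements, using two sequences copied from Table 2. The function names are hypothetical.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def same_repeat(a, b):
    """True if two repeats are identical on the same or on opposite strands."""
    a, b = a.upper(), b.upper()
    return a == b or a == reverse_complement(b)


# Typical repeat sequences of Ts 1063 Crispr-1 and Crispr-3 (from Table 2).
ts1063_crispr1 = "GTTTCCATTCCTCATAGGTATGTTCTAAAC"
ts1063_crispr3 = "GTTTAGAACATACCTATGAGGAATGGAAAC"

print(same_repeat(ts1063_crispr1, ts1063_crispr3))
```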

Acknowledgements

We thank the Norwegian Sequencing Platform at the University of Oslo for sequencing our samples and support with the bioinformatic analysis.

Defense system genes

Figure 2. Phage defence system genes in four Thermotogae isolates. To identify defence system proteins in each genome we used a curated database containing 132 COGs present in different prokaryotic defence systems [3]. For each genome we used BlastP (blast+ v2.2.28) to match all protein sequences against the defence systems COG database, and retained only those sequences with an e-value below 1.0e-20. To identify the restriction-modification system genes in the total data set, we checked the REBASE database annotations for T. melanesiensis BI429 and Marinitoga piezophila KA3 for reference. CRISPR genes were identified by their annotations (e.g. CRISPR-associated protein cas1).
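The e-value filter described above can be sketched in Python. This assumes that BlastP was run with the standard 12-column tabular output (`-outfmt 6`), which the poster does not state. The function name and the identifiers in the usage example are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, max_evalue=1e-20):
    """Yield (query, subject, evalue) for hits with an e-value below the cutoff.

    Assumes tab-separated BLAST output in the default -outfmt 6 column order.
    """
    reader = csv.DictReader(handle, fieldnames=BLAST6_FIELDS, delimiter="\t")
    for row in reader:
        evalue = float(row["evalue"])
        if evalue < max_evalue:
            yield row["qseqid"], row["sseqid"], evalue


# Usage with two made-up hits (identifiers are placeholders, not real data).
example = io.StringIO(
    "prot_A\tCOG_X\t60.0\t300\t120\t0\t1\t300\t1\t300\t1e-30\t250\n"
    "prot_B\tCOG_Y\t30.0\t100\t70\t2\t1\t100\t5\t104\t1e-05\t45.0\n"
)
for hit in filter_hits(example):
    print(hit)
```

Keeping the cutoff as a parameter makes it easy to check how sensitive the number of retained defence-system proteins is to the chosen threshold.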

[Figure 1 tree. Taxa shown: Kosmotoga olearia TBF 19.5.1; Petrotoga mobilis SJ95; Marinitoga sp. 1137; Marinitoga piezophila KA3; Thermosipho sp. 1223; Thermosipho sp. 1074; Thermosipho sp. 1063; Thermosipho sp. 1070; T. africanus H17ap60334; T. africanus Ob7; T. melanesiensis 487, 430, 432, 434, 433, 431 and BI429. Bootstrap values at nodes: 100, 98, 77, 99, 99, 74, 76, 96. Scale bar: 0.1.]

Figure 1. Maximum likelihood phylogeny of the DNA-directed RNA polymerase beta subunit (rpoB) gene sequences (425-491 bp) from the Thermotogae strains used in our study and reference strains. A 500 base pair alignment was used to construct the tree with PhyML (Seaview v4) using the GTR model and 1000 replicates. Numbers at the nodes indicate bootstrap values (only values above 70 % are shown). Dots mark strains used in the present analysis. Green: closed genomes; red: contigs only. Black squares are reference genomes.
