Supporting information

A. Materials and Methods
1. Ethics statement
2. Bronchoalveolar lavage fluid specimens
3. Enrichment in P. jirovecii cells by immuno-precipitation
4. DNA extraction and amplification
5. BALF specimens screening for high proportion of P. jirovecii DNA
6. Investigation of the presence or absence of another Pneumocystis species
7. Investigation of the presence or absence of another fungal species
8. High throughput sequencing
9. Filtering and P. jirovecii genome assembly
10. Transcriptome assembly and annotation
11. Gene predictions and functional annotations
12. Phylogeny
13. Search for missing genes
14. Recovery, assembly, and annotation of the ribosomal RNA unit
15. Recovery, assembly, and annotation of mitochondrial genome

B. Data access

C. References for supplementary information

D. Supplementary figure

E. Supplementary tables

A. Materials and Methods

1. Ethics statement

The study protocol was approved by the institutional review board (Commission cantonale

[VAUD] d'éthique de la recherche sur l'être humain). The project was considered part of

research for improving diagnosis. All patients provided informed oral consent, which was

part of the procedure for admittance to the hospital. This oral consent was documented by the

fact that their chart did not mention that they asked for their samples not to be used for research.

The institutional review board approved the protocol for oral consent and documentation. The

samples were treated anonymously.

2. Bronchoalveolar lavage fluid specimens

Fresh bronchoalveolar lavage fluids (BALFs) positive for P. jirovecii using Grocott’s

Methenamine Silver staining were supplemented with 15% v/v glycerol, frozen in liquid

nitrogen, and stored at -80°C. Only those with a sufficient volume (more than one ml) and

heavy load were stored. Four specimens were stored between 2005 and 2008, and were used

for the selection procedure described below.

3. Enrichment in P. jirovecii cells by immuno-precipitation

The BALF was centrifuged at 10’000 rpm for 10 min. After removal of the supernatant, the

pellet was resuspended in 200 µl of 1X PBS, mixed with 50 µl of antibody solution

(Pneumo-Cel IF, Cellabs, Australia), and incubated overnight at 4°C without shaking. The

solution was then added to 50 ml of Dynabeads® Protein G pretreated with 0.1 M Na acetate

pH 5-0.01% tween-20 and PBS pH 7.4 according to the manufacturer’s instructions

(Invitrogen, Switzerland), and the whole mixture was incubated overnight at 4°C. Beads were

washed three times with 1X PBS by concentration for two minutes using an immunomagnetic

separation rack (Invitrogen, Switzerland). P. jirovecii cells were eluted with 200 µl of elution

buffer (0.1 M citrate buffer pH 2), 100 µl of 1 M Tris pH 7.5 was added, and the enriched

BALF was stored at -20°C.

4. DNA extraction and amplification

Genomic DNA was extracted from BALF specimens using QIAamp® DNA Mini kit

(QIAGEN, Germany), and resuspended in 50 µl of elution buffer. For each BALF, one µl of

DNA was randomly amplified in a 20 µl reaction using the Illustra GenomiPhi V2 DNA

Amplification Kit (GE Healthcare, Switzerland) according to the manufacturer’s instructions.

To obtain a sufficient amount of DNA for high throughput sequencing, each DNA sample was

amplified in ten separate 20 µl reactions, which were pooled and purified using QIAamp® DNA

blood mini kit (Qiagen, Germany). Amplified DNA fragments (size ≥ 10 kb) were visualized

by ethidium bromide agarose gel electrophoresis. The quantity and purity were estimated

using Quant-iT™ DNA Assays (Invitrogen, Switzerland).

5. BALF specimens screening for high proportion of P. jirovecii DNA

The proportion of P. jirovecii DNA in the four BALFs was estimated by Roche 454 low level

pyrosequencing of amplified DNA (1/8 plate). The resulting reads were assigned to various

organisms using the following simplified bioinformatics classification pipeline. Shotgun 454

reads were mapped onto the human genome using Roche’s gsMapper (Newbler v.2.6). Human

genomic DNA was identified exclusively by this method and was not further investigated.

Unmapped reads were filtered to discriminate P. jirovecii reads from those of other organisms

using an all-against-all blast comparison. Because the aim was only to estimate the proportion

of P. jirovecii DNA within each BALF, we considered reads as being from P. jirovecii only if

they exhibited significant homology with P. carinii (best blast hit, Blastx and Blastn, e-value

≤ 10⁻⁵). We did not take into account the reads that had their best blast hit against other fungal

proteomes. To ensure the specificity and sensitivity of blast searches, randomly simulated

reads were generated from Mycobacterium leprae, S. pombe, Kluyveromyces lactis, and human

adenovirus genomes using custom Perl scripts. They were used as controls. The CG

odds ratios were not computed because of the small number of reads. The results are shown in

Table S1.
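
The best-hit rule described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes the blast results are available in the standard 12-column tabular format (query ID, subject ID, ..., e-value, bit score). The function names and the "Pcar_" subject prefix are hypothetical and used only for illustration.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # threshold quoted in the text


def best_hits(blast_tabular, cutoff=E_VALUE_CUTOFF):
    """Return {read_id: subject_id}, keeping the lowest e-value hit per read.

    Assumes 12-column tabular BLAST output, where column 11 is the e-value.
    Hits with an e-value above the cutoff are ignored.
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row:
            continue
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def proportion_from(hits, total_reads, prefix):
    """Fraction of all reads whose best hit starts with the given prefix."""
    matching = sum(1 for subject in hits.values() if subject.startswith(prefix))
    return matching / total_reads


# Toy input: subject IDs and the "Pcar_" prefix are invented for this example.
example = (
    "read1\tPcar_001\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-20\t150\n"
    "read1\tScer_042\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-08\t80\n"
    "read2\tScer_007\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t180\n"
    "read3\tPcar_314\t60.0\t50\t20\t0\t1\t50\t1\t50\t0.5\t20\n"
)
hits = best_hits(example)
pcarinii_fraction = proportion_from(hits, total_reads=3, prefix="Pcar_")
```

In this toy input, read3 has no hit below the cutoff and therefore counts toward the total but not toward the P. carinii-like fraction.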

6. Investigation of the presence or absence of another Pneumocystis species

The data obtained from the preliminary 454 sequencings of the four BALFs (section A5) were

used to determine if they contained single or multiple species of Pneumocystis. Seven genes,

whose sequences are available for several Pneumocystis species, S. pombe, and S. cerevisiae,

were used as markers. These genes were: the heat shock protein 70, dihydrofolate reductase,

dihydropteroate synthase, beta-tubulin, superoxide dismutase, cyclin-dependent kinase, and

guanosine nucleotide binding (see Table S6 for accession numbers). Profiles and Hidden

Markov Models were built using MAFFT [1] and Pftools [2], and used to screen the raw 454

reads. Spurious alignments, paralogs, and non-discriminative sequences (e.g. conserved

regions between human and Pneumocystis spp) were removed by manual inspection using

Jalview [3]. Pairwise identities were computed using custom Perl scripts. The results were

verified using a PCR which amplifies the mitochondrial large subunit 26S rRNA gene of all

Pneumocystis species, followed by sequencing the PCR product, as well as using PCRs

specific for P. carinii or P. wakefieldiae, another Pneumocystis species infecting specifically

rats [4].
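
The custom Perl scripts used for pairwise identities are not part of this document. As an illustration only, the following Python sketch computes percent identity between two rows of an alignment. It assumes that columns containing a gap in either sequence are excluded from the denominator, which is one of several common conventions and may differ from the one the authors used.

```python
def pairwise_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap are skipped (an assumption of
    this sketch); identity is matches / compared columns * 100.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Five gap-free columns are compared here, four of which are identical.
identity = pairwise_identity("ACGT-A", "ACGA-A")
```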

7. Investigation of the presence or absence of another fungal species

To investigate whether a fungal species other than Pneumocystis was present in the four BALFs, the

454 reads were taxonomically classified using Blastx (e-value ≤ 10⁻⁵⁰ against the UniProtKB

database with filter "m S"), and MEGAN (min score 150) [5]. Reads that could be assigned at

the species level were collected as well as the complete proteome of the species against which

they had their hits. As Pneumocystis predicted peptides are not included in UniProtKB, the

P. carinii partial genome was re-annotated (ca. 4’591 predicted peptides, section A11), and added to

a custom database containing the complete proteomes of other relevant fungal species. The

assigned reads were then compared to this database using Blastx (e-value ≤ 10⁻⁶), and were

classified into three groups: group 1 included those with their best blast hit with P. carinii,

group 2 included those with their best blast hit against other fungal species but without a

homolog in P. carinii, and group 3 included those having their best blast hit with other fungal

species but also having a significant hit with P. carinii. Group 1 was considered as P. jirovecii

sequences homologous to sequences of P. carinii. Group 2 was not investigated further because it

was not possible to determine if these sequences were missing in the P. carinii assembly, specific

to P. jirovecii, or truly belonged to another fungus. Each sequence of group 3 was translated into

the six reading frames using the 6ft program [2], and aligned with its homologs from other

fungi using MAFFT [1]. Alignments were visualized using Jalview [3]. Pairwise sequence

identities were computed using custom Perl scripts. In all cases examined, the reads were

closer to P. carinii than to any other fungus, but still divergent from P. carinii (data not

shown). Thus, no evidence of the presence of a fungal species other than P. jirovecii was

detected.
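
The three-group rule described in this section can be written compactly. The sketch below is illustrative and is not the authors' code. It assumes each read comes with a list of (species, e-value) Blastx hits, and it treats a hit as significant at the e-value cutoff of 10⁻⁶ quoted above. The function name and the input layout are hypothetical.

```python
E_VALUE_CUTOFF = 1e-6  # Blastx threshold quoted in the text
PCARINII = "Pneumocystis carinii"


def classify_read(hits, cutoff=E_VALUE_CUTOFF):
    """Assign a read to group 1, 2, or 3 from its (species, e-value) hits.

    Group 1: best hit is P. carinii.
    Group 2: best hit is another fungus, no significant P. carinii hit.
    Group 3: best hit is another fungus, but P. carinii also hits.
    Returns None when the read has no significant hit at all.
    """
    significant = [(species, ev) for species, ev in hits if ev <= cutoff]
    if not significant:
        return None
    best_species = min(significant, key=lambda hit: hit[1])[0]
    if best_species == PCARINII:
        return 1
    if any(species == PCARINII for species, _ in significant):
        return 3
    return 2


# Toy read whose best hit is S. pombe but which also matches P. carinii.
group = classify_read(
    [("Schizosaccharomyces pombe", 1e-30), ("Pneumocystis carinii", 1e-10)]
)
```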

8. High throughput sequencing

Ten micrograms of amplified DNA from BALF E8 were used to build three Roche 454 XL+

shotgun libraries and a single Illumina HiSeq 2000 paired end library (insert size 500 bp).

Sequencing produced 2’889’665 Roche 454 single end reads (1.3 Gb, average length 700 nt)

and 316’713’248 Illumina paired end reads (30 Gb, average length 100 nt). Low-quality sequences and

adapters were removed from Illumina paired end reads using fastqc (v.0.9) and cutadapt [6].

9. Filtering and P. jirovecii genome assembly

The flow chart of the filtering and assembly procedure is shown in Figure S1 and the details

for each step are described below:

Step 1: P. jirovecii highly repetitive telomeric sequences and mitochondrion sequences were

identified using mreps [7] and Blastx (e-value ≤ 10⁻⁵) against Pneumocystis published sequences

[8,9,10], removed from 454 reads, and kept apart.

Step 2: The 454 reads were mapped onto the human genome (GRCh37/hg19) and the

complete NCBI human genomic resources

(ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/human_genomic.gz), using Roche’s gsMapper

(Newbler v. 2.6; parameters: -mi 85% -ml 100). Fully mapped reads were removed from the

dataset. Partially mapped reads were kept at this step only if they had their best blast hit

against P. carinii or another fungal proteome.

Step 3: The reads were assembled using gsAssembler software with stringent parameters -mi

99 -ml 100 -rip, which were deduced from in silico simulations to avoid the creation of

chimeric contigs. Reads were trimmed internally with Newbler using -vs and -vt options. The

trimming database included 510 complete phage genomes

(http://phage.sdsu.edu/~rob/phage/), 3’074 complete viral genomes

(http://www.ncbi.nlm.nih.gov/genomes/VIRUSES/viruses.html), UniVec database

(ftp://ftp.ncbi.nih.gov/pub/UniVec/), and the complete genomes of relevant bacteria. The

purpose of this assembly step was to facilitate the attribution of every contig to a taxonomic

group.

Step 4: Contigs and unassembled reads were simultaneously compared to the incomplete P.

carinii genome, including 4’278 contigs (http://pgp.cchmc.org/) and 1’042 ESTs [11], using

Blastn with relaxed parameters (reward 5, penalty -4, word size 7, cost to open a gap 10, cost

to extend a gap 6, e-value 10⁻⁶). They were also compared to custom protein databases

including P. carinii predicted peptides [12] and fungal, eukaryotic, bacterial, archaeal, and viral

proteomes using Blastx. Sequences having their best blast hit against P. carinii were

considered as P. jirovecii. Reads having their best blast hit against a eukaryote other than

Homo sapiens were also considered as P. jirovecii. Most of them corresponded to conserved

sequences found in a broad spectrum of fungal species but missing or fragmented in the P.

carinii assembly.

Step 5: Residual contamination was assessed using RepeatMasker v. 3.2.8 with primspec and

rodspec options (http://repeatmasker.org/), and relative dinucleotide abundance analysis

[13,14]. To identify chimeric reads possibly introduced by the phi 29 DNA polymerase,

pre-filtered 454 reads were assembled using Roche’s gsAssembler (Newbler v. 2.6; parameters:

-mi 95 -ml 100 -rip). Single end 454 reads were then remapped to the contigs using Roche’s

gsMapper (Newbler, v. 2.6; -mi 98 -ml 100 -rip), and filtered using custom Perl scripts. The

whole process of stringent assembly (step 3) followed by filtration (steps 4 and 5) was

repeated several times.

Step 6: A fragmented P. jirovecii genome assembly was obtained using Roche’s

gsAssembler (Newbler v. 2.6, -mi 98 -ml 100 -rip).

Step 7: The trimmed Illumina paired end reads that mapped onto human sequences using

Bowtie 2 [15] were removed.

Step 8: The remaining Illumina paired end reads that mapped onto a collection of bacterial

and viral genomes using Bowtie 2 [15] were removed.

Step 9: The Illumina reads were used to extend and scaffold the 454 contigs using SSAKE (v.

3-8, -p 1 -m 16 -o 2 -r 0.6 -p 0) [16]. The last remaining contaminants (bacteria essentially)

were filtered out as described in step 4. Extended 454 contigs, singlets, and unassembled 454

reads were assembled using Phrap to yield the final assembly [17,18].
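
The relative dinucleotide abundance analysis mentioned in step 5 relies on odds ratios of the form ρ(XY) = f(XY) / (f(X) f(Y)), where f denotes observed frequencies. The Python sketch below computes these ratios on a single strand. It is a simplified illustration: the cited works [13,14] should be consulted for the exact variant (for example, strand-symmetrized frequencies), and nothing here is taken from the authors' scripts.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_odds_ratios(seq):
    """Return {dinucleotide: f(XY) / (f(X) * f(Y))} for one strand.

    Positions with characters other than A, C, G, T are ignored.
    Dinucleotides involving a base absent from the sequence are omitted.
    """
    seq = seq.upper()
    mono = Counter(base for base in seq if base in BASES)
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    di = Counter(pair for pair in pairs if set(pair) <= set(BASES))
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence too short or without valid bases")
    ratios = {}
    for x, y in product(BASES, repeat=2):
        fx = mono[x] / n_mono
        fy = mono[y] / n_mono
        if fx == 0 or fy == 0:
            continue
        ratios[x + y] = (di[x + y] / n_di) / (fx * fy)
    return ratios


# Toy sequence: a perfect ACGT repeat, where AC is strongly over-represented.
profile = dinucleotide_odds_ratios("ACGT" * 50)
```

Contigs whose profile departs markedly from that of the bulk of the assembly would be candidates for closer inspection. The threshold for "markedly" is not given in the text and is left open here.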

10. Transcriptome assembly and annotation

Total RNA was isolated from the BALF of a patient with PCP other than the one used for genome

sequencing. Total RNA was extracted using the Ribopure Yeast Kit (Ambion). RNA was

randomly amplified using NuGEN Ovation® RNA-seq system and sequenced using Illumina

HiSeq 2000 technologies. A total of 246’196’388 non-strand-specific paired end reads

(insert size 250 bp, average read length 100 nt) were obtained. Adapter sequences and low

quality sequences were removed using cutadapt [6]. Human reads (62.5%) were first removed

by mapping using TopHat [19]. Additional human contaminant reads were removed (31%) by

realignment using GSNAP [20]. Bacterial contaminants were removed by mapping to a

collection of bacterial species using Bowtie 2 (3.8%) [15]. After these filtering steps, we

obtained 6’708’547 paired end reads that were assembled using TopHat and cufflinks [19,21].
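
The read counts and percentages quoted above can be cross-checked with simple arithmetic. The sketch below assumes that all three percentages are expressed relative to the initial number of reads, which the text does not state explicitly.

```python
def remaining_percent(removed_percentages):
    """Percentage of reads left after removing the listed fractions."""
    return 100.0 - sum(removed_percentages)


def retained_percent(retained_reads, total_reads):
    """Retained reads as a percentage of the initial total."""
    return 100.0 * retained_reads / total_reads


TOTAL_READS = 246_196_388     # paired end reads obtained
RETAINED_READS = 6_708_547    # reads left after filtering
REMOVED = [62.5, 31.0, 3.8]   # TopHat, GSNAP, and Bowtie 2 steps (percent)

expected = remaining_percent(REMOVED)
observed = retained_percent(RETAINED_READS, TOTAL_READS)
print(f"expected about {expected:.1f}%, observed {observed:.2f}%")
```

Under that assumption, the two figures agree at about 2.7% of the initial reads.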

11. Gene predictions and functional annotations

The repeat-masked P. jirovecii genome was annotated using Maker (v. 2.10) [22], integrating

de novo, homology, and transcriptome evidence for genes. Augustus (v. 2.3.1) [23] and

SNAP (v. 2006-07-28) [24] were trained on 483 transcripts expected to be complete (section

A10). GeneMark-ES (v. 2.5) [25] was trained directly on the genome assembly. Homology

based evidence was obtained from the alignments of RNAseq transcripts, 43’310 proteins

corresponding to the complete proteome of Schizosaccharomyces pombe,

Schizosaccharomyces japonicus, Saccharomyces cerevisiae, Neosartorya fischeri, Neurospora

crassa, and 393 publicly available Pneumocystis proteins (downloaded from UniProtKB

release 2012_06). Protein coding genes were annotated by comparison to a collection of

fungal proteomes using Blastp with an e-value of 10⁻⁶ as cut off. Mapping to KEGG metabolic

pathways was performed using our previously described mapping pipeline [12], and Priam [26].

Specific software tools were used to predict secreted proteins, GPI-anchored proteins, carbohydrate

enzymes, peptidases, transmembrane proteins, kinases, G-protein-coupled receptors

(GPCRs), transporters, GTPases, phosphatases, transfer RNAs, other non-coding RNAs, and

transposable elements. The ribosomal operon unit as well as the mitochondrial genome were

assembled and annotated separately (see sections A14 and A15). For the sake of a fair

comparison with P. jirovecii, the P. carinii genome was re-annotated using the same methods

including specific P. carinii gene models as well as a collection of 1’042 [11] and 48’229

ESTs (C. Aliouat-Denis, unpublished data).

12. Phylogeny

The proteomes of Ashbya gossypii, Debaryomyces hansenii, N. crassa, Neosartorya fumigata,

S. japonicus, S. pombe, Ustilago maydis, S. cerevisiae, Rhizopus delemar, and Yarrowia

lipolytica were downloaded from the UniProtKB website (release 2012_06;

http://www.uniprot.org/), and those of Schizosaccharomyces cryophilus and Schizosaccharomyces

octosporus from the Broad Institute website (http://www.broadinstitute.org/). The proteomes

of P. jirovecii and P. carinii were from this study, whereas that of Taphrina deformans is

unpublished data (Cissé et al., Manuscript in preparation). Single copy orthologs were

identified using Orthologous MAtrix project (OMA.0.99) [27], concatenated, and aligned

using MAFFT [1] with the L-INS-i method. Misaligned regions were removed by GBLOCKS

[28]. The maximum likelihood and maximum parsimony phylogenies were inferred using

PhyML (v.3.0) [29] and RAxML (v.7.2.8) [30], respectively with 100 bootstrap replicates and

BLOSUM62 as model.
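
The concatenation of single copy orthologs described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run: each orthologous group is represented as a dictionary mapping a taxon name to its protein sequence, and groups lacking any of the requested taxa are skipped. The function name and data layout are hypothetical.

```python
def concatenate_orthologs(groups, taxa):
    """Concatenate ortholog sequences per taxon, in a fixed group order.

    groups: list of {taxon: protein_sequence}, one dict per ortholog group.
    taxa:   taxa that must all be present for a group to be used.
    Returns {taxon: concatenated_sequence}.
    """
    complete = [group for group in groups if all(t in group for t in taxa)]
    return {taxon: "".join(group[taxon] for group in complete) for taxon in taxa}


# Toy data with invented sequences; the second group lacks taxon "B".
toy_groups = [
    {"A": "MK", "B": "MR"},
    {"A": "LL"},
    {"A": "GG", "B": "GA"},
]
supermatrix_input = concatenate_orthologs(toy_groups, ["A", "B"])
```

Keeping the same group order for every taxon ensures that homologous proteins occupy corresponding regions of the concatenated sequences before alignment.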

13. Search for missing genes

The glyoxylate cycle hallmark genes (i.e. isocitrate lyase and malate synthase), and enzymes

dedicated to the synthesis of amino acids were searched using hmmer3 (http://hmmer.org/)

with corresponding Pfam Hidden Markov models (http://pfam.sanger.ac.uk/), our previously

described KEGG mapping pipeline [12], and Priam [26]. The secondary metabolite clusters were

searched using the Secondary Metabolite Unique Regions Finder (SMURF;

http://www.jcvi.org/smurf/index.php), and Blast searches against UniProtKB database

(http://www.uniprot.org/).

14. Recovery, assembly, and annotation of the ribosomal RNA unit

We searched for homologs to P. jirovecii ribosomal sequences in the raw Roche 454

sequences using Blastn (e-value ≤ 10⁻⁵). The reference P. jirovecii sequences used for

screening were: 18S rRNA gene (NCBI accession number AB266392), ITS1-5.8S rRNA-

ITS2 (AF013954, AY330724, AY328067 - AY328078, AB469815, AB469816, EU709722 -

EU709727, AB469817, AB481404, AB481405 - AB481414, FJ164067, FJ164068), the full-

length P. carinii ribosomal operon (M86760), and the intron of P. jirovecii nuclear 26S rRNA

(L13615). The 242 reads recovered were purged for contaminants using Blast against NCBI

nr/nt (i.e. removal of human or bacterial ribosomal genes), assembled using Roche’s

gsAssembler (Newbler v. 2.6), and annotated using Artemis [31]. The contig 357 contained

the full-length 18S, ITS1, 5.8S, ITS2, and 26S rDNA sequences, whereas contig 358

contained the full-length IGS1, 5S, and IGS2 sequences.

15. Recovery, assembly, and annotation of mitochondrial genome

The Roche 454 reads were compared to the published P. carinii mitochondrial genome [10]

using Blastn with the parameters -r 5 -q -4 -W 7 -G 10 -E 6 -e 1e-05 -v 1 -b 1 -F "m D". We

retrieved 111’276 reads (3.8% of total) that were purged for contaminants using Blast against

NCBI nr/nt, and assembled using gsAssembler (Newbler v.2.6, parameters -mi 95 -ml 100

-ace -rip). Illumina paired end reads were used to correct contigs by remapping using Bowtie 2

[15] with default parameters and manual inspection. Protein coding genes and ribosomal

genes were annotated by comparison to the available P. jirovecii and P. carinii mitochondrial

genes using Artemis [31] and NCBI orf finder (http://www.ncbi.nlm.nih.gov/projects/gorf/)

with translation table 4. Transfer RNAs were predicted de novo using tRNAscan with -c

option [32].
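
The Blastn option string quoted in this section uses the single-letter flags of the legacy blastall program. As a reading aid, the Python sketch below splits that string into flag/value pairs and attaches a short description to each flag. The descriptions follow the legacy blastall help text as best recalled, and they match the wording used in step 4 of section A9 for the scoring parameters. The helper name is hypothetical.

```python
import shlex

# Short descriptions of the legacy blastall flags used in the text.
LEGACY_BLASTN_FLAGS = {
    "-r": "reward for a nucleotide match",
    "-q": "penalty for a nucleotide mismatch",
    "-W": "word size",
    "-G": "cost to open a gap",
    "-E": "cost to extend a gap",
    "-e": "expectation value (e-value) threshold",
    "-v": "number of one-line descriptions to report",
    "-b": "number of alignments to report",
    "-F": "query filtering option",
}


def parse_flags(option_string):
    """Split a 'flag value flag value ...' string into an ordered dict."""
    tokens = shlex.split(option_string)
    if len(tokens) % 2:
        raise ValueError("expected flag/value pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))


params = parse_flags('-r 5 -q -4 -W 7 -G 10 -E 6 -e 1e-05 -v 1 -b 1 -F "m D"')
for flag, value in params.items():
    print(f"{flag} {value!r}: {LEGACY_BLASTN_FLAGS[flag]}")
```

Using shlex.split keeps the quoted filter value "m D" together as a single token, so that it is not mistaken for two separate arguments.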

B. Data access

The Whole Genome Shotgun project has been registered at EMBL-Bank under the 68827

identification number (http://www.ncbi.nlm.nih.gov/bioproject/68827). The raw sequences

were deposited at the European Sequence Read Archive (SRA) under accession number

ERP000939. The P. jirovecii transcriptome project has been registered at EMBL-Bank under

accession number PRJEB400. Raw RNAseq data were deposited at the European Sequence Read Archive

(SRA) under accession number ERP001479. The P. jirovecii and P. carinii annotated

genomes are temporarily available before public release at:

http://myhits.isb-sib.ch/wwwtmp/weekly/Pneumocystis_jirovecii_genome.tar.gz

http://myhits.isb-sib.ch/wwwtmp/weekly/Pneumocystis_carinii_genome.tar.gz

C. References for supplementary information

1. Katoh K, Asimenos G, Toh H. 2009. Multiple alignment of DNA sequences with

MAFFT. Methods Mol Biol 537: 39-64.

2. Bucher P, Karplus K, Moeri N, Hofmann K. 1996. A flexible motif search technique

based on generalized profiles. Comput Chem 20: 3-23.

3. Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ. 2009. Jalview Version

2--a multiple sequence alignment editor and analysis workbench. Bioinformatics 25:

1189-1191.

4. Palmer RJ, Cushion MT, Wakefield AE. 1999. Discrimination of rat-derived

Pneumocystis carinii f. sp. carinii and Pneumocystis carinii f. sp. ratti using the

polymerase chain reaction. Mol Cell Probes 13: 147-155.

5. Huson DH, Auch AF, Qi J, Schuster SC. 2007. MEGAN analysis of metagenomic data.

Genome Res 17: 377-386.

6. Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing

reads. EMBnet.journal 17: 10-12.

7. Kolpakov R, Bana G, Kucherov G. 2003. mreps: Efficient and flexible detection of

tandem repeats in DNA. Nucleic Acids Res 31: 3672-3678.

8. Underwood AP, Louis EJ, Borts RH, Stringer JR, Wakefield AE. 1996. Pneumocystis

carinii telomere repeats are composed of TTAGGG and the subtelomeric sequence

contains a gene encoding the major surface glycoprotein. Mol Microbiol 19: 273-281.

9. Kutty G, Ma L, Kovacs JA. 2001. Characterization of the expression site of the major

surface glycoprotein of human-derived Pneumocystis carinii. Mol Microbiol 42: 183-193.

10. Sesterhenn TM, Slaven BE, Keely SP, Smulian AG, Lang BF, et al. 2010. Sequence

and structure of the linear mitochondrial genome of Pneumocystis carinii. Mol Genet

Genomics 283: 63-72.

11. Cushion MT, Smulian AG, Slaven BE, Sesterhenn T, Arnold J, et al. 2007.

Transcriptome of Pneumocystis carinii during fulminate infection: carbohydrate

metabolism and the concept of a compatible parasite. PLoS One 2: e423.

12. Hauser PM, Burdet FX, Cisse OH, Keller L, Taffe P, et al. 2010. Comparative

genomics suggests that the fungal pathogen Pneumocystis is an obligate parasite

scavenging amino acids from its host's lungs. PLoS One 5: e15152.

13. Gentles AJ, Karlin S. 2001. Genome-scale compositional comparisons in eukaryotes.

Genome Res 11: 540-546.

14. Willner D, Thurber RV, Rohwer F. 2009. Metagenomic signatures of 86 microbial and

viral metagenomes. Environ Microbiol 11: 1752-1766.

15. Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat

Methods 9: 357-359.

16. Warren RL, Sutton GG, Jones SJ, Holt RA. 2007. Assembling millions of short DNA

sequences using SSAKE. Bioinformatics 23: 500-501.

17. Green P. 2009. Phrap, version 1.090518. http://phrap.org.

18. Green P, Ewing B. 2002. Phred, version 0.020425c. http://phrap.org.

19. Trapnell C, Pachter L, Salzberg SL. 2009. TopHat: discovering splice junctions with

RNA-Seq. Bioinformatics 25: 1105-1111.

20. Wu TD, Nacu S. 2010. Fast and SNP-tolerant detection of complex variants and splicing

in short reads. Bioinformatics 26: 873-881.

21. Roberts A, Pimentel H, Trapnell C, Pachter L. 2011. Identification of novel transcripts

in annotated genomes using RNA-Seq. Bioinformatics 27: 2325-2329.

22. Cantarel BL, Korf I, Robb SM, Parra G, Ross E, et al. 2008. MAKER: an easy-to-use

annotation pipeline designed for emerging model organism genomes. Genome Res 18:

188-196.

23. Stanke M, Schoffmann O, Morgenstern B, Waack S. 2006. Gene prediction in

eukaryotes with a generalized hidden Markov model that uses hints from external

sources. BMC Bioinformatics 7: 62.

24. Korf I. 2004. Gene finding in novel genomes. BMC Bioinformatics 5: 59.

25. Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M. 2008. Gene

prediction in novel fungal genomes using an ab initio algorithm with unsupervised

training. Genome Res 18: 1979-1990.

26. Claudel-Renard C, Chevalet C, Faraut T, Kahn D. 2003. Enzyme-specific profiles for

genome annotation: PRIAM. Nucleic Acids Res 31: 6633-6639.

27. Roth AC, Gonnet GH, Dessimoz C. 2008. Algorithm of OMA for large-scale orthology

inference. BMC Bioinformatics 9: 518.

28. Talavera G, Castresana J. 2007. Improvement of phylogenies after removing divergent

and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56: 564-577.

29. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, et al. 2010. New

algorithms and methods to estimate maximum-likelihood phylogenies: assessing the

performance of PhyML 3.0. Syst Biol 59: 307-321.

30. Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic

analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688-2690.

31. Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, et al. 2000. Artemis: sequence

visualization and annotation. Bioinformatics 16: 944-945.

32. Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection of transfer

RNA genes in genomic sequence. Nucleic Acids Res 25: 955-964.

D. Supplementary figure

Figure S1. Bioinformatics strategy used for the filtration, classification, and assembly of

the reads from P. jirovecii. The whole DNA extracted from the bronchoalveolar fluid and

randomly amplified was sequenced using Roche 454 shotgun single end (SE) and Illumina

paired end (PE) sequencing technologies. The 454 reads were filtered and assembled into a

partial assembly (steps 1 to 6). In parallel, low quality Illumina reads and those mapped to

454 contaminant reads were eliminated (steps 7 and 8). After removal of contaminant reads,

Illumina reads were used to complete the 454 assembly into one final genome (step 9).
