detection of viral pathogens with multiplex nanopore minion ...metagenomic sequencing with the...

17
Detection of viral pathogens with multiplex Nanopore MinION sequencing: be careful 1 with cross-talk 2 3 Yifei Xu 1,2 *, Kuiama Lewandowski 3 , Sheila Lumley 4,5 , Steven Pullan 3 , Richard Vipond 3 , 4 Miles Carroll 3 , Dona Foster 1,2 , Philippa C Matthews 4,5 , Timothy Peto 1,2 , Derrick Crook 1,2 5 6 1 Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom, 2 NIHR 7 Oxford Biomedical Research Centre, University of Oxford, United Kingdom, 3 Public Health 8 England, National Infection Service, Porton Down, Salisbury, United Kingdom, 4 Nuffield 9 Department of Medicine, Peter Medawar Building for Pathogen Research, University of 10 Oxford, Oxford, United Kingdom, 5 Department of Infectious Diseases and Microbiology, 11 Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, United 12 Kingdom 13 14 *Correspondence address. Yifei Xu, Nuffield Department of Medicine, University of Oxford, 15 John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom; Email: [email protected] 16 17 Abstract 18 Metagenomic sequencing with the Oxford Nanopore MinION sequencer offers potential 19 for point-of-care testing of infectious diseases in clinical settings. To improve cost- 20 effectiveness, multiplexing of several, barcoded samples upon a single flow cell will be 21 required during sequencing. We generated a unique sequencing dataset to assess the extent 22 and source of cross barcode contamination caused by multiplex MinION sequencing. 23 Sequencing libraries for three different viruses, including influenza A, dengue and 24 chikungunya, were prepared separately and sequenced on individual flow cells. We also 25 pooled the respective libraries and performed multiplex sequencing. We identified 0.056% of 26 . CC-BY-NC-ND 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which was this version posted April 25, 2018. ; https://doi.org/10.1101/308262 doi: bioRxiv preprint

Upload: others

Post on 29-Jan-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

  • Detection of viral pathogens with multiplex Nanopore MinION sequencing: be careful 1 with cross-talk 2

    3

    Yifei Xu1,2*, Kuiama Lewandowski3, Sheila Lumley4,5, Steven Pullan3, Richard Vipond3, 4

    Miles Carroll3, Dona Foster1,2, Philippa C Matthews4,5, Timothy Peto1,2, Derrick Crook1,2 5

    6 1 Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom, 2 NIHR 7

    Oxford Biomedical Research Centre, University of Oxford, United Kingdom, 3 Public Health 8

    England, National Infection Service, Porton Down, Salisbury, United Kingdom, 4 Nuffield 9

    Department of Medicine, Peter Medawar Building for Pathogen Research, University of 10

    Oxford, Oxford, United Kingdom, 5 Department of Infectious Diseases and Microbiology, 11

    Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, United 12

    Kingdom 13

    14

    *Correspondence address. Yifei Xu, Nuffield Department of Medicine, University of Oxford, 15

    John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom; Email: [email protected] 16

    17

    Abstract 18

    Metagenomic sequencing with the Oxford Nanopore MinION sequencer offers potential 19

    for point-of-care testing of infectious diseases in clinical settings. To improve cost-20

    effectiveness, multiplexing of several, barcoded samples upon a single flow cell will be 21

    required during sequencing. We generated a unique sequencing dataset to assess the extent 22

    and source of cross barcode contamination caused by multiplex MinION sequencing. 23

    Sequencing libraries for three different viruses, including influenza A, dengue and 24

    chikungunya, were prepared separately and sequenced on individual flow cells. We also 25

    pooled the respective libraries and performed multiplex sequencing. We identified 0.056% of 26

    .CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

    The copyright holder for this preprint (which wasthis version posted April 25, 2018. ; https://doi.org/10.1101/308262doi: bioRxiv preprint

    https://doi.org/10.1101/308262http://creativecommons.org/licenses/by-nc-nd/4.0/

  • total reads in the multiplex sequencing data that were assigned to incorrect barcodes. 27

    Chimeric reads were the predominant source of this error. Our findings highlight the need for 28

    careful filtering of multiplex sequencing data before downstream analysis, and the trade-off 29

    between sensitivity and specificity that applies to the barcode demultiplexing methods. 30

    31

    Keywords 32

    Nanopore sequencing; metagenomics; multiplexing; cross barcode contamination; chimera 33

    34

    Background 35

    Metagenomic sequencing has the potential to allow unbiased identification of pathogens 36

    from a clinical sample. It holds the promise to serve as a single and universal assay for 37

    diagnostics of infectious diseases directly from samples without the need for a priori 38

    knowledge [1–3]. In addition to identification of pathogen species, broad and deep 39

    metagenomic sequence data could provide information relevant to determining treatment and 40

    prognosis, detecting outbreaks and tracking infection epidemiology [4–7]. Next-generation 41

    sequencing (NGS) platforms can produce a massive throughput of data at a modest cost, 42

    however, its application in clinical diagnostic and public health has been limited by 43

    complexity, slowness, and capital investment. 44

    The MinION is a palm-size, real-time, single-molecule genome sequencer developed by 45

    Oxford Nanopore Technologies (ONT). The MinION’s compact size and real-time nature 46

    could facilitate the application of metagenomic sequencing in point-of-care testing for 47

    infectious diseases, as demonstrated by several proof-of-concept studies, including 48

    identification of Chikungunya (CHIKV), Ebola (EBOV), and hepatitis C virus (HCV) from 49

    human clinical blood samples without target enrichment [8], and detection of bacterial 50

    pathogens from urine samples [9] and respiratory samples, without the need for prior culture 51

    [10]. 52

    .CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

    The copyright holder for this preprint (which wasthis version posted April 25, 2018. ; https://doi.org/10.1101/308262doi: bioRxiv preprint

    https://doi.org/10.1101/308262http://creativecommons.org/licenses/by-nc-nd/4.0/

  • The data throughput of MinION has greatly increased since its release in 2015, with each 53

    consumable flow cell now generating between 10 and 20 Gb of DNA sequence data. This 54

    allows users to make more efficient use of the flow cell (and reduce cost) by multiplexing 55

    several samples in a single sequencing run. ONT has developed PCR-free barcode sets that 56

    allow multiplexing of up to 12 samples. 57

    Detection of influenza A virus in multiple respiratory samples could be one diagnostic 58

    use of a multiplexed MinION sequencing assay. However, when sequencing directly from 59

    samples with a potential wide range of viral titres, it is important to be aware of the potential 60

    for cross sample contamination, both during library preparation and the bioinformatic 61

    barcode demultiplexing stage following sequencing. Here, we present a unique MinION 62

    sequencing dataset and results of investigation into the extent and source of cross barcode 63

    contamination in multiplex sequencing. 64

    65

    Data Description 66

    We used a ferret nasal wash sample infected with influenza A virus as an exemplar and 67

    also spiked two aliquots of negative nasal wash samples from uninfected ferret (pre-existing 68

    unused stocks from an unrelated study) with dengue and chikungunya viruses separately. 69

    Neither of these viruses are relevant for clinical diagnostics in respiratory samples, but act 70

    here as clear, distinct markers for the assessment of cross sample contamination. The 71

    sequencing libraries for each sample were prepared in parallel, along with a negative nasal 72

    wash control, barcoded, and sequenced individually. We then pooled an aliquot of the 73

    sequencing libraries and performed multiplex MinION sequencing. Reads from the four 74

    individual runs and the multiplex run were then analyzed to investigate the extent and source 75

    of cross sample contamination. 76

    Sample preparation 77

    .CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

    The copyright holder for this preprint (which wasthis version posted April 25, 2018. ; https://doi.org/10.1101/308262doi: bioRxiv preprint

    https://doi.org/10.1101/308262http://creativecommons.org/licenses/by-nc-nd/4.0/

  • RNA was extracted, using the QIAamp viral RNA kit (Qiagen) according to the 78

    manufacturer’s instructions, from ferret nasal wash containing influenza A (H1N1) virus 79

    (A/California/04/2009) and a pool of negative nasal wash samples. Aliquots of negative 80

    sample extract were spiked with either dengue (DENV) (strain TC861HA, 81

    Genbank:MF576311) or CHIKV (strain S27, Genbank:MF580946.1) viral RNA from The 82

    National collection of Pathogenic Viruses (www.phe-culturecollections.org.uk). Samples 83

    were DNase treated using TURBO DNase (ThermoFisher Scientific) and purified using the 84

    RNA Clean & Concentrator™-5 kit (Zymo Research). cDNA was prepared and amplified 85

    using a Sequence-Independent-Single-Primer-Amplification method [8] modified as 86

    described previously [11]. Amplified cDNA was quantified using the Qubit dsDNA HS 87

    Assay Kit (ThermoFisher Scientific), and 1µg was used as input for each MinION library 88

    preparation, with the exception of the negative control where the entire sample was used. 89

    MinION library preparation and sequencing 90

    Ligation Sequencing Kit 1D (SQK-LSK108) and Native Barcoding Kit 1D (EXP-91

    NBD103) were used according to the ONT standard protocols, with the exception that only 92

    one barcode was included in each of the four library preparations. Each library was run on an 93

    individual flow cell and a fifth pooled library was made by combining the four individually 94

    barcoded libraries. Libraries were sequenced on R9.4 flow cells. The study design is shown 95

    in Fig. 1. 96

    Genomics analysis 97

    Reads were basecalled using Albacore v2.1.7 (ONT) with barcode demultiplexing. Reads 98

    from each sequencing run were mapped to genomic sequences of each virus using Minimap2 99

    [12]. The number of reads mapped to reference was counted using Pysam 100

    (https://github.com/pysam-developers/pysam). De novo assembly was performed using Canu 101

    .CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

    The copyright holder for this preprint (which wasthis version posted April 25, 2018. ; https://doi.org/10.1101/308262doi: bioRxiv preprint

    http://www.phe-culturecollections.org.uk/https://github.com/pysam-developers/pysamhttps://doi.org/10.1101/308262http://creativecommons.org/licenses/by-nc-nd/4.0/

  • v1.7 [13], and the resulting draft genome was polished using Nanopolish [14] with the signal-102

    level data. 103

    The barcode scores for each read were extracted from the “sequencing_summary.txt” file 104

    generated by Albacore. Two rounds of barcode demultiplexing were performed for the 105

    multiplex MinION sequencing data using Porechop (v0.2.2, 106

    https://github.com/rrwick/Porechop). Albacore’s output directory was used as input. First, we 107

    choose a threshold of 75 for “--middle_threshold” to examine internal adapter and filter 108

    chimera candidates. Next, we used the “--require_two_barcodes” option and set the threshold 109

    for barcode score as 70 to conduct stringent barcode binning. Read current signals were 110

    extracted from FAST5 file using ONT fast5 API 111

    (https://github.com/nanoporetech/ont_fast5_api) and plotted by using ggplot2 implemented in 112

    R (https://www.r-project.org/) for a comparison of chimeric and non-chimeric reads. 113

    114

    Results 115

    MinION sequencing data and assembly of viral genomes 116

    The throughput of each MinION sequencing run varied due to differences in running 117

    time. A maximum number of ~2.4M reads was achieved by the multiplexed sequencing run 118

    and the individual CHIKV run, due to longer running times (Table S1). Reads from the 119

    spiked virus accounted for 96% of the data in the individual CHIKV and DENV sequencing 120

    runs, and 78% for the flu sample (Table 1). The percentage of viral reads within each 121

    barcoded sample in the multiplex sequencing data is close to that in the individually run 122

    sample data (Table 2). Each viral genome had an ultra-high mean depth of coverage in the 123

    individual and multiplex sequencing data, and de novo assembly was able to recover nearly 124

    complete genomes for all three viruses with 99.9% identities compared to the GenBank 125

    reference. 126

    .CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

    The copyright holder for this preprint (which wasthis version posted April 25, 2018. ; https://doi.org/10.1101/308262doi: bioRxiv preprint

    https://github.com/rrwick/Porechophttps://github.com/nanoporetech/ont_fast5_apihttps://doi.org/10.1101/308262http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Extent and source of cross-sample contamination 127

    Each sample was barcoded, and sequenced both individually and multiplexed, which 128

    allowed us to examine the performance of barcode demultiplexing of Albacore. In the 129

    individually sequenced sample data we would expect only a single native barcode to be 130

    present. For CHIKV (barcode NB01), DENV (NB09) and FLU-A (NB10) individual 131

    sequencing runs, we found that 86, 109, and 17 reads respectively were assigned to barcode 132

    bins not expected to be present in the library (representing 0.0036%, 0.0129%, and 0.001% of 133

    total reads). In the multiplex sequencing data, 41 reads (0.0016%) were assigned to barcodes 134

    not included in the experiments (i.e. a barcode other than NB01, NB05, NB09 or NB10). We 135

    defined these as mis-assigned reads (Figure 2a). 136

    To examine potential laboratory contamination in sequencing library preparation, we 137

    mapped all reads from each individual run against the genomic sequences of all three viruses. 138

    No read was found to originate from a genome prepared in a different library, suggesting no 139

    in vitro contamination. The multiplex sequencing library was prepared by pooling the 140

    individual, non-contaminated libraries after the ligation of both barcode and adapter. 141

    However, mapping results show 1311 (0.0543%) reads mapped to the incorrect target 142

    genome, implying that they were cross-assigned to the wrong barcode bins (later referred to 143

    as ‘cross-assigned reads’), despite the fact that the multiplexed sequencing library was pooled 144

    with individual libraries showed no cross-assigned reads at all. We hypothesized that mis-145

    assigned and cross-assigned reads were due to a low barcode score, and investigated the 146

    barcode scores of these reads. Most of the mis-assigned reads had a barcode score

  • of cross-assigned reads could be aligned to more than one viral genome or to distinct regions 152

    within the same genome, suggesting they are chimeras. 153

    To confirm this observation, we investigated the raw current signals of a few cross-154

    assigned reads compared to those of correctly assigned reads (Fig. 2c). The current signals of 155

    a correctly assigned read usually include: (i) an open pore signal of high current representing 156

    the time that the sequencing pore changes from one adapter to another, (ii) a stall signal, 157

    referring to the period of time that a DNA sequence is in the pore but yet to move, and (iii) 158

    the signal trace of DNA sequencing. In contrast, a chimeric read possesses a stall signal and a 159

    huge spike signal in the middle of the read. Chimeric reads can possess two different barcode 160

    sequences at the start and end, thus confusing assignment of a barcode bin. In addition, we 161

    also observed that a small number of reads possessed a 24 nucleotide sequence that is 162

    identical to a barcode sequence that is not expected to be present in the library. Taken 163

    together, these data demonstrate three categories of error that contribute to cross sample 164

    contamination in our dataset: (i) chimeric reads (account for ~80% of all cross-assigned 165

    reads); (ii) reads with low barcode score (account for

  • The ultimate objective of our research is to develop a nanopore metagenomic sequencing 177

    based diagnostic assay that enables point-of-care testing for infectious diseases. Multiplex 178

    sequencing offers the opportunity to improve scalability and cut cost, however, cross sample 179

    contamination can lead to errors in the data and false interpretation of the results. Cross 180

    sample contamination has been observed before [15], but it has not previously been possible 181

    to separate library preparation from sequencing as the cause of chimeric reads. In this 182

    experiment, we pooled clean libraries and performed multiplex MinION sequencing in order 183

    to investigate the extent and source of cross-barcode contamination. We identified 0.056% of 184

    total reads were cross-assigned to the incorrect barcode bins, which is comparable to those 185

    reported for Illumina sequencing platforms from different studies (between 0.06% and 186

    0.25%) [16–18]. Our results showed that chimeric reads are the predominant source of cross-187

    barcode assignment errors. Chimeric reads in this dataset could only have been formed during 188

    sequencing rather than library preparation, as they were completely absent in the sequencing 189

    data of individual libraries, and the only further processing step was to mix the final 190

    sequencing libraries prior to loading. We hypothesize that the current algorithm implemented 191

    in Albacore cannot recognize the short dissociation between DNA sequences that run 192

    concurrently through the nanopore, thereby concatenating more than one sequence into the 193

    same Fast5 file. 194

    In summary, our study demonstrated that chimeric reads are the predominant source of 195

    cross barcode assignment errors in multiplex MinION sequencing. It highlights the need for 196

    careful filtering of multiplex MinION sequencing data before downstream analysis, and the 197

    trade-off between sensitivity and specificity that applies to the barcode demultiplexing 198

    methods. 199

    200

    Competing interests 201

    .CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

    The copyright holder for this preprint (which wasthis version posted April 25, 2018. ; https://doi.org/10.1101/308262doi: bioRxiv preprint

    https://doi.org/10.1101/308262http://creativecommons.org/licenses/by-nc-nd/4.0/

  • The authors declare that they have no competing interests. 202

    203

    Funding 204

    This work was supported by NIHR Oxford Biomedical Research Centre. 205

    206

    Author contributions 207

    All authors designed the study. S.P., K. L., S. L., and Y.X. conducted MinION sequencing. 208

    Y.X. analyzed data. All authors participated in interpreting the results and writing the 209

    manuscript. All authors read and approved the final version of this manuscript. 210

    211

    Acknowledgments 212

    The authors would like to thank Dr. Anthony Marriott (Public Health England) for providing 213

    ferret nasal aspirates. 214

    215

    References 216

    1. Miller RR, Montoya V, Gardy JL, Patrick DM, Tang P. Metagenomics for pathogen 217 detection in public health. Genome Med. BioMed Central; 2013;5:81. 218

    2. Bibby K. Metagenomic identification of viral pathogens. Trends Biotechnol. Elsevier; 219 2013;31:275–9. 220

    3. Schlaberg R, Chiu CY, Miller S, Procop GW, Weinstock G, Microbiology PPC and C on 221 LP of the AS for, et al. Validation of metagenomic next-generation sequencing tests for 222 universal pathogen detection. Arch Pathol Lab Med. the College of American Pathologists; 223 2017;141:776–86. 224

    4. Loman NJ, Constantinidou C, Christner M, Rohde H, Chan JZ-M, Quick J, et al. A culture-225 independent sequence-based metagenomics approach to the investigation of an outbreak of 226 Shiga-toxigenic Escherichia coli O104: H4. Jama. American Medical Association; 227 2013;309:1502–10. 228

    5. Qin J, Li Y, Cai Z, Li S, Zhu J, Zhang F, et al. A metagenome-wide association study of 229 gut microbiota in type 2 diabetes. Nature. Nature Publishing Group; 2012;490:55. 230

    6. Greninger AL, Chen EC, Sittler T, Scheinerman A, Roubinian N, Yu G, et al. A 231 metagenomic analysis of pandemic influenza A (2009 H1N1) infection in patients from North 232 America. PLoS One. Public Library of Science; 2010;5:e13381. 233

    7. Yang J, Yang F, Ren L, Xiong Z, Wu Z, Dong J, et al. Unbiased parallel detection of viral 234 pathogens in clinical samples by use of a metagenomic approach. J Clin Microbiol. Am Soc 235

    .CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

    The copyright holder for this preprint (which wasthis version posted April 25, 2018. ; https://doi.org/10.1101/308262doi: bioRxiv preprint

    https://doi.org/10.1101/308262http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Microbiol; 2011;49:3463–9. 236

    8. Greninger AL, Naccache SN, Federman S, Yu G, Mbala P, Bres V, et al. Rapid 237 metagenomic identification of viral pathogens in clinical samples by real-time nanopore 238 sequencing analysis. Genome Med. BioMed Central; 2015;7:99. 239

    9. Schmidt K, Mwaigwisya S, Crossman LC, Doumith M, Munroe D, Pires C, et al. 240 Identification of bacterial pathogens and antimicrobial resistance directly from clinical urines 241 by nanopore-based metagenomic sequencing. J Antimicrob Chemother. British Society for 242 Antimicrobial Chemotherapy; 2016;72:104–14. 243

    10. Pendleton KM, Erb-Downward JR, Bao Y, Branton WR, Falkowski NR, Newton DW, et 244 al. Rapid pathogen identification in bacterial pneumonia using real-time metagenomics. Am J 245 Respir Crit Care Med. Am Thoracic Soc; 2017;196:1610–2. 246

    11. Atkinson B, Graham V, Miles RW, Lewandowski K, Dowall SD, Pullan ST, et al. 247 Complete genome sequence of Zika virus isolated from semen. Genome Announc. Am Soc 248 Microbiol; 2016;4:e01116-16. 249

    12. Li H. Minimap2: versatile pairwise alignment for nucleotide sequences. arXiv. 2017;1708. 250

    13. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable 251 and accurate long-read assembly via adaptive k-mer weighting and repeat separation. 252 Genome Res. Cold Spring Harbor Lab; 2017;27:722–36. 253

    14. Mongan AE, Yusuf I, Wahid I, Hatta M. The evaluation on molecular techniques of 254 reverse transcription loop-mediated isothermal amplification (RT-LAMP), reverse 255 transcription polymerase chain reaction (RT-PCR), and their diagnostic results on MinION 256 TM nanopore sequencer for the detection of d. Am J Microbiol Res. 2015;3:118–24. 257

    15. White R, Pellefigues C, Ronchese F, Lamiable O, Eccles D. Investigation of chimeric 258 reads using the MinION. F1000Research. Faculty of 1000 Ltd; 2017;6. 259

    16. Nelson MC, Morrison HG, Benjamino J, Grim SL, Graf J. Analysis, optimization and 260 verification of Illumina-generated 16S rRNA gene amplicon surveys. PLoS One. Public 261 Library of Science; 2014;9:e94249. 262

    17. Wright ES, Vetsigian KH. Quality filtering of Illumina index reads mitigates sample 263 cross-talk. BMC Genomics. BioMed Central; 2016;17:876. 264

    18. D’Amore R, Ijaz UZ, Schirmer M, Kenny JG, Gregory R, Darby AC, et al. A 265 comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA 266 community profiling. BMC Genomics. BioMed Central; 2016;17:55. 267

    268

    269

    .CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

    The copyright holder for this preprint (which wasthis version posted April 25, 2018. ; https://doi.org/10.1101/308262doi: bioRxiv preprint

    https://doi.org/10.1101/308262http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Figures 270

    Figure 1 Overview of study design. RNA was extracted from four samples, including a ferret 271 nasal wash sample infected with influenza A virus, two negative ferret nasal wash samples 272 spiked with dengue and chikungunya viruses, and a negative ferret nasal wash control. cDNA 273 was prepared and amplified using a Sequence-Independent-Single-Primer-Amplification 274 method. The sequencing libraries for each sample were prepared in parallel, barcoded, and 275 sequenced on individual flow cells. Multiplex sequencing was also performed by pooling the 276 four individual libraries. Reads from the four individual runs and the multiplex run were 277 analysed to assess the extent and source of cross barcode contamination in multiplex 278 sequencing. 279

    280

    281 282

    .CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

    The copyright holder for this preprint (which wasthis version posted April 25, 2018. ; https://doi.org/10.1101/308262doi: bioRxiv preprint

    https://doi.org/10.1101/308262http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Figure 2 a) summary of number and percentage of reads correctly assigned, unassigned, mis-283 assigned, and cross-assigned in each sequencing run. Un-assigned refers to reads that cannot 284 be assigned to any bins by Albacore due to a barcode score less than 60, mis-assigned refers 285 to reads that were assigned to barcode bins not included in this experiment, and cross-286 assigned refers to reads that were assigned to the incorrect barcode bins; b) distribution of 287 barcode scores reported by Albacore for mis-assigned reads and cross-assigned reads in the 288 multiplex sequencing data; c) comparison of raw signal of a chimeric and a correctly assigned 289 read. The signal of chimeric read possesses a stall signal and a huge spike signal in the 290 middle of the read. 291

    2 a) 292

    293

    294

    2 b) 295

    296

    .CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

    The copyright holder for this preprint (which wasthis version posted April 25, 2018. ; https://doi.org/10.1101/308262doi: bioRxiv preprint

    https://doi.org/10.1101/308262http://creativecommons.org/licenses/by-nc-nd/4.0/

  • 2 c) 297

    298

    .CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

    The copyright holder for this preprint (which wasthis version posted April 25, 2018. ; https://doi.org/10.1101/308262doi: bioRxiv preprint

    https://doi.org/10.1101/308262http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Figure 3 Removal of cross-assigned reads and loss of total sequencing data by two filtering 299 approaches using Porechop. The first approach examines internal adapter and filter chimera 300 candidates that possess adapter in the middle of the read. The second approach requires two 301 barcodes at the start and end of each read. Increasing stringency for barcode demultiplexing 302 occurs at the expense of reducing total number of reads. 303

    304

    305

    .CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

    The copyright holder for this preprint (which wasthis version posted April 25, 2018. ; https://doi.org/10.1101/308262doi: bioRxiv preprint

    https://doi.org/10.1101/308262http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Tables 306

    Table 1 Summary of mapping and de novo assembly results for data from MinION sequencing of individual libraries. The eight gene segments 307 of influenza A virus are PB2, polymerase basic 2; PB1, polymerase basic 1; PA, polymerase acidic; HA, hemagglutinin; NP, nucleoprotein; NA, 308 neuraminidase; M, matrix; NS, non-structural. 309

    310

    Mapping

    de novo assembly

    Segment Reads mapped Genome coverage

    Mean depth of coverage

    Genome coverage Mismatch/Gap/Genome length

    Chikungunya 96% 100% 160,000

    99% 2/4/11774

    Dengue 96% 100% 75,000

    99% 4/6/10709

    Influenza A

    PB2 18% 100% 125,000

    100% 0/0/2280

    PB1 12% 100% 87,000

    95% 0/1/2274

    PA 11% 100% 84,000

    100% 0/0/2151

    HA 16% 100% 140,000

    100% 1/6/1701

    NP 11% 100% 114,000

    99% 0/2/1497

    NA 5% 100% 51,000

    100% 0/1/1410

    M 1% 100% 15,000

    100% 0/1/972

    NS 1% 100% 17,000

    100% 0/1/838

    311

    312

    .CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

    The copyright holder for this preprint (which wasthis version posted April 25, 2018. ; https://doi.org/10.1101/308262doi: bioRxiv preprint

    https://doi.org/10.1101/308262http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Table 2 Summary of mapping and de novo assembly results for data from multiplex MinION sequencing. The eight gene segments of influenza 313 A virus are PB2, polymerase basic 2; PB1, polymerase basic 1; PA, polymerase acidic; HA, hemagglutinin; NP, nucleoprotein; NA, 314 neuraminidase; M, matrix; NS, non-structural. 315

    316

    Mapping

    de novo assembly

    Segment Reads mapped Genome coverage

    Mean depth of coverage

    Genome coverage Mismatch/Gap/Genome length

    Chikungunya 94% 100% 42,000

    99% 1/21/11774

    Dengue 95% 100% 51,000

    99% 2/8/10709

    Flu

    PB2 18% 100% 61,000

    97% 2/4/2280

    PB1 12% 100% 43,000

    100% 0/3/2274

    PA 11% 100% 41,000

    100% 0/0/2151

    HA 15% 100% 68,000

    100% 1/2/1701

    NP 10% 100% 55,000

    100% 2/4/1497

    NA 5% 100% 26,000

    100% 0/1/1410

    M 1% 100% 8,000

    99% 0/0/972

    NS 1% 100% 9,000

    100% 0/1/838

    .CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

    The copyright holder for this preprint (which wasthis version posted April 25, 2018. ; https://doi.org/10.1101/308262doi: bioRxiv preprint

    https://doi.org/10.1101/308262http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Supplementary table 317

    Table S1 Summary of statistics of MinION sequencing data. 318

    319

    Virus Barcode SISPA concentration (ng/µl) Number of reads Total length (bp) Individual sequencing chikungunya NB01 88.6 2,350,867 2,319,958,089

    dengue NB09 70.2 843,468 965,165,841

    influenza A virus NB10 49.4 1,609,780 1,749,616,221

    negative control NB05 0.712 3,113 3,475,147

    Mutliplex sequencing chikungunya NB01 - 639,986 632,952,453

    dengue NB09 - 580,183 666,239,544

    influenza A virus NB10 - 823,427 881,448,974

    negative control NB05 - 28,682 29,967,359

    unclassified

    340,895 360,996,367

    320

    .CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

    The copyright holder for this preprint (which wasthis version posted April 25, 2018. ; https://doi.org/10.1101/308262doi: bioRxiv preprint

    https://doi.org/10.1101/308262http://creativecommons.org/licenses/by-nc-nd/4.0/