nanopore sequencing - recap - analysis of next-generation

23
Nanopore sequencing - recap Analysis of Next-Generation Sequencing Data Friederike Dündar Applied Bioinformatics Core Slides at https://bit.ly/2CUdS9z 1 April 2, 2019 1 https://physiology.med.cornell.edu/faculty/skrabanek/lab/angsd/schedule_2018/ F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 1 / 23

Upload: others

Post on 16-Jan-2022

4 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Nanopore sequencing - recap - Analysis of Next-Generation

Nanopore sequencing - recapAnalysis of Next-Generation Sequencing Data

Friederike Dündar

Applied Bioinformatics Core

Slides at https://bit.ly/2CUdS9z1

April 2, 2019

1https://physiology.med.cornell.edu/faculty/skrabanek/lab/angsd/schedule_2018/F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 1 / 23

Page 2: Nanopore sequencing - recap - Analysis of Next-Generation

1 Analysis

2 References

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 2 / 23

Page 3: Nanopore sequencing - recap - Analysis of Next-Generation

Nanopore sequencing components

General setup:

lipid bilayer or polymermembranesalt solutionexternal voltage sourceproteins that form pores

I for ions that will follow theexternally created current

I for single strands of DNApass through

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 3 / 23

Page 4: Nanopore sequencing - recap - Analysis of Next-Generation

Nanopore sequencing components

The nanopore= biosensor and the only passageway for the exchange between the ionicsolution on two sides of a membrane

ionic conductivity through thenarrowest region of thenanopore is particularlysensitive to the presence of anucleobase’s mass and itsassociated electrical fielddifferent bases will invokedifferent changes in the ioniccurrent levels that pass throughthe pore

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 4 / 23

Page 5: Nanopore sequencing - recap - Analysis of Next-Generation

Nanopore sequencing components

The ratchet enzyme (“motor enzyme”) ensures:

unidirectional andsingle-nucleotide displacementat a slow pace so that thesignal can actually be registeredis typically an enzyme thatprocesses single-nucleotides inreal life, e.g. polymerases,exonucleases etc. – with aninhibited catalytic center!

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 5 / 23

Page 6: Nanopore sequencing - recap - Analysis of Next-Generation

DNA prep for nanopore-based sequencing

1 Fragmentation (mostly to achieve uniformity in the fragment sizedistributions)

2 Adapter ligation at both endslead adapter: loading of the “motorprotein” at the 5’ endtrailing adapter: facilitates strandcapture by concentrating DNAsubstrates at the membrane surfaceproximal to the nanoporehairpin adapter: permits contiguoussequencing of both strands;covalently connects both strands sothat the second strand is not lostwhile the first is being passedthrough the pore

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 6 / 23

Page 7: Nanopore sequencing - recap - Analysis of Next-Generation

Analysis

Analysis

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 7 / 23

Page 8: Nanopore sequencing - recap - Analysis of Next-Generation

Analysis

MinKNOW

= software that was used to run the MinION device, provided by ONT

several core tasks:Data acquisitionReal-time analysis andfeedbackData streamingDevice control, including runparameter selection - Sampleidentification and trackingEnsuring chemistry isperforming correctly

Once a read is completed, its information is stored in a fast5 file, acustomized file format based on .hdf5

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 8 / 23

Page 9: Nanopore sequencing - recap - Analysis of Next-Generation

Analysis

Fast5hierarchical format: folder like structures inside a single file

groups: metadatadatasets: actual dataattributes: metadata

For more info see https://bit.ly/2I2fLEgand https://bit.ly/2OCnDOd

$ h5ls -r ~/Data/read_data/r9_2d_zika_ch1_read10_pre.fast5

/Analyses Group/Analyses/EventDetection_000 Group/Analyses/EventDetection_000/Configuration Group/Analyses/EventDetection_000/Configuration/abasic_detection Group/Analyses/EventDetection_000/Configuration/event_detection Group/Analyses/EventDetection_000/Configuration/hairpin_detection Group/Analyses/EventDetection_000/Reads Group/Analyses/EventDetection_000/Reads/Read_10 Group/Analyses/EventDetection_000/Reads/Read_10/Events Dataset {2176/Inf}/Raw Group/Raw/Reads Group/Raw/Reads/Read_10 Group/Raw/Reads/Read_10/Signal Dataset {50722/Inf}/Sequences Group/Sequences/Meta Group

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 9 / 23

Page 10: Nanopore sequencing - recap - Analysis of Next-Generation

Analysis

From Fast5 to FASTQ: base calling

Base calling for nanopore-based sequencing = turning the electrical signal overtime (“squiggle”) into distinct base calls.This task is currently achieved by neuronal-network-based tools (used to beHidden-Markov-Model-based).

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 10 / 23

Page 11: Nanopore sequencing - recap - Analysis of Next-Generation

Analysis

From Fast5 to FASTQ: base calling

There are currently (March 2019)three base calling options providedby ONT:

MinKNOW (uses someproduction version of whateverONT deems the standard tool)Albacore (discontinued, butused to be the standard)Guppy

There are numerous open sourcebase callers, too. It’s not a settledissue, so it may make sense to hangon to the Fast5 file with the actualraw signal for now.

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 11 / 23

Page 12: Nanopore sequencing - recap - Analysis of Next-Generation

Analysis

From Fast5 to FASTQ:running Guppy$ ont-guppy-cpu/bin/guppy_basecaller --flowcell FLO-MIN106 --kit SQK-RAD004 -r-i fast5/ -s guppy_out # will generate numerous FASTQ files + log + *txt$ head -n 2 guppy_out/sequencing_summary.txt

Name Value

filename FAK59098_2e80324c914cfe667088fd5f8402410afdbc3251_17.fast5read_id cfd87084-71a2-4b66-ad97-ee9a21059ad7run_id 2e80324c914cfe667088fd5f8402410afdbc3251channel 57start_time 4829.079102duration 2.070500num_events 2070passes_filtering TRUEtemplate_start 4829.105957num_events_template 2043template_duration 2.043500seq_length_template 761mean_qscore_template 12.586006strand_score_template 0.000000median_template 84.322838mad_template 9.542708

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 12 / 23

Page 13: Nanopore sequencing - recap - Analysis of Next-Generation

Analysis

QC of base calls

There are numerous tools out there, e.g. MinIONQC or NanoPack [Lanfearet al., 2019, De Coster et al., 2018].Most make use of the sequencing_summary.txt file.

Typical assessments:distribution of read lengthsdistribution of quality scores

I over bp per readI over time across all reads

no. of reads per hourphysical flowcell maps

We ran FastQC, MinIONQC andNanoPack for demo purposes. Resultscan be found on the class website.

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 13 / 23

Page 14: Nanopore sequencing - recap - Analysis of Next-Generation

Analysis

QC of base calls

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 14 / 23

Page 15: Nanopore sequencing - recap - Analysis of Next-Generation

Analysis

QC of base calls

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 15 / 23

Page 16: Nanopore sequencing - recap - Analysis of Next-Generation

Analysis

Alignment

Typical short-read aligners are currently not recommended for ONTdata!

reads are longer than typical Illumina reads and of variable lengthhigher error ratesoften containing adaptersdifferent meaning/calculation of the quality scores

See NanoFilt for filtering recommendations [De Coster et al., 2018].

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 16 / 23

Page 17: Nanopore sequencing - recap - Analysis of Next-Generation

Analysis

Alignment with minimap2

## prepare, i.e. concatenate all individual FASTQ files into one$ mkdir alignment$ cd alignment$ cat ../guppy_out/*fastq > ont_angsd_run.fq

## download pre-compiled binariescurl -L https://github.com/lh3/minimap2/releases/download/v2.16/minimap2-2.16_x64-linux.tar.bz2 |

tar -jxvf -

## building the index$ ./minimap2-2.16_x64-linux/minimap2 -d lambda_3.6kb.mmi lambda_3.6kb.fasta

## perform the alignment$./minimap2-2.16_x64-linux/minimap2 -ax map-ont \

lambda_3.6kb.fasta ont_angsd_run.fq > lambda_seqs.sam

## bam file wrestlingspack load /qr4zqdd # samtoolssamtools view -h lambda_seqs.sam -b -o lambda_seqs.bamsamtools sort -o lambda_seqs.sort.bam -O bam lambda_seqs.bamsamtools index lambda_seqs.sort.bam

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 17 / 23

Page 18: Nanopore sequencing - recap - Analysis of Next-Generation

Analysis

Manually inspecting genome-wide files

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 18 / 23

Page 19: Nanopore sequencing - recap - Analysis of Next-Generation

References

References

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 19 / 23

Page 20: Nanopore sequencing - recap - Analysis of Next-Generation

References

[Deamer et al., 2016, De Coster et al., 2018, Eisenstein, 2017, Jain et al.,2016, Lanfear et al., 2019, Li, 2018]

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 20 / 23

Page 21: Nanopore sequencing - recap - Analysis of Next-Generation

References

References

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 21 / 23

Page 22: Nanopore sequencing - recap - Analysis of Next-Generation

References

Wouter De Coster, Svenn D’Hert, Darrin T. Schultz, Marc Cruts, andChristine Van Broeckhoven. NanoPack: Visualizing and processinglong-read sequencing data. Bioinformatics, 2018. doi:10.1093/bioinformatics/bty149.

David Deamer, Mark Akeson, and Daniel Branton. Three decades ofnanopore sequencing. Nature Biotechnology, 2016. doi:10.1038/nbt.3423.

Michael Eisenstein. An ace in the hole for DNA sequencing. Nature, 2017.doi: 10.1038/550285a.

Miten Jain, Hugh E. Olsen, Benedict Paten, and Mark Akeson. The OxfordNanopore MinION: Delivery of nanopore sequencing to the genomicscommunity. Genome Biology, 2016. doi: 10.1186/s13059-016-1103-0l.

R. Lanfear, M. Schalamun, D. Kainer, W. Wang, and B. Schwessinger.MinIONQC: fast and simple quality control for MinION sequencing data.Bioinformatics (Oxford, England), 2019. doi:10.1093/bioinformatics/bty654.

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 22 / 23

Page 23: Nanopore sequencing - recap - Analysis of Next-Generation

References

Heng Li. Minimap2: fast pairwise alignment for long DNA sequences.Bioinformatics, 34(18):3094–3100, 2018. doi:0.1093/bioinformatics/bty191.

F. Dündar (ABC, WCM) Nanopore sequencing - recap April 2, 2019 23 / 23