


For Research Use Only. Not for use in diagnostic procedures. © Copyright 2021 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. FEMTO Pulse and Fragment Analyzer are trademarks of Advanced Analytical Technologies. All other trademarks are the sole property of their respective owners.

Towards Isoform Resolution Single-Cell Transcriptomics for Clinical Applications Using Highly Accurate Long-Read Sequencing

Abstract #: 1873

Elizabeth Tseng¹, Jason G. Underwood¹, Arjun Scott Nanda², Vijay Ramani², Scott N. Furlan³

¹ PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025
² UCSF, San Francisco, CA
³ Fred Hutchinson Cancer Research Center, Seattle, WA

Improving scIso-Seq Throughput on PacBio Systems; PacBio Sequencing & Deconcatenation; Single-Cell Deconvolution With Short or Long Reads

• The PacBio Iso-Seq method generates full-length transcript sequences up to ~15 kb with high accuracy (>99.9%)

• 10X single-cell systems produce ~50% TSO-TSO artifact cDNA

• Using TSO artifact depletion and cDNA concatenation, we achieve ~6X higher throughput, or 8-9 million full-length cDNA molecules per SMRT Cell 8M for the 10X single-cell platform

• We applied this throughput-improvement method to 10X single-cell libraries sequenced on PacBio Sequel II systems

• Demonstrated cell BC concordance with matching short-read libraries

• Full-length isoform information revealed distinct expression levels in T cells not observable through 3’ tagging methods

scIso-Seq Throughput Improvement Methodology

                                                           Sample A     Sample B
HiFi Reads                                                 2,557,092    3,174,724
Reads with cDNA primers                                    2,151,948    2,726,226
Deconcatenated cDNAs                                       7,853,190    8,519,673
Hypothetical cDNAs w/out TSO depletion and concatenation   1,075,974    1,363,113
Effective Throughput Increase                              ~7.2X        ~6.2X
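The arithmetic behind the table can be checked with a short sketch. It assumes, as the ~50% TSO-TSO artifact rate suggests, that without depletion and concatenation about half of the primer-containing reads would each yield one usable cDNA. The function and variable names are illustrative and not part of any PacBio tool. The resulting ratios (about 7.30 and 6.25) agree with the table's ~7.2X and ~6.2X when truncated to one decimal.

```python
import math

# Counts copied from the throughput table.
SAMPLES = {
    "Sample A": {"primer_reads": 2_151_948, "deconcatenated": 7_853_190},
    "Sample B": {"primer_reads": 2_726_226, "deconcatenated": 8_519_673},
}

# Assumption: ~50% of cDNA is TSO-TSO artifact when nothing is depleted.
ARTIFACT_FRACTION = 0.5


def throughput_summary(primer_reads, deconcatenated,
                       artifact_fraction=ARTIFACT_FRACTION):
    """Return hypothetical cDNA yield and the effective throughput increase."""
    # One cDNA per read, minus the fraction lost to TSO-TSO artifacts.
    hypothetical = round(primer_reads * (1 - artifact_fraction))
    return {
        "hypothetical_cdnas": hypothetical,
        "cdnas_per_read": deconcatenated / primer_reads,
        "throughput_increase": deconcatenated / hypothetical,
    }


for name, counts in SAMPLES.items():
    s = throughput_summary(**counts)
    truncated = math.floor(s["throughput_increase"] * 10) / 10
    print(f"{name}: {s['hypothetical_cdnas']:,} hypothetical cDNAs, "
          f"{s['cdnas_per_read']:.2f} cDNAs per read, "
          f"~{truncated}X increase")
```

Under this reading, the throughput increase is simply twice the average number of cDNAs recovered per concatenated read.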

Distribution of Concatemers per Long Read (Sample A)

Transcript Classification using SQANTI3

Cell BC concordance, PacBio vs. Illumina

[Figure: UMAP embeddings of single cells. Panel "Long Reads": lrUMAP_1 vs. lrUMAP_2. Panel "Short Reads": srUMAP_1 vs. srUMAP_2. Cells are colored by type: B Memory, B Naive, Basophils, CD14 Mono, CD16 Mono, CD4 Memory, CD4 Naive, CD8 Effector, CD8 Memory, CD8 Naive, CD8 TRB−V9, cDC, ISG15_High Treg, MAIT, Multiplets, Neutrophil, NK, pDC, Proliferating, RBC, Treg.]

15,388 short reads/cell; 936 cDNAs/cell

Knee plot, all BC PacBio

Short reads call 8,386 single cells (10X Cell Ranger)
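One way to quantify cell BC concordance is to ask what fraction of long-read cDNAs carry a barcode that the short-read pipeline called as a cell. The sketch below uses made-up barcodes and counts. `concordance` and its inputs are hypothetical names chosen for illustration, not the analysis actually run for this poster.

```python
from collections import Counter


def concordance(long_read_barcodes, short_read_cells):
    """Summarize how long-read barcodes overlap short-read cell calls.

    long_read_barcodes: iterable with one barcode per deconcatenated cDNA.
    short_read_cells: set of barcodes called as cells from short reads.
    """
    counts = Counter(long_read_barcodes)
    total = sum(counts.values())
    # cDNAs whose barcode matches a short-read cell call.
    matched = sum(n for bc, n in counts.items() if bc in short_read_cells)
    # Short-read cells that have at least one long-read cDNA.
    cells_seen = sum(1 for bc in short_read_cells if bc in counts)
    n_cells = len(short_read_cells)
    return {
        "fraction_cdnas_in_cells": matched / total if total else 0.0,
        "fraction_cells_detected": cells_seen / n_cells if n_cells else 0.0,
        "mean_cdnas_per_cell": matched / cells_seen if cells_seen else 0.0,
    }


# Toy data (not from the poster): ten cDNAs and three called cells.
toy_long = ["AAAC"] * 5 + ["CCGT"] * 3 + ["GGTA"] * 2
toy_cells = {"AAAC", "CCGT", "TTTG"}
result = concordance(toy_long, toy_cells)
print(result)
```

Ranking the per-barcode counts in `Counter` from highest to lowest gives the values a knee plot would display.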

[Figure: "Short Reads" UMAP (srUMAP_1 vs. srUMAP_2), colored by cell type.]

ISG20 Isoforms Assigned to Single Cells

[Figure: three long-read UMAP panels (lrUMAP_1 vs. lrUMAP_2) colored by the ISG20 isoform (iso30915) assigned to each cell: Multiple, None, PB.30915.37, PB.30915.4, PB.30915.5, PB.30915.7. Annotations: "T cell lineages"; "T cells express 4 common isoforms".]

[Figure: "Short Reads" UMAP (srUMAP_1 vs. srUMAP_2), colored by cell type.]

Sample A (5’ library) read schema

Assigning Isoforms to Single Cells

ISG20: Interferon Stimulated Exonuclease Gene 20

The Complete Diversity of ISG20 Isoforms Expressed in CD4 Naïve Cells

CD8 Naïve Cells Prefer Isoforms from the Downstream TSS

GENCODE Reference
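The legend categories in the ISG20 panels (a single isoform ID, Multiple, or None) suggest a simple per-cell labeling rule. The sketch below is one possible reading. It assumes "Multiple" means more than one distinct ISG20 isoform was observed in a cell and "None" means no ISG20 isoform was observed. The function name, barcodes, and records are hypothetical.

```python
from collections import defaultdict


def label_cells(all_cells, observations):
    """Assign each cell one label: an isoform ID, "Multiple", or "None".

    all_cells: iterable of cell barcodes to label.
    observations: iterable of (barcode, isoform_id) pairs, one per cDNA.
    """
    isoforms_by_cell = defaultdict(set)
    for barcode, isoform in observations:
        isoforms_by_cell[barcode].add(isoform)

    labels = {}
    for barcode in all_cells:
        seen = isoforms_by_cell.get(barcode, set())
        if not seen:
            labels[barcode] = "None"          # gene not detected in this cell
        elif len(seen) == 1:
            labels[barcode] = next(iter(seen))  # exactly one isoform observed
        else:
            labels[barcode] = "Multiple"      # two or more distinct isoforms
    return labels


# Toy records (not from the poster); isoform IDs follow the legend's format.
toy_cells = ["cell1", "cell2", "cell3"]
toy_obs = [
    ("cell1", "PB.30915.4"),
    ("cell1", "PB.30915.4"),
    ("cell2", "PB.30915.5"),
    ("cell2", "PB.30915.37"),
]
labels = label_cells(toy_cells, toy_obs)
print(labels)
```

Repeated observations of the same isoform in one cell collapse to a single label, so only distinct isoforms push a cell into the Multiple category.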