20140711 4 e_tseng_ercc2.0_workshop

FIND MEANING IN COMPLEXITY For Research Use Only. Not for use in diagnostic procedures.

Elizabeth Tseng, Staff Scientist / 2014.07.11

Technical Variability in PacBio® Full-length cDNA (Iso-Seq™) Sequencing

SampleNet: Iso-Seq Method with Clontech® cDNA Synthesis Kit

PacBio’s Iso-Seq™ Method for High-quality, Full-length Transcripts

Figure 1: Iso-Seq experimental and informatics pipelines.
Experimental pipeline: polyA mRNA → cDNA synthesis with adapters → size partitioning & PCR amplification → SMRTbell™ ligation → PacBio® RS II sequencing.
Informatics pipeline: PacBio raw sequence reads → remove adapters and artifacts → clean sequence reads → read clustering → isoform clusters → consensus calling → nonredundant transcript isoforms → quality filtering → final isoforms → map to reference genome → evidence-based gene models.
[Panel: read structure labels: 5’ primer, coding sequence, polyA tail (AAA)n / (TTT)n, 3’ primer, SMRT® adapter; reads of insert]

DevNet: Iso-Seq wiki page

Iso-Seq Full-length cDNA Library Protocol

•  Total RNA, with optional poly-A selection to polyA+ RNA
•  Reverse transcription (SMARTScribe RT) to full-length 1st-strand cDNA
•  PCR optimization, then large-scale amplification to amplified cDNA
•  Size selection into 1–2 kb, 2–3 kb and 3–6 kb fractions
•  Re-amplification of each fraction
•  SMRTbell™ template preparation for each fraction
•  Optional size selection (3–6 kb)
•  SMRT® Sequencing

Iso-Seq Informatics Pipeline

[Figure: per-molecule reads are divided into full-length (FL) reads and non-FL (nFL) reads; reads are grouped into isoform-level clusters; clusters of transcript alignments using FL + nFL reads yield the final transcript consensus (Transcript 1, Transcript 2, Transcript 3)]
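A minimal sketch of the FL/nFL split, assuming the usual Iso-Seq notion that a read of insert is full-length when its 5’ primer, its 3’ primer and a polyA tail are all seen. The primer sequences, the `min_polya` threshold and the exact-match search are hypothetical simplifications chosen for illustration; a real classifier tolerates sequencing errors and checks both strands.

```python
import re

# Hypothetical primer sequences, for illustration only (not the real kit primers).
PRIMER_5 = "CTAGCTAGCTAG"
PRIMER_3 = "TCGATCGATCGA"

def is_full_length(read: str, min_polya: int = 8) -> bool:
    """Classify a sense-strand read of insert as FL (True) or nFL (False).

    Simplified rule: the read must start with the 5' primer and end with a
    polyA tail of at least `min_polya` A's followed directly by the 3' primer.
    Exact matching only, sense strand only.
    """
    if not read.startswith(PRIMER_5):
        return False
    # polyA run immediately before the 3' primer, anchored at the read end
    tail = re.compile("A{%d,}%s$" % (min_polya, re.escape(PRIMER_3)))
    return tail.search(read[len(PRIMER_5):]) is not None

insert = "ATGGCCTTGACCGTTAG"  # hypothetical coding sequence
print(is_full_length(PRIMER_5 + insert + "A" * 20 + PRIMER_3))  # both primers + polyA: FL
print(is_full_length(insert + "A" * 20 + PRIMER_3))             # 5' end missing: nFL
```

Anchoring both checks at the read ends is what separates FL reads from fragments: an nFL read may still contain a polyA tail or one primer, but not the complete primer-to-primer structure.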

Key Features of Current Iso-Seq Bioinformatics

•  Non-redundant, full-length transcript consensus sequences
–  No assembly
–  De novo
–  Achieves high-quality consensus (≥ 99% accuracy)
–  Universal PacBio features: robust to GC%, repeat structure, etc.

•  Applications
–  Alternative splicing
–  Fusion transcripts
–  Alternative polyadenylation
–  Alternative start sites (possible with the proper protocol)

Disclaimer

•  Everything shown from here on is at the transcript/isoform level, not the gene level

•  Data shown are preliminary and very much unbaked

•  Proof-of-concept analysis

Count Information Associated with Each Unique Transcript

[Figure: clusters of transcript alignments using FL + nFL reads → final transcript consensus → count matrix]

Count matrix:

Transcript   Count   Norm_Count
1            8       0.08
2            5       0.05
3            7       0.07
…            …       …
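The slide does not say how Norm_Count is derived. A minimal sketch, assuming it is each transcript's count divided by the total count over all transcripts (which would turn 8 into 0.08 if the total were 100):

```python
from typing import Dict

def normalize_counts(counts: Dict[str, float]) -> Dict[str, float]:
    """Assumed definition of Norm_Count: each transcript's count divided by
    the total count over all transcripts in the library."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {tx: c / total for tx, c in counts.items()}

# Transcripts 1-3 use the slide's counts; "others" is a hypothetical stand-in
# for the elided rows, chosen so that the total is 100.
counts = {"1": 8, "2": 5, "3": 7, "others": 80}
for tx, norm in normalize_counts(counts).items():
    print(tx, counts[tx], round(norm, 4))
```

Dividing by the library total makes counts comparable between runs of different depth, though it does not correct for the composition effects a fuller model would handle.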

Count Information from non-FL reads

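For non-FL reads:
•  If uniquely associated with a transcript, assume the read comes from that transcript
•  If ambiguously associated, this is most likely because the read is a partial match
•  For now, the weight of an ambiguous nFL read is simply 1 / (number of associated transcripts), so for each transcript t:

$$\text{read\_count}(t) = \#\text{FL}(t) + \#\text{unique nFL}(t) + \sum_{r \in A(t)} \frac{1}{k_r}$$

where A(t) is the set of ambiguous nFL reads associated with t and k_r is the number of transcripts read r is associated with.

In the current dataset, about 40–60% of nFL reads partially match multiple isoforms (FL reads are always fully and uniquely associated).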
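The counting rule above can be sketched in Python. The input format (each FL read given as the single transcript it belongs to, each nFL read as the list of transcripts it matches) is an assumption made for illustration; how reads get associated with transcripts is left to the pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def read_counts(fl_assignments: Iterable[str],
                nfl_assignments: Iterable[List[str]]) -> Dict[str, float]:
    """read_count(t) = #FL(t) + #unique nFL(t) + sum of 1/k over ambiguous nFL reads,
    where k is the number of transcripts an ambiguous nFL read is associated with."""
    counts: Dict[str, float] = defaultdict(float)
    for tx in fl_assignments:        # FL reads are uniquely associated: weight 1
        counts[tx] += 1.0
    for txs in nfl_assignments:      # each nFL read lists every transcript it matches
        matched = set(txs)
        if not matched:
            continue                 # unassociated nFL reads contribute nothing
        weight = 1.0 / len(matched)  # 1 for a unique match, 1/k for an ambiguous one
        for tx in matched:
            counts[tx] += weight
    return dict(counts)

# Hypothetical associations: 3 FL reads, 1 unique and 2 ambiguous nFL reads
fl = ["t1", "t1", "t2"]
nfl = [["t1"], ["t1", "t2"], ["t1", "t2", "t3"]]
print(read_counts(fl, nfl))
```

Because each read's weights sum to 1, the counts over all transcripts add up to the number of associated reads (6 here), so the equal split never inflates the library size.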

Read Count Variation in Technical Replicates

Rat Heart
•  Technical replicates (same starting RNA & protocol)
•  3 size libraries (1–2 kb, 2–3 kb, 3–6 kb)
•  Runs from the different size libraries pooled for the bioinformatics pipeline

[Figure: Rat Heart, technical replicates. Panels: boxplot of log2 read counts; scatterplot of log2 read count for each transcript]

[Figure: Rat Lung, technical replicates]

All technical replicates were sequenced with a total of ~8 SMRT® Cells (low depth). Most NA transcripts have low counts.
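A minimal sketch of the replicate comparison, assuming per-transcript counts for two replicates held in dicts. A transcript with no reads in a replicate has no finite log2 count, which is one plausible source of the NA entries. The Pearson correlation over transcripts seen in both replicates is an added summary of the scatterplot, not a statistic the slides report.

```python
import math
from typing import Dict, List, Optional, Tuple

def log2_or_none(count: float) -> Optional[float]:
    """log2 of a read count; None (NA) when the transcript has no reads."""
    return math.log2(count) if count > 0 else None

def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two transcripts seen in both replicates")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("log2 counts have zero variance")
    return cov / (sx * sy)

def compare_replicates(rep_a: Dict[str, float],
                       rep_b: Dict[str, float]) -> Tuple[float, List[str]]:
    """Return (Pearson r of log2 counts over shared transcripts, NA transcript ids)."""
    xs, ys, na = [], [], []
    for tx in sorted(set(rep_a) | set(rep_b)):
        a = log2_or_none(rep_a.get(tx, 0.0))
        b = log2_or_none(rep_b.get(tx, 0.0))
        if a is None or b is None:
            na.append(tx)  # no reads in at least one replicate
        else:
            xs.append(a)
            ys.append(b)
    return pearson(xs, ys), na

# Hypothetical counts for two technical replicates
rep1 = {"t1": 8, "t2": 32, "t3": 128, "t4": 1}
rep2 = {"t1": 16, "t2": 64, "t3": 256}
r, na = compare_replicates(rep1, rep2)
print(f"r = {r:.3f}, NA transcripts: {na}")
```

Working on the log2 scale keeps a few highly expressed transcripts from dominating the comparison; at low depth, single-read transcripts such as `t4` here are the ones most likely to drop out of one replicate.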

Choice of Chemistry Does Not Bias Sequencing

Rat Brain, same 3-size library (not a technical replicate)
•  Sequenced with P4-C2 chemistry
•  Sequenced with P5-C3 chemistry

However, for longer (> 3 kb) transcripts, P5-C3 chemistry increases the chance of seeing FL reads.

Choice of PCR Enzyme May Bias Amplification

[Figure panels: Human Brain, 2–3 kb library; Human Brain, 3–6 kb library]

Current Iso-Seq Protocol Amplifies Sample Twice

[Figure: the Iso-Seq full-length cDNA library protocol flowchart, as above. The sample is amplified twice: large-scale amplification of the full-length 1st-strand cDNA, then re-amplification of each size-selected fraction (1–2 kb, 2–3 kb, 3–6 kb).]

2nd Amplification Does Not Introduce Strong Bias

[Figure: FL read length distribution. Panels: standard protocol vs. skipping the 2nd amplification; standard protocol vs. skipping the 1st amplification]

Skipping the 1st amplification means size selection is done on first-strand cDNA, which may be hard to optimize.
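One way to put a number on how similar two FL read length distributions are (for example, standard protocol vs. skipping the 2nd amplification) is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs, 0 for identical samples and 1 for non-overlapping ones. This is a substitute summary chosen for illustration; the slide compares the distributions visually, and the read lengths below are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence

def ks_statistic(lengths_a: Sequence[int], lengths_b: Sequence[int]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of two read-length samples."""
    a, b = sorted(lengths_a), sorted(lengths_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Both ECDFs are step functions that only change at observed lengths,
    # so checking every observed length finds the maximum gap.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap

# Hypothetical FL read lengths (bp) from a standard and a modified run
standard = [1200, 1500, 1800, 2100, 2500, 3200]
skip_2nd_amp = [1250, 1450, 1900, 2050, 2600, 3100]
print(ks_statistic(standard, skip_2nd_amp))
```

Because read counts per run are large, a formal KS test would flag even tiny shifts as significant; the statistic itself is more useful here as an effect size for "no strong bias".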

Expected Transcript Variability in Different Rat Tissues

[Figure panels: Rat Heart vs. Rat Lung (axes: Heart, Lung); Rat Heart vs. Rat Brain (axes: Heart, Brain)]

Conclusion

•  Technical variation is not a big issue
–  If done with the same library protocol
–  Different (PCR) enzymes bias amplification
–  Amplification can be tolerated if kept to a reasonable number of cycles

•  Potential for differential expression (DE)
–  Still many unknown factors
–  Everything shown in the previous slides is merely “proof of concept”
–  With controls comes better modeling

Looking Ahead

•  Detection limit
•  Amplification bias
–  Adding controls at known %
–  Factors: GC? Length? Enzyme?
•  Accounting for library pooling
•  Ambiguous mapping
•  Modeling bias
•  DE isoform detection
•  Combining short-read data

[Slide columns: Wet Lab, Bioinformatics]

Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.
