supplementary figure 1. the smart-seq protocol...adapter ligation tagmentation (tn5) pcr...

First-strand synthesis & tailing by SMARTScribe RT

Amplify cDNA by PCR

Poly A+ RNA

polyA 3'

SMARTer II AOligonucleotide

SMART CDS primer

Double-stranded cDNA

Template switching and extension by SMARTScribe RT

polyAXXXXX

5'

XXXXX5'

XXXXX

polyA

XXXXXXXXXX5'

Singlestep

5'

5'

End repair

Shearing (Covaris)

Base addition

Adapter ligation

Tagmentation (Tn5)

PCR amplification

Sequence library

Supplementary Figure 1. The Smart-Seq protocolProtocol for cDNA amplification and library preparation from few or single cells. Cells are first lysed in reverse transcription compatible buffer and the reaction is initiated with oligo-dT containing primer. First strand synthesis is completed with the addtion of a few untem-plated C nucleotides followed by template switching and the incorporation of SMARTer IIA oligonucleotide. Full-length cDNAs are amplified using PCR to obtain a few nanograms of DNA. Fragmentation and adapter introduction was performed using either acoustic shear-ing or transposase tagmentation based procedures to generate sequencing libraries.

0%

100%80%60%40%20%

0%

100%80%60%40%20%

0%

100%80%60%40%20%

0%

100%80%60%40%20%

0%

100%80%60%40%20%

0%

100%80%60%40%20%

0%

100%80%60%40%20%

0%

100%80%60%40%20%

0-1 kb 1-2 kb 2-3 kb

3-4 kb 4-5 kb 5-7 kb

7-9 kb 9-15 kb

oocytes (Smart-Seq, n=3)oocytes, 2-cell and 4-cellblastomeres (Ref. 7, n=24)

0 1 0 21 31 2from 3’ end (kb)

0 4from 3’ end (kb)





1 2 3 1 2 3 4 2 4 61 3 5

3 6 3 6 9 12

from 3’ end (kb)0

from 3’ end (kb)

read

cov

erag

ere

ad c

over

age

read

cov

erag

e

ES cells, ICM, epiblast(Ref. 8, n=34)

d e f

g h

i j k

Supplementary Figure 2. Comparison of Smart-Seq and Tang et al. data (2009, 2010)(a-h) Read coverage across transcripts for Smart-Seq oocytes, oocytes and blasto-meres from Ref. 7 and cells from Ref. 8. Transcripts has been separated according to their lengths as indicated over the plots. The gray vertical line indicate the point from where the coverage is expected to drop based on the span of transcript lengths included in each figure. (i-k) Comparisons of sensitivity between mouse oocytes generated with Smart-Seq (green) and from Ref. 7 (red), using 5 million reads per cell. (i) The number of genes detected in each oocyte (result not sensi-tive to read cutoff for detection). (j) The percentage of genes detected in pairs of oocytes, as a function of the gene expression level (max over biologial replicates). Error bars, s.e.m. (n�7). (k) Variation in expression level estimation between pairs of oocytes as a function of the mean expression. Error bars, s.e.m (n�7).

0 1 2 30.0

0.1

0.2

0.3

std

dev,

log1

0 sc

ale

RPKM, log10

5,000 10,000*HQHV�GHWHFWHG��UHDG�

0

oocyte, Smart-Seqoocyte, Smart-Seqoocyte, Smart-Seq

oocyte, Tang et al. 2009oocyte, Tang et al. 2009

0%

100%

0 1 2 3RPKM, log10

dete

ctio

n

a b c

oocytes (Smart-Seq)oocytes (Ref. 7)

a0

33

019

027

033

040

077

0263

0131

0100

035

mBrainSmart-Seq

ES cells(Tang2010)

POLR2B

[0 - 77]

57,850 kb 57,860 kb 57,870 kb 57,880 kb 57,890 kb 57,900 kb

56 kb

b

3' end

Supplementary Figure 3. Full-transcript coverage and reconstruction(a) Smart-Seq reads mapping to a 14-kb region of the RNA polymerase II alpha gene locus containing 27 of the 28 exons. The read coverage is displayed for Smart-Seq data from diluted amounts of mouse brain RNA (green) and for previ-ously published single-cell data from mouse embryonic stem (ES) cells (red) from Tang et al. 2010. (b) Example of a fully reconstructed transcript (human POLR2B) from Smart-Seq reads from an individual T24 cancer cell line cell. Reads were visualized using the Integrated Genome Viewer (IGV, Broad Insti-tute).

ba

c

Gene expression (RPKM), log10

0.0 0.5 1 21.5 2.5 3 3.5

100%

80%

60%

40%

20%

0%

Gen

e de

tect

ion

(%)

1M reads5M reads10M reads20M readsfull depth

within cell lines

non-ampl.

1ng100pg10pg

Gene expression (RPKM), log10

0.0 0.5 1 21.5 2.5 3 3.5

100%

80%

60%

40%

20%

0%

Gen

e de

tect

ion

(%)

10ng1ng100pg10pg

mRNA-Seq

Smart-Seq (PE)

Smart-Seq (Tn5)

1 10 1000.1

Read depth (million uniquely mapped reads)

Spe

arm

an c

orre

latio

n0.71

0.70

0.69

0.68

0.67

0.66

0.65

millionsmRNA-Seq

1 40 80 120 160

Number of LNCaP cells (Smart-Seq)

0

3,000

6,000

9,000

12,000

15,000

Num

ber o

f gen

es d

etec

ted

10 cells (n~11,900 genes)5 cells (n~11,200 genes)

d

80 cells (n~13,400 genes)

Supplementary Figure 4. Sensitivity and variation in Smart-Seq data(a) The number of detected genes in individual cells or when combining data from two or more cells as indicated on the x-axis. For comparison, the last data point show the number of genes detected in standard mRNA-Seq from millions of cells. (b) Gene sensitivity presented as the mean percentage of genes detected in replicates, binned according to expression levels. We performed all pair-wise comparisons within groups of replicates anre report the mean and 95% confidence interval. Comparing sequence libraries constructed using either transposase “tagmentation” (Tn5) or shearing followed by ligation (PE) from indicated low amounts of starting materials. (c) Gene detection sensitivity for cancer cells as in (Fig. 2b) but using different sequence depths. (d) Spearman correlation (rho) in gene expression levels between different cells of the same origin, as a function of the sequence depth.

20,000

10,000

14,000

12,000

16,000

18,000

8,000

Ref

seq

trans

crip

ts

Number of PCR cycles

151050 18

Total RNA starting amount (ng)

0.01 0.1 1 10 NA

a

Supplementary Figure 5. Transcript detection and coverage with varying PCR cycles(a) The number of Refseq transcripts detected as a function of PCR cycles (left) and starting amount of total RNA (right). A RPKM threshold of 0.1 was used for detection. Colors separate independent dilution series. The error bars show standard error. (b) Mean read coverage over transcripts. We compared read coverage for Smart-Seq data generated from 1ng or 10pg of total RNA using either 12 or 18 cycles of PCR. No reproducible trends of PCR cycles on read coverage were observed. Errors bars represent standard deviation.

b

Transcript (%)5' end

Rea

d co

vera

ge (%

)

100

80

60

40

20

0

Smart-Seq1ng, 18 cycles10pg, 18 cycles1ng, 12 cycles

3' end

0 2 4 6 8 10 12

02

46

810

12

mBrain (1ng, rep 1) RPKM, log2

spearman r=0.97pearson r=0.95

mB

rain

(1ng

, rep

2) R

PK

M, l

og2



mBrain (50pg, rep 1) RPKM, log2

mB

rain

(50p

g, re

p 2)

RP

KM

,log 2

T24 cell (cell 1) RPKM, log2

a

b c

T24

cell

(cel

l 2) R

PK

M, l

og2

0 2 4 6 8 10 12

02

46

810

12

0 2 4 6 8 10 12

02

46

810

12

1ng B

100pg B

10pg U

dilution (Br UHRR PE)

10ng100pg

10pg

dilution (Brain, UHRR Tn5)

10ng100pg

10pgskmel

uacc PMCTChESPC3

LNCaPT24

10ng100pg

10pg

human cancer cell linecells

T24 vs. l

ncap

T24 vs. P

C3

lncap vs. P

C3

human cells and caner cell line cells

PM vs:

skmeluaccCTC

hESCTC vs:

skmeluaccPM hES

1.0

0.9

0.8

0.7

0.6

0.5

0.4

Pear

son

corr

elat

ion

Brain vs. UHRR:Brain UHRR

d

Supplementary Figure 6. Correlations in gene expression levels between technical and biological replicates. (a) Correlations over gene expression levels in dilution- or single cell replicates were shown as boxplots (black). Analyses of different types of cells or diluted RNA of different origins were shown in red. Correlations were computed from log2 transformed RPKM gene expression levels. (b-d) Representative scatter plots of gene expression levels from libraries generated from independent technical replicates of 1ng total RNA (b), or 50pg total RNA (c), or libraries generated from independent T24 cancer cell line cells (d). These scatter plots are representative of gene expression levels variations found and the results from all pairwise comparisons are summarized in Fig. 2c,d.

a b

LNCaP UHR hBrain mBrain LNCaP UHR hBrain mBrain

0.4

0.2

0.0

-0.2

-0.2

-0.4

Cor

rela

tion

with

tran

scrip

t G+C

frac

tion

Cor

rela

tion

with

tran

scrip

t len

gth

mRNA-SeqSmart-Seq (PE)Smart-Seq (Tn5)

mRNA-SeqSmart-Seq (PE)Smart-Seq (Tn5)

Supplementary Figure 7. Analyses of GC and length biases in Smart-Seq and mRNA-Seq data(a) Pearson correlation coe!cients between log transformed expression values and the tran-script GC content for pre-ampli"ed (Smart-Seq) and standard mRNA-Seq data. (b) Pearson correlation coe!cients between log transformed expression values and log transcript length for pre-ampli"ed (Smart-Seq) and standard mRNA-Seq. As the true biological GC and length biases are not known, there is no obvious best correlation. We found no systematic di#erence between Smart-Seq data generated with shearing and ligation (PE) or Tn5-mediated “tagmentation” (Tn5) or standard mRNA-Seq samples, but a tendency towards lower GC content correlations for Smart-Seq data.

0.72±0.005

0.58±0.003

0.74±0.008

0.56±0.016

0.60±0.004

0.73±0.003

0.92±0.009

0.54±0.013

0.65±0.002

0.57±0.002

0.71±0.002

0.63±0.006

0.61±0.004

0.60±0.002

0.76±0.007

0.56±0.014

0.59±0.004

0.55±0.003

0.70±0.006

0.56±0.015

brainESC CTC10 pg 100+ pg

ESC

brain

melanoma

lymph node

leukocytes

Smart-Seq

standardmRNA-Seq

Supplementary Figure 8. Correlation analyses between Smart-Seq and mRNA-Seq gene expression levels.Individual human ES cells (ESC, n=8), diluted brain RNA at 10pg and 100pg or more, and individual CTCs (n=6), compared with mRNA-Seq data from

error of the mean is shown for each comparison. Within each Smart-Seq sample (i.e. columns) the matrix cells are colored according to a linear white to red scale, with white at lowest mean correlation red at maximum mean correlation. Note

mRNA-Seq data from ES cells and melanoma respectively. Identical Brain RNA prepared with Smart-Seq and mRNA-Seq correlate higher with increasing start-ing amounts in Smart-Seq. ESC RNA-Seq data was downloaded from GEO (GSM672836).

PMELMITFTYRMLANA

Expression (RPKM, log2)

SKMEL UACC PM CTC ESC

0 105

ImmunemRNA-Seq

BL BJ W LPBL

Supplementary Figure 9. Single-cell cDNAs from peripheral blood lym-phocytesTwo individual peripheral blood lymphocytes (PBLs) were prepared using Smart-Seq and shallow sequenced on a MiSeq. These two cells (PBL) were compared with Smart-Seq transcriptomes from melanoma cancer cell lines (SKMEL5, UACC257), primary melanocytes (PM), circulating tumor cells (CTC), human embryonic stem cells (ESC) and mRNA-Seq transcriptome data from burkitts lymphoma cell lines (BL41, BJAB), white blood cells (W) and lymphnodes (L). Gene expression levels (RPKM, log2) for marker genes of melanocyte and hematopoietic lineages are shown in a heatmap. Despite a shallow sequence depth, we detected markers for hematopoietic lineage in both PBLs, in contrast to melanocyte lineage markers.

melanocytes

lymphocytes,monocytes

Lineage Markers:

CD53PTPRCCCL5CD4

Smart-Seq on individual cells

0 31 2Expression (RPKM, log10)

MAGEH1

MAGEA1

MAGEC1

MAGEB1+MA GEB4

MAGEA11

MAGEA8

MAGEF1

MAGED4+MA GED4B

MAGED4+MA GED4B

MAGEE1

MAGED2

MAGED1

MAGEB6

MAGEL2

MAGEA4

MAGEB10

MAGEB18

MAGEC3

MAGEB16

MAGEB3

MAGEE2

MAGEA9+MA GEA9B

MAGEA9+MA GEA9B

MAGEB2

MAGEC2

MAGEA10+MA GEA10-MAGEA5+MA GEA5

MAGEA2+MA GEA2B

MAGEA2+MA GEA2B

MAGEA12

MAGEA6

MAGEA3

SKMEL5

UACC257

PM CTChESC

BL

Supplementary Figure 10. Expression of all melanoma-associated antigens (MAGEs)Heatmap showing the expression levels of all MAGE genes in CTC, PM, melanoma cell lines (SKMEL5 and UACC257), and in human ESC and Burkitts lymphoma cell lines BL41 and BJAB (BL).

Relative Expression ( log2)

-5 50

Supplementary Figure 11. Detection of SNPs and mutations in CTCs(a) Genomic sites with support for allelic difference between single-cell CTCs and reference sequence were sorted according to the QUAL values (Phred scaled probability of polymor-phism). For bins of genomic sites, we analyzed the fraction of sites that were already present in dbSNP (build 132) and the ratio of transitions to transversions (Ti/Tv). In genomic DNA re-sequencing studies, one expects to find ~90% of individual differences to be present in dbSNP and a Ti/Tv ratio of ~2.1 (genome-wide) or ~2.8 (exons). We found that sites with QUAL values of 500 or higher (dashed line) had 92% overlap with SNPdb and a Ti/Tv ratio of 2.3, suggesting that genotype calls above that QUAL threshold are reliable. At this threshold we found 4,312 genomic sites using transcriptome data from the six CTCs and requiring that the allele be supported in two or more CTCs. (b) The number of sites indentified requiring support in an increasing number of CTCs. The barplot shows the number of genomic sites with known or novel transitions and transversions. We found a large difference between sites supported in one or two cells, likely reflecting a large amount of false positives in calls from a single cell. Requiring support in two or more cells led to more consistent and slowly declining number of SNP or mutation calls that probably reflect power. The QUAL threshold used is 500 as indicated with the dashed line in (a).

Ti/Tv1 1.5 2.52

0.5 0.6 0.80.7 0.9 1.0

Fraction of sites overlapping dbSNP (132)

Gen

omic

site

s

1600

2500

4000

6300

10000

a

Transitions, known SNPsTransversions, known SNPsTransitions, novel sitesTransversions, novel sites

Num

ber o

f site

s (c

umul

ativ

e)Minimum number of samples

b

1 3 4 5 62

8,000

6,000

4,000

2,000

0

supplementary figure 1. the smart-seq protocol...adapter ligation tagmentation (tn5) pcr...

Documents