
Supplementary material to: Deep transcriptome profiling of mammalian stem cells supports a regulatory role for retrotransposons in pluripotency maintenance

Supplementary Data Sets

Supplementary Data Set 1: Human stem cell transcriptome-profiling datasets.

Supplementary Data Set 2: Mouse stem cell transcriptome-profiling datasets.

Supplementary Figures

Supplementary Fig. 1: Dedifferentiation of human and mouse iPSCs.

Supplementary Fig. 2: CAGE tag cluster expression across stem cell samples and sample clustering.

Supplementary Fig. 3: Comparison of nuclear and cytoplasmic assembled transcripts.

Supplementary Fig. 4: Differential expression analyses.

Supplementary Fig. 5: NAST expression features.

Supplementary Fig. 6: Histone marks at NAST genomic loci.

Supplementary Fig. 7: Expression levels and putative processing of NASTs.

Supplementary Fig. 8: Histone marks and transcription factor binding at repeat-associated NAST loci.

Supplementary Fig. 9: Human LTR-derived transcripts.

Supplementary Fig. 10: Stem cell-specific enhancers associated with LTRs.

Supplementary Fig. 11: Multiple negative controls for knockdown experiments in Nanog-GFP iPS_MEF-Ng-20D17 cells.

Supplementary Tables

Supplementary Table 1: Fisher's exact test P values, Bonferroni corrected, for LTR element enrichments.

Supplementary Table 2: GO term enrichment for ChIA-PET detected interacting genes.

Supplementary Table 3: siRNA sequences and target coordinates.

Supplementary Table 4: Control siRNA sequences.

Supplementary Table 5: Primers used in qRT-PCR assays.

Nature Genetics: doi:10.1038/ng.2965


Supplementary Data Sets

All supplementary files are in a BED-like format with headers.

All sequencing data, including fastq and bam files, are freely available from the DDBJ repository (DRA000914).
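As a minimal sketch of how one of these files might be loaded, the following Python reads a gzipped BED-like file into a list of records keyed by whatever column names its header provides. It assumes the files are tab-delimited and that the header is the first line; neither detail is stated above, and the function name `read_bed_like` is illustrative only.

```python
import csv
import gzip


def read_bed_like(path):
    """Read a gzipped BED-like text file whose first line is a header.

    Assumptions (not confirmed by the supplementary text): fields are
    tab-delimited and the header is the first line of the file.
    Returns a list of dicts, one per record, keyed by the header names.
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return list(reader)
```

For example, once the archive is unpacked, `read_bed_like("Carninci_Hs1_CAGE_nuc.txt.gz")` would return one dictionary per CAGE cluster; all values are returned as strings and need converting before numeric use.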

Supplementary Data Set 1: Human stem cell transcriptome-profiling datasets.

Carninci_Supplementary_DataSets1.tar: contains all supplementary files for the human dataset listed below.

Carninci_Hs1_CAGE_nuc.txt.gz: CAGE clusters for nuclear fractions with normalized expression (tags per million) for each sample and statistical significance for the differential expression analyses between stem cells and differentiated cells.

Carninci_Hs2_CAGE_cyto.txt.gz: CAGE clusters for cytoplasmic fractions with normalized expression (tags per million) for each sample and statistical significance for the differential expression analyses between stem cells and differentiated cells.

Carninci_Hs3_CAGE_NAST.txt.gz: CAGE clusters for NAST with annotations and mean expressions (tpm) of 5 stem samples with standard deviations for nuclear and cytoplasmic expression values.

Carninci_Hs4_CAGE_LTReRNA.txt.gz: CAGE paired-clusters (eRNAs) overlapping LTRs.

Carninci_Hs5_CAGEscanAssembled.txt.gz: Assembled transcripts based on CAGEscan.

Carninci_Hs6_RNAseqAssembled_nuc.txt.gz: Assembled transcripts based on RNA-Seq for nuclear fractions.

Carninci_Hs7_RNAseqAssembled_cyto.txt.gz: Assembled transcripts based on RNA-Seq for cytoplasmic fractions.

Carninci_Hs8_shRNAseqLower.txt.gz: Short RNA-Seq clusters with minimum tag count 30 for lower fractions.

Carninci_Hs9_shRNAseqUpper.txt.gz: Short RNA-Seq clusters with minimum tag count 30 for upper fractions.
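Several of the files above report expression in tags per million (tpm). A minimal sketch of that scaling, assuming plain library-size normalization (the authors' pipeline may apply additional scaling factors; the function name and cluster labels are hypothetical):

```python
def tags_per_million(counts):
    """Scale raw tag counts per cluster to tags per million (tpm).

    counts maps a cluster identifier to its raw tag count in one library.
    Assumption: simple library-size scaling, tpm = count / total * 1e6.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped tags")
    return {cluster: count * 1_000_000 / total for cluster, count in counts.items()}
```

With this definition, a cluster holding 50 of 1,000 mapped tags has an expression of 50,000 tpm.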


Supplementary Data Set 2: Mouse stem cell transcriptome-profiling datasets.

Carninci_Supplementary_DataSets2.tar: contains all supplementary files for the mouse dataset listed below.

Carninci_Mm1_CAGE_nuc.txt.gz: CAGE clusters for nuclear fractions with normalized expression (tags per million) for each sample and statistical significance for the differential expression analyses between stem cells and differentiated cells.

Carninci_Mm2_CAGE_cyto.txt.gz: CAGE clusters for cytoplasmic fractions with normalized expression (tags per million) for each sample and statistical significance for the differential expression analyses between stem cells and differentiated cells.

Carninci_Mm3_CAGE_NAST.txt.gz: CAGE clusters for NAST with annotations and mean expressions (tpm) of 6 stem samples with standard deviations for nuclear and cytoplasmic expression values.

Carninci_Mm4_CAGE_LTReRNA.txt.gz: CAGE paired-clusters (eRNAs) overlapping LTRs.

Carninci_Mm5_CAGEscanAssembled.txt.gz: Assembled transcripts based on CAGEscan.

Carninci_Mm6_RNAseqAssembled_nuc.txt.gz: Assembled transcripts based on RNA-Seq for nuclear fractions.

Carninci_Mm7_RNAseqAssembled_cyto.txt.gz: Assembled transcripts based on RNA-Seq for cytoplasmic fractions.

Carninci_Mm8_shRNAseqLower.txt.gz: Short RNA-seq clusters with minimum tag count 30 for lower fractions.

Carninci_Mm9_shRNAseqUpper.txt.gz: Short RNA-seq clusters with minimum tag count 30 for upper fractions.


Supplementary Figure 1 | Dedifferentiation of human and mouse iPSCs.

(a,b) Normalized CAGE expression levels (tpm, tags per million) for stem cell marker genes in hiPS.F (a) and miPS.T (b). Expression levels for ESCs (blue, n = 3) and the somatic cell types (purple) used for iPSC derivation are shown. (c,d) Immunofluorescence analyses of Ssea1, Oct4 and Nanog expression in miPS.T (c) and of TRA-1-60, TRA-1-81 and SSEA4 expression in hiPS.F (d). ESCs are used as controls. (e) Hematoxylin and eosin-stained histological sections of teratomas formed 4 weeks after subcutaneous injection of hiPS.F cells into nude mice. Tissues representative of the three germ layers (mesoderm, ectoderm and endoderm) developed from hiPS.F cells. (f) Chimeric mouse derived from miPS.T cells and germline transmission.


Supplementary Figure 2 | CAGE tag cluster expression across stem cell samples and sample clustering.

(a,b) Number of stem cell samples in which CAGE tag clusters are expressed in human (a) and mouse (b). A value of 0 corresponds to clusters expressed only in differentiated cell type samples. (c,d) Hierarchical clustering based on Spearman correlation coefficients calculated from CAGE tag cluster expression values for human (c) and mouse (d).
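The clustering in panels c and d can be illustrated with a short, dependency-free sketch. It computes Spearman coefficients between sample expression profiles and merges samples by agglomerative clustering on the distance 1 − ρ. The linkage method is not stated in the legend; average linkage is an assumption here, and the sample names in the usage note are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def cluster_samples(profiles):
    """Average-linkage agglomerative clustering on distance 1 - rho.

    profiles maps sample name -> expression values (same cluster order in
    every sample). Returns the merge history as (group_a, group_b, distance).
    Average linkage is an assumption, not the documented method.
    """
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1 - spearman(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    groups = [(n,) for n in names]
    merges = []
    while len(groups) > 1:
        best = None
        for i, ga in enumerate(groups):
            for gb in groups[i + 1:]:
                pairs = [dist[frozenset((a, b))] for a in ga for b in gb]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, ga, gb)
        d, ga, gb = best
        groups = [g for g in groups if g not in (ga, gb)] + [ga + gb]
        merges.append((ga, gb, d))
    return merges
```

Calling `cluster_samples({"ES1": [...], "ES2": [...], "Fib": [...]})` returns the order in which samples are joined, which is the information a dendrogram such as those in panels c and d displays.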


Supplementary Figure 3 | Comparison of nuclear and cytoplasmic assembled transcripts.

Numbers and cellular distribution of transcripts identified from RNA-seq assemblies for human (a) and mouse (b) data sets.


Supplementary Figure 4 | Differential expression analyses.

(a–d) M-A plots of differentially expressed CAGE clusters (edgeR (ref. 27); FDR < 0.01 indicated in red) for the mouse (a,b) and human (c,d) nuclear and cytoplasmic data sets. (e) Proportion of CAGE clusters significantly upregulated in stem cells (Up-Stem) at FDR < 0.01.
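For readers unfamiliar with M-A plots, the sketch below shows the textbook coordinates for a single cluster and a Benjamini-Hochberg adjustment of P values. Both are illustrative assumptions: edgeR derives its log fold changes and significance from its own statistical model, and the pseudocount and function names here are not taken from the paper.

```python
import math


def ma_values(tpm_a, tpm_b, pseudocount=1.0):
    """Return (M, A) for one cluster.

    M is the log2 ratio between conditions and A the mean log2 level.
    The pseudocount (an assumption) avoids log2(0) for unexpressed clusters.
    """
    log_a = math.log2(tpm_a + pseudocount)
    log_b = math.log2(tpm_b + pseudocount)
    return log_a - log_b, 0.5 * (log_a + log_b)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

A cluster would be drawn in red in such a plot when its adjusted value falls below the chosen threshold (0.01 in the legend above).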


Supplementary Figure 5 | NAST expression features.

(a–d) Number of stem cell samples that expressed NASTs for human (a,b) and mouse (c,d) data sets. Panels show NASTs identified in the nuclear (Nu) or cytoplasmic (Cy) compartment or in both (Nu/Cy). (e) Percentage of CAGE clusters overlapped by CAGE-scan 5′ tags. (f) Number of tissues and differentiated cell type samples from the FANTOM5 expression atlas (ref. 29) in which annotated CAGE clusters overexpressed in stem cells are expressed. Bin width = 1. (g) Nuclear (x axis) and cytoplasmic (y axis) normalized expression (tpm, tags per million) for the human CAGE clusters overexpressed in stem cells. Similar plots are shown in h,i for a set of mouse (h) and human (i) nuclear (red) and cytoplasmic (blue) transcripts. (j,k) GRO-Seq (refs. 30,31) signal enrichment at human (j) and mouse (k) NAST positions.


Supplementary Figure 6 | Histone marks at NAST genomic loci.

NASTs were classified as enhancers, promoters or others based on specific combinations of histone marks (Online Methods), using ChIP-seq signal from the ENCODE Project (ref. 5) for the mouse ES-Bruce4 and ES-E14 cell lines as well as for the human H1-ES cell line. Normalized tag frequencies for all histone marks are plotted for each category and cell line.


Supplementary Figure 7 | Expression levels and putative processing of NASTs.

(a,b) CAGE-based normalized expression (tpm, tags per million) for human (a) and mouse (b) NASTs (red) and annotated CAGE tag clusters (blue), identified in the nucleus (Nu) or cytoplasm (Cy) or in both cellular compartments (Nu/Cy). (c) Ct values for five NASTs and Gapdh are shown together with spiked firefly RNAs, used as a reference to evaluate copy number per cell. n = 3; error bars, s.d. (d) Transcript length as defined by RNA-seq assembly for NASTs and annotated genes compared for three expression groups (≥5, 1–5 and 0.1–1 tpm). n, number of clusters or transcripts per group. P values for two-sided Wilcoxon and Mann-Whitney tests are shown. (e) Fraction of NASTs and annotated CAGE clusters, as defined by CAGE-scan, overlapping short RNA-seq (15- to 40-bp fraction) clusters, grouped by expression levels.


Supplementary Figure 8 | Histone marks and transcription factor binding at repeat-associated NAST loci.

(a,b) Normalized expression (tpm, tags per million) is plotted for mouse (a) and human (b) annotated genes and NASTs carrying promoter-associated histone marks. P values for Wilcoxon and Mann-Whitney tests are shown. n, number of CAGE clusters per group. (c) Frequency plots of normalized ChIP-seq tag counts (ENCODE data, ref. 5) for H3K4me3 (promoter) and H3K9me3 (repressive) marks at NAST-associated and non-expressed (N.Exp.) MaLR elements. (d–f) ChIP-seq normalized tag counts for stem cell–specific transcription factors at NAST-associated and non-expressed (N.Exp.) mouse ERVK (d), mouse MaLR (e) and human ERV1 (f) elements. Values for non-expressed elements are shown in gray (dotted lines).


Supplementary Figure 9 | Human LTR-derived transcripts.

(a) Repeat family normalized expression values (tpm, tags per million) are plotted for human ESCs, iPSCs and differentiated cells (Dif.). Error bars, s.d. (b–d) Normalized nuclear expression of mouse ERVK (b) and MaLR (c) as well as human ERV1 (d) is plotted for ESCs, iPSCs and differentiated cells (Dif.). N, number of CAGE tag clusters carrying promoter-associated histone marks. (e) Normalized expression for selected human repeat subfamilies is plotted against the associated FDR (calculated with edgeR, ref. 27). (f) The number of repeat elements with at least five CAGE tags is plotted against copy number found in the genome for human LTRs.


Supplementary Figure 10 | Stem cell-specific enhancers associated with LTRs.

(a,b) Relative CAGE tag distributions along mouse ERVK-RLTR9E (a) and human ERV1-LTR7 and HERVH-int (b) elements. Gray bars mark the 5′ and 3′ extremities of each repeat element. Green and purple bars indicate CAGE tags mapping to the plus and minus strands, respectively. (c) Density plot for directionality scores at loci showing divergent transcription overlapping intergenic LTRs (red) and from annotated TSSs (blue). (d–f) Density plots of normalized tag counts for human DNase I footprints (ref. 40) (d) and ChIP-seq (refs. 5,41) (e,f) at loci presenting divergent transcription patterns and overlapping LTRs. (f) Promoters, NASTs associated with LTRs and classified as promoters in Figure 2b; enhancers, loci presenting divergent transcription patterns and overlapping LTRs. (g) Number of tissue and differentiated cell type samples from the FANTOM5 expression atlas (ref. 29) in which LTR enhancer-associated CAGE tag clusters are expressed. Enlarged plots are shown for the first five bins. Bin width = 1 sample. (h) Frequency distribution of the distances between interacting loci identified by ChIA-PET.
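The directionality score in panel c is defined in the Online Methods rather than in this legend. As a hedged illustration only, one widely used definition compares CAGE tag counts on the two strands of a locus, D = (F − R) / (F + R); whether the authors used exactly this form is an assumption, and the function name is hypothetical.

```python
def directionality_score(forward_tags, reverse_tags):
    """Strand bias of transcription at a locus, in the range [-1, 1].

    Assumed definition: D = (F - R) / (F + R), where F and R are the CAGE
    tag counts on the plus and minus strands. Values near 0 indicate
    balanced, divergent transcription; values near +1 or -1 indicate
    transcription dominated by one strand.
    """
    total = forward_tags + reverse_tags
    if total == 0:
        raise ValueError("no tags at this locus")
    return (forward_tags - reverse_tags) / total
```

Under this definition, loci with balanced bidirectional transcription would concentrate around 0, whereas strongly directional promoters would fall toward the extremes.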


Supplementary Figure 11 | Multiple negative controls for knockdown experiments in Nanog-GFP iPS_MEF-Ng-20D17 cells.

The normalized Nanog-GFP–positive population, adjusted to the mock control (black bar), quantified by flow cytometry analysis 48 h after siRNA transfections is shown for 12 negative control siRNAs: 2 scrambled sequences, 1 siRNA targeting the luciferase transcript and 7 siRNAs targeting LTR, LINE and SINE elements not expressed in our data set, as well as 2 siRNAs targeting mRNAs originating from genes (Sdr16c6, Wfdc6a) with promoters overlapping LTR elements. Positive control siRNAs (green bars) targeting Nanog and Sox2 are shown for comparison. n = 3; error bars, s.d.


Supplementary Table 1 | Fisher's exact test P values, Bonferroni corrected, for LTR element enrichments

LTR families  Category of CAGE-clusters  Enhancer (Nu)  Promoter (Nu)  Enhancer (NuCy)  Promoter (NuCy)
ERVK (mouse)  Stem specific              5.79e-11       8.80e-16       8.80e-16         8.80e-16
ERVK (mouse)  Dif. specific              1              1              1                1
MaLR (mouse)  Stem specific              0.00205        8.80e-16       1                8.80e-16
MaLR (mouse)  Dif. specific              1              1              1                1
ERV1 (human)  Stem specific              0.082          4.86e-12       4.16e-7          8.80e-16
ERV1 (human)  Dif. specific              1              1              1                1
MaLR (human)  Stem specific              0.011          0.24           0.029            2.62e-8
MaLR (human)  Dif. specific              1              1              1                1
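The kind of calculation summarized in this table can be sketched as follows: a one-sided Fisher's exact test for enrichment, computed from the hypergeometric tail, followed by a Bonferroni correction that caps adjusted values at 1. The one-sided alternative, the argument names and the counts in the usage note are assumptions for illustration, not values from the study.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test P value for enrichment.

    k: clusters in the tested category that overlap the LTR family
    n: size of the tested category
    K: clusters overlapping the LTR family in the whole universe
    N: size of the universe
    Returns P(X >= k) under the hypergeometric distribution.
    The one-sided alternative is an assumption of this sketch.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def bonferroni(p_values):
    """Multiply each P value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For instance, `bonferroni([fisher_enrichment_p(5, 5, 5, 10), 0.5])` corrects two hypothetical tests; the capping step is consistent with corrected values of exactly 1 for non-enriched categories.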


Supplementary Table 2 | GO term enrichments for ChIA-PET detected interacting genes

Sub-root category  Name  GO ID  Number of genes  Adjusted p-value
biological process  cellular nitrogen compound metabolic process  GO:0034641  104  9.30E-13
biological process  heterocycle metabolic process  GO:0046483  102  9.87E-13
biological process  nucleobase-containing compound metabolic process  GO:0006139  100  1.27E-12
biological process  cellular aromatic compound metabolic process  GO:0006725  102  1.36E-12
biological process  organic cyclic compound metabolic process  GO:1901360  104  1.51E-12
biological process  cellular metabolic process  GO:0044237  140  1.97E-12
biological process  single-organism metabolic process  GO:0044710  148  4.79E-12
biological process  organic substance metabolic process  GO:0071704  144  5.45E-12
biological process  primary metabolic process  GO:0044238  139  1.58E-11
biological process  nitrogen compound metabolic process  GO:0006807  105  1.61E-11
biological process  metabolic process  GO:0008152  152  1.03E-10
biological process  cellular macromolecule metabolic process  GO:0044260  112  2.19E-10
biological process  nucleic acid metabolic process  GO:0090304  84  3.66E-10
biological process  chromatin assembly  GO:0031497  12  3.97E-09
biological process  macromolecule metabolic process  GO:0043170  117  7.40E-09
biological process  chromatin assembly or disassembly  GO:0006333  12  8.44E-08
biological process  nucleosome assembly  GO:0006334  10  1.65E-07
biological process  regulation of nucleobase-containing compound metabolic process  GO:0019219  68  3.42E-07
biological process  protein-DNA complex assembly  GO:0065004  11  4.45E-07
biological process  DNA packaging  GO:0006323  12  5.70E-07
biological process  regulation of nitrogen compound metabolic process  GO:0051171  68  5.97E-07
biological process  DNA metabolic process  GO:0006259  29  6.83E-07
biological process  DNA conformation change  GO:0071103  13  1.19E-06
biological process  protein-DNA complex subunit organization  GO:0071824  11  2.12E-06
biological process  nucleosome organization  GO:0034728  10  2.70E-06
biological process  cellular macromolecular complex assembly  GO:0034622  21  4.66E-06
biological process  chromatin organization  GO:0006325  22  5.64E-06
biological process  regulation of primary metabolic process  GO:0080090  76  6.72E-06
biological process  regulation of cellular metabolic process  GO:0031323  77  8.09E-06
biological process  cellular macromolecule biosynthetic process  GO:0034645  70  1.02E-05
biological process  cellular biosynthetic process  GO:0044249  80  1.11E-05
biological process  cellular macromolecular complex subunit organization  GO:0034621  22  1.28E-05
biological process  organic substance biosynthetic process  GO:1901576  81  1.54E-05
biological process  gene expression  GO:0010467  76  2.22E-05
biological process  organelle organization  GO:0006996  47  2.30E-05
biological process  regulation of metabolic process  GO:0019222  83  2.76E-05
biological process  macromolecule biosynthetic process  GO:0009059  70  3.00E-05
biological process  chromosome organization  GO:0051276  24  3.14E-05
biological process  biosynthetic process  GO:0009058  81  3.19E-05
biological process  cell cycle  GO:0007049  34  3.59E-05
molecular function  heterocyclic compound binding  GO:1901363  95  6.38E-10
molecular function  organic cyclic compound binding  GO:0097159  95  1.21E-09
molecular function  binding  GO:0005488  166  3.56E-09
molecular function  DNA binding  GO:0003677  49  5.93E-08
molecular function  nucleic acid binding  GO:0003676  61  1.37E-07
molecular function  protein binding  GO:0005515  104  1.42E-05
molecular function  histone binding  GO:0042393  8  0.001
molecular function  ion binding  GO:0043167  84  0.0029
molecular function  hormone receptor binding  GO:0051427  8  0.0047
molecular function  ribonucleotide binding  GO:0032553  37  0.0077
molecular function  nuclear hormone receptor binding  GO:0035257  7  0.0091
molecular function  purine ribonucleotide binding  GO:0032555  36  0.0096
molecular function  enzyme binding  GO:0019899  27  0.0096
cellular component  intracellular part  GO:0044424  184  1.63E-15
cellular component  intracellular  GO:0005622  186  2.48E-15
cellular component  intracellular organelle  GO:0043229  168  1.53E-14
cellular component  organelle  GO:0043226  168  1.99E-14
cellular component  nucleus  GO:0005634  111  1.44E-12
cellular component  intracellular membrane-bounded organelle  GO:0043231  149  2.75E-11
cellular component  membrane-bounded organelle  GO:0043227  149  3.42E-11
cellular component  cell part  GO:0044464  194  7.44E-11
cellular component  cell  GO:0005623  194  7.51E-11
cellular component  chromosome  GO:0005694  29  5.93E-10
cellular component  non-membrane-bounded organelle  GO:0043228  67  5.39E-08
cellular component  intracellular non-membrane-bounded organelle  GO:0043232  67  5.39E-08
cellular component  chromosomal part  GO:0044427  24  9.49E-08
cellular component  intracellular organelle part  GO:0044446  85  1.79E-07
cellular component  organelle part  GO:0044422  86  2.71E-07
cellular component  chromatin  GO:0000785  17  5.69E-07
cellular component  macromolecular complex  GO:0032991  69  4.48E-05
cellular component  nucleosome  GO:0000786  7  0.0001
cellular component  cytoplasm  GO:0005737  126  0.0002
cellular component  nuclear lumen  GO:0031981  33  0.0018
cellular component  nucleoplasm part  GO:0044451  21  0.002
cellular component  membrane-enclosed lumen  GO:0031974  36  0.0036
cellular component  nucleoplasm  GO:0005654  22  0.004
cellular component  protein-DNA complex  GO:0032993  7  0.0044
cellular component  intracellular organelle lumen  GO:0070013  35  0.0046
cellular component  organelle lumen  GO:0043233  35  0.0049
cellular component  nuclear part  GO:0044428  37  0.0095
cellular component  cytoplasmic part  GO:0044444  86  0.0095


Supplementary Table 3 | siRNA sequences and target coordinates

#  siRNA ID  Antisense sequence
1  NAST-12.1  AUCCCUGGCCAUCAAUGACAACGUG
2  NAST-12.2  AAACUGGUCUGGUUGACUACACUUG
3  NAST-13.1  UCUUCUUCCUCCUCCUCUUUCUCUC
4  NAST-13.2  UAGACUGGGCAGAUGUACUGCUAUA
5  NAST-14.1  UGCACAUCCAUAUUGGGCAGAUAUG
6  NAST-14.2  AUUAGUUGAGUUCUAAUCGACGCCA
7  NAST-15.1  AAUUCUUCUGGACAGAUCUGUAAGC
8  NAST-15.2  AGUAGUCUAGACUUGCUUUGAACUG
9  NAST-16.1  AUGGCAACAAUACCUCACUUCCUGG
10  NAST-16.2  ACAAUAGCCAGGUCACAAUAAGGCC
11  NAST-16.3  UCAGCGUGACACACAAUCAGAUCUC
12  NAST-16.4  UAAACUAUCCCUUGUCAGCGUGACA
13  NAST-17.1  AUCUGCAGACAGCAGGAAGCCAUUG
14  NAST-17.2  UUCAUCUCCUGCCGAUCGAUGAUGG
15  NAST-18.1  AAUUAGGGCAGCCUUCAUUUGAACA
16  NAST-18.2  UUAAGAGCACAACAACUGAGCAGCA
17  NAST-19.1  AACAACAGUCACUCAAUCACUGAGC
18  NAST-19.2  UUUGUAAUCCCACUGCUGGGCAUGG
19  NAST-19.3  UAUGAGACCAGGUAGAUGGCUCAGG
20  NAST-19.4  UGAAUUAUCUGACAGUUCUAUAGGC
21  NAST-20.1  UAAAUUCUCCAGUGGCGAAUCCUGG
22  NAST-20.2  UUAAGAGCACAACAACUGAGCAGCA
23  NAST-21.1  AACAGAUCCCAUCUACAUCCUGCCU
24  NAST-21.2  UUCAAGAUCGCUCUCUUGCCAUCCC
25  NAST-22.1  AUACCAGGAUACUCUAGCAGCCACA
26  NAST-22.2  CAAACCCAUACUUUACUGCCUUACC
27  NAST-33.1  UUCCCACUAAGCCUGCCCAGCAAGU
28  NAST-33.2  CUGUCAGCUUCGGAACAUCACUCUC
29  NAST-37.1  UGAUGUCCCGUUCCAAAGAGAACUC
30  NAST-37.2  UUUAAAUGCCUUCAUGGAGCACCUC
31  NAST-38.1  AAGACAGUCUCAUUGGAGAGUCUGA
32  NAST-38.2  UAUGACAGUGACUUGCUUUGUGUUG
33  NAST-40.1  AUCUGAGUCUUCUUACCCUGGCUCG
34  NAST-40.2  UAAACUGUAGGCACUAUGCUGGCCA
35  NAST-48.1  UCAUUUGUCAACAUGAUUAGCUCCC
36  NAST-48.2  UGGUUUAUACCGCAACAGAAGAGUG
37  NAST-49.1  UUCCAAGGCAAACUAACUCUUCUGC
38  NAST-49.2  UGAACUGCAACAAUUCACCAUCCUG
39  NAST-51.1  UUACAUCUUAUCCUCACGCUAUCCC
40  NAST-51.2  UUGCUAGUACACCAACUAAGUGUGG
41  NAST-65.1  AAGCAGCGGUGACAACAGUGACUUG
42  NAST-65.2  UAGUGCUGAUGCAGAAACAGGUCUG
43  NAST-68.1  UUCUUCUUCCUCCUCCUCUGCAUCC
44  NAST-68.2  AGAUCGGGCAGAUUGGGUAGCUACA
45  NAST-69.1  AACUGAAAGACACUGGUUUGCUACA
46  NAST-69.2  UUGUUACUAAGACAGCCCUGUGCUU
47  NAST-70.1  UAUAACACUACAGUACUCUAUUCCC
48  NAST-70.2  AUGCCAUGCCAAGAACACACGACUC
49  NAST-75.1  ACUCAGUGCCCUAAAUUCCCACUGG
50  NAST-75.2  ACUCUGGACAAGGAUCUCUCUCUGG
51  NAST-79.1  UUGGCAUUGCGGUUGAACCCAGAGA
52  NAST-79.2  UUGACUUCACCCACCACUGACUUCC
53  NAST-79.3  AGAAUUGACAGGAAGGGAGCCCAGG
54  NAST-79.4  UUUAUUUCCCACAGCACCUGCCUCC
55  NAST-87.1  UUAAUAUGAUUACAUGCUGUGUGCC
56  NAST-87.2  CCACAUGUAACCCUAAACCAACACC
57  NAST-91.1  UUCUCUGAAUAGUUCAUCUCUUGCA
58  NAST-91.2  UUUGGGUGCCAAUUUCUGUGGAGCG
59  NAST-111.1  AACUUAGCCCAUAUCCUUUCGCACC
60  NAST-111.2  ACAACCACCUGAUUCAGUUCCUAGU
61  NAST-115.1  CAUCCAGUAUGGCAGCUGACAUCUG
62  NAST-115.2  AUAUCCUGUAGCUGACCUACAUGUC
63  NAST-117.1  UCCAGUCGAGGCUAGAUCUUGAACC
64  NAST-117.2  AGACAUGCAAGCCUAUGCCUGACUC
65  NAST-118.1  AAGAUGAGCCAAUACACGAGUGUCG
66  NAST-118.2  UAGCAAAUCAGCAACUGGUGAUCUG
67  NAST-118.3  UAUCCCUCUAGCAAAUCAGCAACUG
68  NAST-118.4  UUCUCUAUCCCUCUAGCAAAUCAGC
69  NAST-119.1  UAUACACUGUGAAAGACCCUCCUCC
70  NAST-119.2  AAGGUUUACUGUUUACUGUGAGCUC
71  NAST-121.1  UUAGUUACCUGUGUUCAGGUACUGG
72  NAST-121.2  AAGACAAGCCUGUGACUGAACAGCG
73  NAST-125.1  UUUACUUGGCUCACGCUUCCCUAUC
74  NAST-125.2  UGAGCUUACCAACUGUCUGCUUGGG
75  NAST-127.1  AGAGGUGUCUGCCUGAAACAGUUCU
76  NAST-127.2  ACACAUUUCUUACUAACCACUUGUC
77  NAST-127.3  UCUUCAGGCUCGCUCAUGUAUGUUC
78  NAST-127.4  UAAUCAGCCAUUCCUGAUCUUCAGG
79  NAST-128.1  AACGAAAGACCAAAGACGCACACUC
80  NAST-128.2  UAAAUAUUAAUCCUGGUGGCCGGGU
81  NAST-132.1  UUGUGCAACGCUCAAAGACCCUUCC
82  NAST-132.2  UUAACAGGAGGUCAUUCCUAGGAGG
83  NAST-133.1  UUCCAGUGGUCAAUCCUUGCAGAUC
84  NAST-133.2  AUCACUGAGAGAUACCCAUCUGUUG
85  NAST-134.1  AUAAGUUGGUAUCAGUGUUGUGGCU
86  NAST-134.2  CAACAUUGUAAGCAUUAAAGGACCA
87  NAST-134.3  UGCAAAGCCCUCCUAGCUUCCUCUG
88  NAST-134.4  AGAGGCAGAGGUUAGCAUGCAAAGC

Group  Element  NAST ID (Mm9)

ERVK BGLII chr5_145440048_145440111_+

ERVK BGLII chr2_70644366_70644462_+

ERVK RLTR17 chr3_122959766_122959769_-

ERVK RLTR17 chr1_63186909_63187024_-

ERVK BGLII_B chr12_4726916_4726983_+

ERVK RLTR12B chr14_51558750_51558815_-

ERVK RLTR17 chr8_115222292_115222411_+

MaLR ORR1C1 chr13_43279292_43279383_+

MaLR MLT1K chr4_130185344_130185518_+

MaLR ORR1B1 chr5_108505488_108505506_+

ERVK RLTR17 chr9_22290770_22290863_-

ERVK RLTR12B chr2_160481916_160481918_-

ERVK RLTR17 chr8_6940645_6940745_-

ERVK RMER20B chr14_55667208_55667329_+

MaLR ORR1B1 chr12_12909352_12909439_-

ERVK RLTR11B chr5_136133089_136133193_-

ERVK BGLII_B chr5_121338719_121338755_+

ERVK RLTR17 chr4_154029928_154029937_+

ERVK BGLII chr10_25103480_25103499_-

ERVK RLTR17 chr13_12269096_12269184_-

ERVK RLTR17 chr1_83930309_83930376_-

ERVK RLTR27 chr13_20108500_20108598_+

ERVK RLTR9E chr15_16844720_16844914_-

ERVK MERVK26-int chr5_44205835_44205933_-

ERVK BGLII chr12_64213955_64214117_-

ERVK RLTR25A chr17_56973074_56973222_+

ERVK RLTR26 chr6_82638731_82638746_-

ERVK RMER19A chr9_95421602_95421754_+

MaLR ORR1D2 chr13_81783234_81783315_-

ERVK RLTR20B3 chr2_168831483_168831582_+

ERVK RLTR25B chr3_135210248_135210434_-

ERVK RLTR15 chr8_119197042_119197122_-

ERVK RLTR25A chr14_77919939_77920051_+

MaLR MTC chr19_23168243_23168290_-

ERVK RLTR17 chr19_38400478_38400486_-

ERVK BGLII chr10_25103624_25103679_+

MaLR MTE2b chr15_13062676_13062686_-

MaLR ORR1D1 chr2_21898466_21898648_-
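The NAST identifiers listed above encode a genomic location as chromosome, two coordinates and strand joined by underscores. A minimal sketch of a parser follows, assuming that this four-field layout holds for every identifier; the `Locus` type and function name are illustrative, and whether the coordinates are 0- or 1-based is not stated here.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str


def parse_nast_id(nast_id):
    """Split an identifier such as 'chr5_145440048_145440111_+' into fields.

    Splitting from the right keeps chromosome names that themselves
    contain underscores intact. The coordinate convention (0- or 1-based)
    is not specified in the table and is left untouched.
    """
    chrom, start, end, strand = nast_id.rsplit("_", 3)
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand in {nast_id!r}")
    return Locus(chrom, int(start), int(end), strand)
```

For example, `parse_nast_id("chr5_145440048_145440111_+")` yields a `Locus` whose fields can be passed directly to interval tools working on the Mm9 assembly named in the column header.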


#  siRNA ID  Antisense sequence
89  NAST-138.1  AAGAGCUUCCACUUGGGCUGCUAGA
90  NAST-138.2  UGGUGAGCACAAAGAGAACAUCCUG
91  NAST-155.1  AAUUCUUCAUAGUCCUUUACUGCCA
92  NAST-155.2  AAAUCUAAGCCCAUGGUUUCAACCC
93  NAST-156.1  AAACUUCAAUGACCCAUUCCCAGGG
94  NAST-156.2  UACUGUGCCCAUGAUAGUCCCUCUG
95  NAST-156.3  UUAACACAGGCUCGAAAGCCUACUG
96  NAST-156.4  AACUCUAGCAUCCCUAAUUGUGUGG
97  NAST-158.1  UAUCUAAACAAGGUGCAAGAAGGUA
98  NAST-158.2  UUAGCAGUACUAUUCAGGGAGGAGC
99  NAST-160.1  AAUCCAGGCUACUGAAGCCACAAGA
100  NAST-160.2  AAGACUCGCCAAUCCAGGCUACUGA
101  NAST-162.1  UGAAUCUUUCAACUCUCGGAGCCCG
102  NAST-162.2  AGCACUGGCUACCUUAAACUUGAUC
103  NAST-164.1  AUAACUAGCUUCUUAUGUCCCUGGA
104  NAST-164.2  UUAUGUCCCUGGAUCUCCUCUUACC
105  NAST-173.1  UUCUGCUCCACUUCCCUUCCAUCUC
106  NAST-173.2  AGAUCAUGCAAGAGCUCAGCUGAGC
107  NAST-174.1  UUAACCUUACCUUAGUCACUGUCCA
108  NAST-174.2  AGAGGGACCACAACCAAACCAACUC
109  NAST-178.1  AACCCAUGCUUUGCCACCAAACCUU
110  NAST-178.2  UACAUCUUAUCCCUCUGCUGUCCCA
111  NAST-188.1  AUCUCUUCCCACCUUCUGCUCUACC
112  NAST-188.2  AUAUAUUGUAAUCCUAGCCAGGGUA
113  NAST-194.1  UGGAUUCGCCAUAUAAGCCAGGAGA
114  NAST-194.2  AAAGGAAGUUCCCAGCUGGUAGCCU
115  NAST-194.3  AUCCCUGGGUGACAAGGUUUAACUU
116  NAST-194.4  AUAGAUGGCCUGAGGUAAUGUGCCC
117  NAST-195.1  AUUCCUGCAAACUCAUCCUUGGAGC
118  NAST-195.2  UGGGAUUCAAGCAUCCUUACUUUGG
119  NAST-204.1  AACAACACCACGACCAAAGCAAGUU
120  NAST-204.2  UUGCCUAUGAGACAGCUUGGGCUGG
121  NAST-204.3  UCUUGAUCCUUCUUAACACGGAAGG
122  NAST-204.4  UGCAUGUGCACACUUUGUCUCCCUC
123  NAST-209.1  UAACCAACGCAACUCUAAUAAAGGA
124  NAST-209.2  AAACCGAUCACUUCCUUAAAUCUUG
125  NAST-210.1  UGUAUCUCUUACCAGCUUCACCUUC
126  NAST-210.2  UAAAUGGUCCCACUGUGUGUUCAGA
127  NAST-234.1  UUCAGCAUCCUCUCCCUCUUCCAUU
128  NAST-234.2  UUCUGACUUAAUAAACAGCACAGGG
129  NAST-236.1  ACAUCUUCACAUGAAGGCAACUAGG
130  NAST-236.2  AGUCCAUCAUCAUCUAGGUGGGAGC
131  NAST(II)-22.1  UUCUUAAGAUGCAGACAAUCUCUGG
132  NAST(II)-22.2  UUGACAAUGGAGCAGAGGUGUCCUA
133  NAST(II)-22.3  AGAGAGAGAACACCCUUUCUUGCUC
134  NAST(II)-22.4  CAGAGAGUACAUUUCAUUUGGCCUG
135  NAST(II)-24.1  UGCAGUUCCAGCUCAGACCUUACCA
136  NAST(II)-24.2  AACACUUUGCCACUAGCUAAACGCA
137  NAST(II)-25.1  UUGUAGCUCAGGCCGAUCUGUCAGU
138  NAST(II)-25.2  AGGAGAAUUAGUCUGAUGAAUCCUC
139  NAST(II)-28.1  AUGUCUUACCCUUAUUCUGUCCCUG
140  NAST(II)-28.2  ACAACCUGCAUGUUCCCAAGCUUGA
141  NAST(II)-29.1  AAAGACAGCCGAGCCUUAUCCUCUG
142  NAST(II)-29.2  AUUCUGUGAAGCAACUCCUGCUGUG
143  NAST(II)-30.1  UACUUCAUCGUACCGCUCCAUGCUG
144  NAST(II)-30.2  AAUGGAGGUGAGUUCAAUCAACUUC
145  NAST(II)-35.1  AUAAUUAGAAUCCAAAUGCCUCCCU
146  NAST(II)-35.2  UACACUUAACCUCAGGGAUGACUCG
147  NAST(II)-38.1  AUCACUGAGAGAUACCCAUCUGUUG
148  NAST(II)-38.2  UUAUGAUUUAGAAGCAGUCAUUCCG
149  NAST(II)-42.1  AUCUGAGUCUUCUUACCCUGGCUCG
150  NAST(II)-42.2  UAAACUGUAGGCACUAUGCUGGCCA
151  NAST(II)-45.1  AAUGCAGGCUUAAUAGGCAAGAGGG
152  NAST(II)-45.2  UACUCUGGCAAAUUAAACCCACUUC
153  NAST(II)-47.1  AUGAGAUCCCUGCUACUUGCGGUUU
154  NAST(II)-47.2  AUUAAUAUGAUUACAAGCUGCGCGC
155  NAST(II)-54.1  AAUCGCUGCGAGAAGAUACCAGACA
156  NAST(II)-54.2  UUACAAACCCACGUUCUGCAGUGGU
157  NAST(II)-58.1  UUUGCCGCUGUACCUCUCUCUCUUG
158  NAST(II)-58.2  AGAGGAUCACUGCCAAACGUCAACA
159  NAST(II)-59.1  UCCACAUGGUUAGUUAACUCACCUC
160  NAST(II)-59.2  UUCCCUGACUAGUUCGUAACUGAUU
161  NAST(II)-62.1  UGUGAUUGAUACCUUCUAAGAGUUC
162  NAST(II)-62.2  UUGUGUCUGUGUUUGCUCUUAGGUG
163  NAST(II)-67.1  AUUACAAUCUAUGACCCUAGAUUGG
164  NAST(II)-67.2  UAAGUGUGCCUGUUGCUGGAAUUUC
165  NAST(II)-68.1  AACACAUGAGAGCCUCUCUUUCUUC
166  NAST(II)-68.2  UGUUCCAGGAGUGAGACAAGCACGA
167  NAST(II)-69.1  UCACCAUUGCCAAUGUCUCCAGAUC
168  NAST(II)-69.2  AACUUAGUCCCUAUCCUAUCGCACC
169  NAST(II)-70.1  AAACUUGGAGGGAAAUUGACCUCUG
170  NAST(II)-70.2  AAUACAGUAGGAUACCUGAAGCAUC
171  NAST(II)-73.1  UUCACUAGAACUCAGUUGCGUCACC
172  NAST(II)-73.2  AGCUAGUGGACUCUAGCUUCCCAUG
173  NAST(II)-74.1  UUCAGUAACAAUCUGGCCUGGGUGA
174  NAST(II)-74.2  UUUGCUUGCUAACCCUGAUACUCUU

Group  Element  NAST ID (Mm9)

ERVK  RLTR11A     chr11_22521270_22521379_+
ERVK  RLTR17      chr7_24943592_24943596_+
ERVK  BGLII       chr16_64796637_64796724_+
ERVK  RLTR17      chr11_22507713_22507832_-
ERVK  BGLII_Mus   chr13_21028194_21028297_+
ERVK  RLTR17      chr11_22507970_22508040_+
ERVK  RLTR9E      chr11_12905216_12905389_-
MaLR  MTC         chr12_87831168_87831176_-
MaLR  ORR1C2      chr13_64069994_64070072_-
MaLR  ORR1C1      chr14_76915718_76915837_+
MaLR  ORR1D2      chr5_104108177_104108280_+
ERVK  RLTR11B     chr5_110866558_110866646_-
ERVK  RMER19A     chr3_88374326_88374462_-
ERVK  RLTR25A     chr11_62325183_62325250_+
MaLR  ORR1C1      chr14_76915718_76915837_+
MaLR  ORR1A4      chr1_133042816_133042905_+
MaLR  ORR1A2      chr1_134175023_134175040_-
ERVK  BGLII       chr5_31791699_31791828_+
ERVK  BGLII       chr16_57174656_57174741_-
ERVK  RLTR17      chr16_32136120_32136180_-
ERVK  RLTR17      chr16_56116331_56116461_-
ERVK  RLTR9E      chr15_100100465_100100657_+
ERVK  RLTRETN_Mm  chr15_5994600_5994710_-
ERVK  RLTR17      chr19_38400478_38400486_-
ERVK  RLTR12B     chr2_160481955_160482012_-
ERVK  RLTR44B     chr16_57280935_57281038_+
MaLR  ORR1E       chr17_49103177_49103271_-
ERVK  RLTR17      chr6_124424807_124424943_+
ERVK  RLTR17      chr6_47140432_47140493_-
ERVK  RLTR31B_Mm  chr3_41332356_41332414_-
ERVK  BGLII_A     chr4_133512521_133512590_+
ERVK  BGLII_B     chr9_100869933_100870074_-
ERVK  RLTR11A2    chr9_118315217_118315319_-
ERVK  RLTR15      chr6_48015908_48015974_+
ERVK  RLTR17      chr8_110206085_110206190_-
ERVK  RLTR17      chrX_161441027_161441146_-
ERVK  RLTR11A     chrX_81688390_81688488_-
ERVK  RLTR9D      chr9_18898054_18898174_+
ERVK  RMER16      chr9_40047436_40047535_-
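
The target coordinates above are written as single strings such as chr11_22521270_22521379_+. The following is a minimal Python sketch for splitting such a string into its parts, assuming the field order is chromosome, start, end, strand. The genome assembly and whether positions are 0-based or 1-based are not stated in this table, so the sketch makes no claim about either. The names Interval and parse_target are illustrative and do not come from the original material.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """One target locus; field meanings are assumed from the string layout."""
    chrom: str
    start: int
    end: int
    strand: str


def parse_target(coord: str) -> Interval:
    """Split 'chrom_start_end_strand' into typed fields."""
    # Split from the right so a chromosome name containing '_' stays intact.
    chrom, start, end, strand = coord.rsplit("_", 3)
    if strand not in {"+", "-"}:
        raise ValueError(f"unexpected strand in {coord!r}")
    return Interval(chrom, int(start), int(end), strand)


locus = parse_target("chr11_22521270_22521379_+")
print(locus.chrom, locus.start, locus.end, locus.strand)
```

Keeping the strand as a separate field makes it straightforward to filter plus-strand and minus-strand targets later.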


Supplementary Table 4 | Control siRNA sequences

#   Control siRNA                                    ID       Antisense sequence
1   Stem transcription factor                        Nanog    AUUCGAUGCUUCCUCAGAACUAGGC
2   Stem transcription factor                        Sox2     UUAUCCUUCUUCAUGAGCGUCUUGG
3   Non expressed LTR elements                       MER50B   UUGGGCUGGCUGCUAAAUUAUACUU
4   Non expressed LTR elements                       MER66A   AUCACCAUUGCCAUGUAAACUAACC
5   Non expressed LTR elements                       RLTR43A  AGUGACAAUAUAAGCAUAAGCAAGA
6   Non expressed LTR elements                       LTR86A2  UGCCUCUGGCUGUGUAGAGUUCUGG
7   Non expressed LTR elements                       LTR86B2  GCAGAUGCAGCAAACGAAUAUAAUC
8   Non expressed LINE elements                      L1M2a1   CAGAGUCCAGUGAAGGGAUCUCUCU
9   Non expressed SINE elements                      MIRm     AGGGUGUCUGCCCUCAGAGCUUACA
10  Non-expressed gene with LTR associated promoter  Sdr16c6  AAGUUUUCUGGGUUGCUUUUCUGCU
11  Non-expressed gene with LTR associated promoter  Wfdc6a   AAAGUGAAGCAGAGACCGUGGACGA
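
The control siRNAs are listed as antisense sequences. Assuming standard Watson-Crick pairing between the antisense strand and its target transcript, the matching site on the target RNA would be the reverse complement of the listed sequence. The Python sketch below illustrates that conversion; the function name target_site is hypothetical, and the sketch does not model overhangs or chemical modifications, which are not described in this table.

```python
# Complement table for RNA bases (A<->U, C<->G).
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def target_site(antisense: str) -> str:
    """Return the RNA reverse complement of an antisense siRNA sequence."""
    seq = antisense.upper()
    unexpected = set(seq) - set("ACGU")
    if unexpected:
        raise ValueError(f"non-RNA characters: {sorted(unexpected)}")
    # Complement every base, then reverse to read the site 5' to 3'.
    return seq.translate(_RNA_COMPLEMENT)[::-1]


# Antisense sequence of control siRNA 1 (Nanog) from Supplementary Table 4.
nanog_antisense = "AUUCGAUGCUUCCUCAGAACUAGGC"
print(target_site(nanog_antisense))
```

Applying the function twice returns the original sequence, which is a quick sanity check on any transcription of the table.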


Supplementary Table 5 | Primers used in qRT-PCR assays

Gene    Accession   Forward primer            Reverse primer
Actc1   NM_009608   TCTCTTCCAGCCCTCTTTCA      ATGGTGGTGCCTCCAGATAG
Esrrb   NM_011934   CAGGCAAGGATGACAGACG       GAGACAGCACGAAGGACTGC
Gapdh   NM_008084   AACTTTGGGATTGTGGAAGG      ACACATTGGGGGTAGGAACA
Gata4   NM_008092   TCTCACTATGGGCACAGCAG      GCGATGTCTGAGTGACAGGA
Gria1   NM_008165   ACCACTACATCCTCGCCAAC      TCACTTGTCCTCCACTGCTG
Nanog   NM_028016   AAGTACCTCAGCCTCC          GTGCTGAGCCCTTCTG
Neat1   NR_003513   GGGGCAGTGTCCTAACTTGA      CCCACTGCCTGTCCTCTATG
Oct-4   NM_013633   AGTTTGCCAAGCTGCTGAAG      TCTTAAGGCTGAGCTGCAAGG
Rex1    NM_009556   ACGAGTGGCAGTTTCTTCTTGGGA  TATGACTCACTTCCAGGGGGCACT
Sox2    NM_011443   TGAACGCCTTCATGGTATGG      TTGTGCATCTTGGGGTTCTC

Forward primer            Reverse primer
AGGCAGTGAGTACCTCAAGGA     GCCAAAGTCCAGCTACAACAG
GGCTCATCTGGGTTCAAGAG      GGACTGGAGTTCCCAAAACA
CTATGACCCCAGACAGTGG       ACATGATGCAAGATGAGCCA
TGGGAATGGGTCATTGAAGT      GCACGCAGCTGGAGAAGTAG
TTGTCTGGCTGCCTAGAGGT      CCCACCCAGCAGAGGTAGTA
GTTCGGTTGGCAGAAGCTAT      CCAACACCGGCATAAAGAAT

Gene labels (row assignment not preserved in this layout): Fire-Fly, NAST#156, NAST#16, NAST#19, NAST#118, NAST#160
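
When transcribing primer tables like the one above, basic properties such as length and GC fraction offer a quick consistency check. The Python sketch below computes both for the Gapdh primer pair from Supplementary Table 5. The helper name gc_fraction is hypothetical, and no acceptance thresholds are implied, since the original material does not state any primer design criteria.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a DNA primer (0.0 to 1.0)."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError(f"not a plain DNA sequence: {primer!r}")
    return (seq.count("G") + seq.count("C")) / len(seq)


# Gapdh primers as listed in Supplementary Table 5.
primers = {
    "Gapdh forward": "AACTTTGGGATTGTGGAAGG",
    "Gapdh reverse": "ACACATTGGGGGTAGGAACA",
}

for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC fraction {gc_fraction(seq):.2f}")
```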

Nature Genetics: doi:10.1038/ng.2965