supplementary materials for...mar 02, 2016  · other supplementary material for this manuscript...

24
www.sciencemag.org/content/351/6277/1083/suppl/DC1 Supplementary Materials for Regulatory evolution of innate immunity through co-option of endogenous retroviruses Edward B. Chuong, Nels C. Elde,* Cédric Feschotte* *Corresponding author. E-mail: [email protected] (N.C.E.); [email protected] (C.F.) Published 4 March 2016, Science 351, 1083 (2016) DOI: 10.1126/science.aad5497 This PDF file includes: Materials and Methods Figs. S1 to S11 Captions for tables S1 to S6 References Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format)

Upload: others

Post on 11-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

www.sciencemag.org/content/351/6277/1083/suppl/DC1

Supplementary Materials for

Regulatory evolution of innate immunity through co-option of endogenous

retroviruses

Edward B. Chuong, Nels C. Elde,* Cédric Feschotte*

*Corresponding author. E-mail: [email protected] (N.C.E.); [email protected] (C.F.)

Published 4 March 2016, Science 351, 1083 (2016) DOI: 10.1126/science.aad5497

This PDF file includes: Materials and Methods

Figs. S1 to S11

Captions for tables S1 to S6

References

Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format)

Page 2: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

2

Materials and Methods

Dataset GEO accession numbers

The genomic data analyzed in this study were obtained from publicly available datasets. ChIP-

Seq datasets were obtained from GSE31477 (IRF1 and STAT1 for IFNG-stimulated K562 cells,

STAT1 for IFNG-stimulated HeLa cells), GSE43036 (IRF1, STAT1, and H3K27ac for IFNG-

stimulated CD14+ primary macrophages), and GSE33913 (STAT1 for IFNG-stimulated mouse

primary bone marrow-derived macrophages). RNA-Seq datasets were obtained from GSE66809.

Individual sample library accession codes are available in Table S2 (RNA-Seq) and S5 (ChIP-

Seq).

ChIP-Seq analysis

Illumina FASTQ reads were downloaded from the NCBI Gene Expression Omnibus (GEO) and

aligned to the human (hg19) or mouse (mm9) genomes using BWA SW v0.7.5a (35). Reads

were filtered to remove low-quality reads and reads mapping to multiple locations (alignment

quality score > 10). Alignments corresponding to replicate samples were merged, and ChIP-Seq

peaks were called at an FDR < 0.05 using MACS2 v2.1.0 (36), with all ChIP libraries

normalized against respective input control libraries. Normalized profiles corresponding to read

coverage per 1 million reads were used for heatmap and metaprofile visualization, which were

generated using the deepTools v1.5.8.1 (37). The CRG 36 bp alignability track was used to

inspect genome sequence uniqueness. Complete peak datasets (hg19 or mm9) are available in

Table S5.

Transposable element analysis

All TE analysis was performed using the Repeatmasker annotations downloaded from

http://www.repeatmasker.org (Repeat Library 20140131). Nucleotide sequence alignments were

performed using MUSCLE v3.8.31(38).

Identifying overrepresented TE families. For each set of ChIP-Seq peak summits, the expected

representation of each TE family was determined by shuffling peaks across the genome. To

account for non-random TE integration biases, peaks were shuffled while maintaining the same

overall distribution of regions relative to their distances to host genes (directly overlapping a

gene transcriptional start site, exon, intron, 10-kb gene proximal region, 10-100 kb distal region

or >100 kb intergenic region). Each dataset was shuffled 10,000 times, and the mean number of

overlaps from these shuffled datasets was used to determine the expected fraction of TE overlaps

in a binomial test. Multiple testing correction was performed by multiplying the P value by the

number of TE families tested (N=1,172). TE families that were enriched with a stringent P < 10-5

were further filtered to retain only families where at least 25 copies were bound and the number

of bound copies exceeded a 3-fold enrichment over the background expectation.

Enrichment near IFNG-stimulated genes. Correlation analysis between bound ERVs and ISGs

was performed using CD14+ data because both ChIP-Seq and RNA-Seq were available for the

same cell type and conditions. Repeatmasker-annotated ERVs that overlapped either STAT1 or

IRF1 ChIP-Seq peak summits in CD14+ macrophages were merged to form a set of 7,098

STAT1/IRF1 bound ERVs (listed in Table S2). A set of 2,000 ISGs were detected using a

Page 3: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

3

matched dataset (Table S2, see RNA-Seq section below), and the absolute distance to the nearest

ISG was determined for all 7,098 ERVs. These distances were grouped by 10 kb bin sizes in Fig

1B. The expected background was determined by randomly sampling an equal number (7,098) of

the remaining 720,997 annotated ERVs that were not bound by either TF, and determining the

number of elements located within each 10 kb bin relative to the nearest ISG transcriptional start

site. Sampling was repeated 10,000 times and the mean number of elements was used as the

expected value for comparison to the observed count of STAT1/IRF1-bound ERVs. Statistical

significance was determined for the first 10 kb bin, by binomial test. The analysis was repeated

using several gene sets: 1) 1,733 genes significantly downregulated by IFNG based on the same

RNA-Seq dataset, 2) 2,000 randomly selected genes that were expressed in the same dataset but

not differentially regulated, and 3) 2,000 randomly selected genes from the human genome (Fig

S2). Functional enrichment based on proximal gene annotations was determined using GREAT

v3.0 default enrichment settings (http://great.stanford.edu) (22).

TE motif analysis. Binding motif position-weight matrices for STAT1 (MA0137.3 for Gamma

Activated Site/GAS motif, MA0517.1 for the Interferon Stimulated Response Element/ISRE

motif) were obtained from the JASPAR CORE database (39). Motif occurrences within TE

consensus sequences or the human genome were identified using FIMO v4.10.0 (40) using a p-

value cutoff of 1x10-4

. For the heatmap visualization of motif presence in Fig 1D, HeLa-bound

copies were used because they represented the largest dataset, only repeat instances that were

>50% intact by length were retained, and repeat 5’ start locations were recalculated based on

Repeatmasker annotations.

Repeat age estimation. Species divergence times were based on (34). Repeat ages were estimated

by dividing the percent divergence of extant copies from the consensus sequence by the species

neutral substitution rate. Substitution rates (mutations/yr) used were as follows: anthropoidea

2.2x10-9

, carnivora 3.8x10-9

, artiodactyls 3.0x10-9

from (41); lemuriformes, 3.0x10-9

,

vespertillionidae 2.7x10-9

from (42). Evolutionary lineage placements based on repeat

divergence times were independently verified by analysis of presence or absence of MER41-like

relatives, using a script by A. Kapusta (Feschotte lab, https://github.com/4ureliek/TEorthology).

Briefly, for all mammalian genomes in which extant MER41-like insertions were annotated by

Repeatmasker, all insertions not nested in another (masked) repeat were extracted including 5’

and 3’ flanking 100 nt of flanking sequences, and queried against 23 mammalian genomes

representing all major eutherian taxa using BLASTN with an E cutoff of <1x10-50

. Hits are

filtered based on the presence of both 5’ and 3’ flanking sequence, which is used as a proxy to

determine if matches in other species represented syntenic/orthologous elements. Both the total

number and number of potentially orthologous matches are recorded (summary output in Table

S4). At E<1x10-50

, lineage-specific MER41 elements do not align to any other family (e.g.

primate-specific MER41 elements do not align to MER41-like elements of any other lineage).

Lineages with 0 hits across all contained species (e.g. rodents, afrotheria) are inferred to not

possess MER41-like elements.

RNA-Seq analysis

Raw FASTQ files corresponding to mock treated and IFNG-treated (2 biological replicates each)

were aligned to the human genome (hg19) using the spliced junction mapper HISAT v0.1.6 (43),

and filtered to remove reads that mapped to multiple locations. Transcript assembly, including

Page 4: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

4

assembly of unannotated genes, was performed using StringTie v1.0.4 (44). Differential

expression analysis of both protein-coding (RefSeq) and annotated/de novo non-coding

transcripts was performed using Cuffdiff v2.2.1 (45). Significantly upregulated loci at the gene

level (i.e. merging all isoforms per gene) with an FDR < 0.05 were considered ISGs. Loci were

collapsed to their transcriptional start sites to determine relative distance from ERVs.

Note: For the association analysis between STAT1/IRF1-bound ERVs and ISGs, the binding site

and expression datasets were generated in two different studies (ChIP-Seq (17), RNA-Seq (21)).

These studies were performed by the same laboratory, and cell culture conditions and treatments

were consistent between the studies (human donor-derived CD14 macrophages cultured using

the same conditions, then either mock-treated or treated with 100 U/ml IFNG for 24 hrs). In

experiment used for RNA-Seq, both unstimulated and IFNG-stimulated samples were further

treated using 10 ng/ml Pam3CSK4 for 4 hours before harvesting, which activates the TLR2-

pathway. Because both mock and IFNG-stimulated cells were treated, genes identified from this

dataset still represent CD14+ macrophage ISGs, though a fraction of these ISGs may be

dependent on TLR2 activation, which was not performed in the ChIP-Seq study.

Cell culture and treatments Human HeLa F2 cells and human primary fibroblast cells were gifts from A. Geballe (Fred

Hutchinson Cancer Research Center, Seattle, WA). Other primate primary fibroblasts were

obtained from the Coriell Cell Repositories (Camden, NJ): chimpanzee, PR00226; rhesus

macaque, AG06252; white-fronted marmoset, PR00789. All cells were maintained in High

Glucose DMEM F12 media (HyClone, Logan, UT) with 100 ug/ml penicillin-streptomycin, 2

mM L-glutamine, and 10% heat-inactivated FBS. All IFNG treatments were performed using

1000 U/ml recombinant human IFNG (cat #11500-2, PBL assay science, Piscataway, NJ) for 24

hrs. Human recombinant IFNG was used to stimulate primary fibroblasts from other primates.

All transfections were performed using FuGENE HD (Promega, Madison, WI) following

manufacturer’s instructions.

Luciferase reporter assays

LTR enhancer sequences were either cloned from genomic DNA (MER41.AIM2) or synthesized

as Gene Fragments (Integrated DNA Technologies, Coralville, IA) and cloned into pGLuc Mini-

TK 2 Gaussia enhancer reporter plasmids upstream of the minimal promoter. Reporter constructs

were co-transfected with a pTK-CLuc constitutive Cypridina vector (ratio of 100 ng Gaussia

plasmid: 5 ng Cypridina control plasmid) into HeLa cells. After 24 hrs, media was replaced and

cells were either mock treated or stimulated with 1000 U/ml IFNG. 24 hrs following stimulation,

cell culture supernatants were assayed for secreted reporter activity using the

BioLux Gaussia and Cypridina Luciferase Assay systems (New England Biolabs, Ipswich, MA),

and all reported values were normalized to Cypridina co-transfection controls. All experiments

were performed with 3 biological replicates per condition in a 96-well plate format. Sequences

are available in Table S2. Results are representative of at least 3 independent experiments.

Barplots are presented as mean +/- s.d.

Gene expression analysis Confluent cells were either mock treated or stimulated with 1000 U/ml IFNG for 24 hrs, then

total RNA was extracted from cells using the RNA MiniPrep kit (Zymo, Irvine, CA). 100 ng

Page 5: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

5

RNA was prepared for quantitative real-time PCR reactions using the Power SYBR® Green

RNA-to-CT™ 1-Step Kit (Life Technologies) and run on the 7900HT Fast Real-Time PCR

System (Applied Biosystems, Waltham, MA). Transcript CT Values were normalized against

transcript levels of CTCF. Primers were all designed against primate-conserved sequences in the

coding sequences of each gene (Table S6). Experiments were performed with 2 biological

replicates per condition in a 12-well plate format. Results are representative of at least 3

independent experiments. Barplots are presented as mean +/- s.d.

Statistical analyses

Statistical significance for both qPCR and luciferase reporter assays was assessed using a two-

tailed unpaired Student’s t-test with a threshold of p < 0.05.

Virus infections

Cells were seeded at 5x105 cells per well of a 6-well plate. After 24 hrs, cells were transfected

either with an empty control vector (pEGFP-N1) or a pcDNA3-hAIM2-T7 AIM2 rescue plasmid

(gift from Emad Alnemri, Addgene #51538). 24 hrs following transfection, cells were washed

twice with PBS then infected with vaccinia virus (MOI = 5.0) in 1 ml serum free media. Both

cells and supernatants were harvested for Western blot analysis 24 hrs after infection.

Generation and analysis of CRISPR-Cas9 mutants

For each MER41 element (associated with AIM2, APOL1, IFI6, SECTM1), two gRNA sequences

were designed to generate a specific internal deletion encompassing the predicted STAT1

binding sites. All gRNA sequences, including the 20 bp seed and 3 bp PAM (NGG) sequence,

were verified to be unique targets in the human genome using BLAT against the hg19 genome

assembly. gRNA oligos were ordered from IDT and cloned into pSpCas9(BB)-2A-Puro vectors

(gift from Feng Zhang, Addgene #62988) following a published protocol (46). HeLa cells were

co-transfected with both gRNA constructs and placed under puromycin selection for 4 days.

Clonal lines were isolated using limited dilution cloning, and 100-200 clones were screened for

homozygous deletions by PCR using both flanking and internal primer pairs at the expected

deletion site (Fig S8). To determine deletion breakpoint sequences, PCR products flanking each

breakpoint were TA-cloned into PCR2.1 TOPO vectors (Life Technologies, Carlsbad, CA) and

transformed into DH5a cells, and at least 30 individual colonies were sequenced using Sanger

sequencing (Genewiz, South Plainfield, NJ).

Western blot analysis

Cell lysates were prepared with RIPA buffer and resuspended in 2x Laemelli buffer

supplemented with 5% beta-mercaptoethanol. Cell culture supernatants were concentrated using

methanol-chloroform precipitation and resuspended in 2x Laemelli buffer. Antibodies used were

as follows: AIM2 (cat #D5X7K, Cell Signaling Technologies, Danvers, MA), Caspase-1 (cat

#sc-515, Santa Cruz Biotechnology, Santa Cruz, CA), and Actin (cat #612656, BD Biosciences,

Franklin Lakes, NJ). Results are representative of at least 3 independent experiments.

Page 6: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

6

Fig. S1.

A) Scatter plots of TE families (each dot) plotted against the log-normal observed versus

expected number of intersections for each ChIP-Seq dataset. Significantly enriched families are

shown in blue. See Table S1 for repeat-annotated analysis. B) Fraction of binding sites contained

within each major class of TE.

Page 7: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

7

Fig. S2.

Analysis shown in Fig 1B (main text) repeated using other sets of genes for comparison. Each

graph depicts frequency distributions of ERV absolute distances to the nearest gene within the

particular gene set. 7,098 CD14+ STAT1/IRF1-bound ERVs are compared to a normalized

background expectation based on all other unbound ERVs (see Materials and Methods section).

Page 8: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

8

Fig. S3.

IFNG-inducible ERV binding sites are enriched near immunity genes. 7,098 ERVs bound by either IRF1 or STAT1

in IFNG-stimulated CD14+ macrophages were analyzed using the GREAT tool with default settings

(http://great.stanford.edu). All displayed categories were significant by both binomial and hypergeometric tests. A)

genes with characterized roles in biological processes. B) genes with known knockout phenotypes in mouse.

Page 9: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

9

Fig. S4.

Dispersal of IFNG-inducible STAT1 binding sites by MER41 subfamilies. A) Violin plots,

scaled to family size (in parenthesis), depicting estimated age distribution of MER41 subfamilies

(Methods). Boxplot range indicates 25th and 75th percentiles and inner circle depicts median

value. B) Bar plots for each subfamily comparing observed (black) and expected (grey)

intersections with each ChIP-Seq dataset analyzed. Expected values are plotted as the mean ±

s.d. of intersections from 10,000 shuffled datasets (see Methods). * Bonferroni P < 0.005,

Binomial test. C) Venn diagram depicting cell-type overlap of individual MER41B elements

bound by STAT1.

Page 10: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

10

Fig. S5.

Pairwise alignment of the MER41A and MER41B LTR consensus sequences. GAS STAT1

binding sites are highlighted in gray.

Page 11: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

11

Fig. S6.

Schematic depicting CRISPR-Cas9 strategy used to delete MER41.AIM2. In our final

MER41.AIM2 deletion mutant, four distinct breakpoints were recovered via Sanger sequencing

(HeLa cells harbor four copies of the AIM2 locus (47)), indicating a homozygous deletion.

Arrows indicate gRNA orientation and PAM (NGG) sequences are underlined. Sequences

masked as MER41 are in blue. Arrows indicate gRNA orientation and PAM (NGG) sequences

are underlined. Sequences masked as MER41 are in blue.

AAAAGTGTTGCC--GCGGCCAGTTTCTTCTGAAAAGTGTTGCCA-GCGGCCAGTTTCTTCTGAAAAGTGTTGCTA-GCGGCCAGTTTCTTCTGAAAAGTGTTGCCAGCGGGCCAGTTTCTTCTG

GGTATCTAAAAGTGTTGCCAGTAGGGCA..AACCAGTTGCGGCCAGTTTCTTCTGTGT

MER41.AIM2

NHEJ breakpoints of ΔMER41.AIM2

gRNA 1 gRNA 2

Page 12: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

12

Fig. S7.

MER41.AIM2 is a primate-specific IFNG-inducible enhancer of AIM2. A) Sequence alignment

of MER41.AIM2 sequences tested in luciferase reporter assays, encompassing the tandem GAS

STAT1 binding motifs. B) qPCR of AIM2 primate orthologs in response to IFNG stimulation,

performed in a panel of primate primary fibroblasts. * p < 0.05, Student’s t-test.

Page 13: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

13

Fig. S8.

Genome browser schematic of primers and guide RNAs used for CRISPR-Cas9 genome editing

and genotyping.

200 bases

5’ flank

gRNA 1

Internal fwd

gRNA 2

chr1:159046776-159047514

3’ flank

MER41E.AIM2 (-)

CRISPR/Cas9 gRNAs

Genotyping primers

GAS motifs

HeLa STAT1

36 bp mappability

35

200 bases

chr17:80,296,651-80,297,502MER41G.SECTM1 (+)

CRISPR/Cas9 gRNAs

Genotyping primers

GAS motifs

HeLa STAT1

36 bp mappability

gRNA 1 gRNA 2

5’ flank Internal fwd 3’ flank

40

500 bases

chr22:36,654,210-36,655,398MER41G.APOL1 (-)

CRISPR/Cas9 gRNAs

Genotyping primers

GAS motifs

HeLa STAT1

36 bp mappability

5’ flank Internal fwd 3’ flank

gRNA 1 gRNA 2

500 bases

chr1:28016115-28017789 MER41B.IFI6 (-)

CRISPR/Cas9 gRNAs

Genotyping primers

GAS motifs

HeLa STAT1

36 bp mappability

5’ flank Internal fwd 3’ flank

gRNA 1 gRNA 2

20

5

Page 14: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

14

Fig. S9.

PCR validation on wild-type or mutant genomic DNA, using primers either flanking (A) or

internal to (B) the expected deletion. Primer locations are depicted in Fig S7 and Table S6.

Page 15: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

15

Fig. S10.

MER41-like ERVs independently amplified in other lineages. Alignment of consensus sequences

from selected MER41-like relatives that exhibit conservation of the GAS (TTCNNNGAA)

STAT1 binding motifs. MER41_CF (dog Repeatmasker annotation) is present in all carnivores,

MER41_BT (cow) in artiodactyls, MER41B_Mim (mouse lemur) in lemuriformes, MER41B

(human) in anthropoid primates, and MER41A_ML (little brown bat) in vesper bats.

Page 16: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

16

Fig. S11.

A) Genome browser view of an RLTR30B element inducibly bound by STAT1 in response to

either IFNB or IFNG in mouse BMM. cIAP2 is an inhibitor of apoptosis with reported roles in

innate immunity (48), and has previously been proposed to be regulated by retrotransposons

(49). B) GREAT analysis of 3,336 ERVs bound by STAT1 in IFNG-stimulated mouse

macrophages.

Page 17: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

17

Supplementary Tables

Additional Data table S1 (separate file)

TE enrichment analyses for human and mouse ChIP-Seq datasets.

Additional Data table S2 (separate file)

Analysis of STAT1/IRF1-bound ERVs and IFNG-stimulated genes in primary CD14+

macrophages.

Additional Data table S3 (separate file)

List of MER41 and RLTR30B repeats bound by TFs and their closest gene.

Additional Data table S4 (separate file)

BLAST analysis of presence/absence of MER41-like elements across mammalian species

representative of each major eutherian lineage.

Additional Data table S5 (separate file)

Uniformly analyzed ChIP-Seq peaks used in this study.

Additional Data table S6 (separate file)

Sequences used in this study (CRISPR-Cas9 gRNAs, qPCR and genotyping primers, synthesized

LTR promoters).

Page 18: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

References

1. R. J. Britten, E. H. Davidson, Gene regulation for higher cells: A theory. Science 165, 349–

357 (1969). Medline doi:10.1126/science.165.3891.349

2. B. McClintock, The origin and behavior of mutable loci in maize. Proc. Natl. Acad. Sci.

U.S.A. 36, 344–355 (1950). Medline doi:10.1073/pnas.36.6.344

3. C. Feschotte, Transposable elements and the evolution of regulatory networks. Nat. Rev.

Genet. 9, 397–405 (2008). Medline doi:10.1038/nrg2337

4. R. Rebollo, M. T. Romanish, D. L. Mager, Transposable elements: An abundant and natural

source of regulatory sequences for host genes. Annu. Rev. Genet. 46, 21–42 (2012).

Medline doi:10.1146/annurev-genet-110711-155621

5. C. B. Lowe, G. Bejerano, D. Haussler, Thousands of human mobile element fragments

undergo strong purifying selection near developmental genes. Proc. Natl. Acad. Sci.

U.S.A. 104, 8005–8010 (2007). Medline doi:10.1073/pnas.0611223104

6. T. Wang, J. Zeng, C. B. Lowe, R. G. Sellers, S. R. Salama, M. Yang, S. M. Burgess, R. K.

Brachmann, D. Haussler, Species-specific endogenous retroviruses shape the

transcriptional network of the human tumor suppressor protein p53. Proc. Natl. Acad.

Sci. U.S.A. 104, 18613–18618 (2007). Medline doi:10.1073/pnas.0703637104

7. G. Kunarso, N. Y. Chia, J. Jeyakani, C. Hwang, X. Lu, Y. S. Chan, H. H. Ng, G. Bourque,

Transposable elements have rewired the core regulatory network of human embryonic

stem cells. Nat. Genet. 42, 631–634 (2010). Medline doi:10.1038/ng.600

8. D. Schmidt, P. C. Schwalie, M. D. Wilson, B. Ballester, A. Gonçalves, C. Kutter, G. D.

Brown, A. Marshall, P. Flicek, D. T. Odom, Waves of retrotransposon expansion remodel

genome organization and CTCF binding in multiple mammalian lineages. Cell 148, 335–

348 (2012). Medline

9. E. B. Chuong, M. A. K. Rumi, M. J. Soares, J. C. Baker, Endogenous retroviruses function as

species-specific enhancer elements in the placenta. Nat. Genet. 45, 325–329 (2013).

Medline doi:10.1038/ng.2553

Page 19: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

10. J. H. Notwell, T. Chung, W. Heavner, G. Bejerano, A family of transposable elements co-

opted into developmental enhancers in the mouse neocortex. Nat. Commun. 6, 6644

(2015). Medline doi:10.1038/ncomms7644

11. P.-É. Jacques, J. Jeyakani, G. Bourque, The majority of primate-specific regulatory

sequences are derived from transposable elements. PLOS Genet. 9, e1003504 (2013).

Medline

12. V. Sundaram, Y. Cheng, Z. Ma, D. Li, X. Xing, P. Edge, M. P. Snyder, T. Wang,

Widespread contribution of transposable elements to the innovation of gene regulatory

networks. Genome Res. 24, 1963–1976 (2014). Medline doi:10.1101/gr.168872.113

13. L. C. Platanias, Mechanisms of type-I- and type-II-interferon-mediated signalling. Nat. Rev.

Immunol. 5, 375–386 (2005). Medline doi:10.1038/nri1604

14. L. B. Barreiro, J. C. Marioni, R. Blekhman, M. Stephens, Y. Gilad, Functional comparison of

innate immune signaling pathways in primates. PLOS Genet. 6, e1001249 (2010).

Medline doi:10.1371/journal.pgen.1001249

15. K. Schroder, K. M. Irvine, M. S. Taylor, N. J. Bokil, K. A. Le Cao, K. A. Masterman, L. I.

Labzin, C. A. Semple, R. Kapetanovic, L. Fairbairn, A. Akalin, G. J. Faulkner, J. K.

Baillie, M. Gongora, C. O. Daub, H. Kawaji, G. J. McLachlan, N. Goldman, S. M.

Grimmond, P. Carninci, H. Suzuki, Y. Hayashizaki, B. Lenhard, D. A. Hume, M. J.

Sweet, Conservation and divergence in Toll-like receptor 4-regulated gene expression in

primary human versus mouse macrophages. Proc. Natl. Acad. Sci. U.S.A. 109, E944–

E953 (2012). Medline doi:10.1073/pnas.1110156109

16. M. B. Gerstein, A. Kundaje, M. Hariharan, S. G. Landt, K. K. Yan, C. Cheng, X. J. Mu, E.

Khurana, J. Rozowsky, R. Alexander, R. Min, P. Alves, A. Abyzov, N. Addleman, N.

Bhardwaj, A. P. Boyle, P. Cayting, A. Charos, D. Z. Chen, Y. Cheng, D. Clarke, C.

Eastman, G. Euskirchen, S. Frietze, Y. Fu, J. Gertz, F. Grubert, A. Harmanci, P. Jain, M.

Kasowski, P. Lacroute, J. Leng, J. Lian, H. Monahan, H. O’Geen, Z. Ouyang, E. C.

Partridge, D. Patacsil, F. Pauli, D. Raha, L. Ramirez, T. E. Reddy, B. Reed, M. Shi, T.

Slifer, J. Wang, L. Wu, X. Yang, K. Y. Yip, G. Zilberman-Schapira, S. Batzoglou, A.

Sidow, P. J. Farnham, R. M. Myers, S. M. Weissman, M. Snyder, Architecture of the

Page 20: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

human regulatory network derived from ENCODE data. Nature 489, 91–100 (2012).

Medline doi:10.1038/nature11245

17. Y. Qiao, E. G. Giannopoulou, C. H. Chan, S. H. Park, S. Gong, J. Chen, X. Hu, O. Elemento,

L. B. Ivashkiv, Synergistic activation of inflammatory cytokine genes by interferon-γ-

induced chromatin remodeling and toll-like receptor signaling. Immunity 39, 454–469

(2013). Medline

18. See supplementary materials on Science Online.

19. C. D. Schmid, P. Bucher, MER41 repeat sequences contain inducible STAT1 binding sites.

PLOS ONE 5, e11425 (2010). Medline doi:10.1371/journal.pone.0011425

20. E. S. Lander, L. M. Linton, B. Birren, C. Nusbaum, M. C. Zody, J. Baldwin, K. Devon, K.

Dewar, M. Doyle, W. FitzHugh, R. Funke, D. Gage, K. Harris, A. Heaford, J. Howland,

L. Kann, J. Lehoczky, R. LeVine, P. McEwan, K. McKernan, J. Meldrim, J. P. Mesirov,

C. Miranda, W. Morris, J. Naylor, C. Raymond, M. Rosetti, R. Santos, A. Sheridan, C.

Sougnez, Y. Stange-Thomann, N. Stojanovic, A. Subramanian, D. Wyman, J. Rogers, J.

Sulston, R. Ainscough, S. Beck, D. Bentley, J. Burton, C. Clee, N. Carter, A. Coulson, R.

Deadman, P. Deloukas, A. Dunham, I. Dunham, R. Durbin, L. French, D. Grafham, S.

Gregory, T. Hubbard, S. Humphray, A. Hunt, M. Jones, C. Lloyd, A. McMurray, L.

Matthews, S. Mercer, S. Milne, J. C. Mullikin, A. Mungall, R. Plumb, M. Ross, R.

Shownkeen, S. Sims, R. H. Waterston, R. K. Wilson, L. W. Hillier, J. D. McPherson, M.

A. Marra, E. R. Mardis, L. A. Fulton, A. T. Chinwalla, K. H. Pepin, W. R. Gish, S. L.

Chissoe, M. C. Wendl, K. D. Delehaunty, T. L. Miner, A. Delehaunty, J. B. Kramer, L.

L. Cook, R. S. Fulton, D. L. Johnson, P. J. Minx, S. W. Clifton, T. Hawkins, E.

Branscomb, P. Predki, P. Richardson, S. Wenning, T. Slezak, N. Doggett, J. F. Cheng, A.

Olsen, S. Lucas, C. Elkin, E. Uberbacher, M. Frazier, R. A. Gibbs, D. M. Muzny, S. E.

Scherer, J. B. Bouck, E. J. Sodergren, K. C. Worley, C. M. Rives, J. H. Gorrell, M. L.

Metzker, S. L. Naylor, R. S. Kucherlapati, D. L. Nelson, G. M. Weinstock, Y. Sakaki, A.

Fujiyama, M. Hattori, T. Yada, A. Toyoda, T. Itoh, C. Kawagoe, H. Watanabe, Y.

Totoki, T. Taylor, J. Weissenbach, R. Heilig, W. Saurin, F. Artiguenave, P. Brottier, T.

Bruls, E. Pelletier, C. Robert, P. Wincker, D. R. Smith, L. Doucette-Stamm, M.

Rubenfield, K. Weinstock, H. M. Lee, J. Dubois, A. Rosenthal, M. Platzer, G. Nyakatura,

Page 21: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

S. Taudien, A. Rump, H. Yang, J. Yu, J. Wang, G. Huang, J. Gu, L. Hood, L. Rowen, A.

Madan, S. Qin, R. W. Davis, N. A. Federspiel, A. P. Abola, M. J. Proctor, R. M. Myers,

J. Schmutz, M. Dickson, J. Grimwood, D. R. Cox, M. V. Olson, R. Kaul, C. Raymond,

N. Shimizu, K. Kawasaki, S. Minoshima, G. A. Evans, M. Athanasiou, R. Schultz, B. A.

Roe, F. Chen, H. Pan, J. Ramser, H. Lehrach, R. Reinhardt, W. R. McCombie, M. de la

Bastide, N. Dedhia, H. Blöcker, K. Hornischer, G. Nordsiek, R. Agarwala, L. Aravind, J.

A. Bailey, A. Bateman, S. Batzoglou, E. Birney, P. Bork, D. G. Brown, C. B. Burge, L.

Cerutti, H. C. Chen, D. Church, M. Clamp, R. R. Copley, T. Doerks, S. R. Eddy, E. E.

Eichler, T. S. Furey, J. Galagan, J. G. Gilbert, C. Harmon, Y. Hayashizaki, D. Haussler,

H. Hermjakob, K. Hokamp, W. Jang, L. S. Johnson, T. A. Jones, S. Kasif, A. Kaspryzk,

S. Kennedy, W. J. Kent, P. Kitts, E. V. Koonin, I. Korf, D. Kulp, D. Lancet, T. M. Lowe,

A. McLysaght, T. Mikkelsen, J. V. Moran, N. Mulder, V. J. Pollara, C. P. Ponting, G.

Schuler, J. Schultz, G. Slater, A. F. Smit, E. Stupka, J. Szustakowski, D. Thierry-Mieg, J.

Thierry-Mieg, L. Wagner, J. Wallis, R. Wheeler, A. Williams, Y. I. Wolf, K. H. Wolfe,

S. P. Yang, R. F. Yeh, F. Collins, M. S. Guyer, J. Peterson, A. Felsenfeld, K. A.

Wetterstrand, A. Patrinos, M. J. Morgan, P. de Jong, J. J. Catanese, K. Osoegawa, H.

Shizuya, S. Choi, Y. J. Chen, J. Szustakowki, Initial sequencing and analysis of the

human genome. Nature 409, 860–921 (2001). Medline doi:10.1038/35057062

21. X. Su, Y. Yu, Y. Zhong, E. G. Giannopoulou, X. Hu, H. Liu, J. R. Cross, G. Rätsch, C. M.

Rice, L. B. Ivashkiv, Interferon-γ regulates cellular metabolism and mRNA translation to

potentiate macrophage activation. Nat. Immunol. 16, 838–849 (2015). Medline

doi:10.1038/ni.3205

22. C. Y. McLean, D. Bristor, M. Hiller, S. L. Clarke, B. T. Schaar, C. B. Lowe, A. M. Wenger,

G. Bejerano, GREAT improves functional interpretation of cis-regulatory regions. Nat.

Biotechnol. 28, 495–501 (2010). Medline doi:10.1038/nbt.1630

23. R. Ostuni, V. Piccolo, I. Barozzi, S. Polletti, A. Termanini, S. Bonifacio, A. Curina, E.

Prosperini, S. Ghisletti, G. Natoli, Latent enhancers activated by stimulation in

differentiated cells. Cell 152, 157–171 (2013). Medline doi:10.1016/j.cell.2012.12.018

24. V. Hornung, A. Ablasser, M. Charrel-Dennis, F. Bauernfeind, G. Horvath, D. R. Caffrey, E.

Latz, K. A. Fitzgerald, AIM2 recognizes cytosolic dsDNA and forms a caspase-1-

Page 22: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

activating inflammasome with ASC. Nature 458, 514–518 (2009). Medline

doi:10.1038/nature07725

25. T. Fernandes-Alnemri, J. W. Yu, C. Juliana, L. Solorzano, S. Kang, J. Wu, P. Datta, M.

McCormick, L. Huang, E. McDermott, L. Eisenlohr, C. P. Landel, E. S. Alnemri, The

AIM2 inflammasome is critical for innate immunity to Francisella tularensis. Nat.

Immunol. 11, 385–393 (2010). Medline doi:10.1038/ni.1859

26. L. Vanhamme, F. Paturiaux-Hanocq, P. Poelvoorde, D. P. Nolan, L. Lins, J. Van Den

Abbeele, A. Pays, P. Tebabi, H. Van Xong, A. Jacquet, N. Moguilevsky, M. Dieu, J. P.

Kane, P. De Baetselier, R. Brasseur, E. Pays, Apolipoprotein L-I is the trypanosome lytic

factor of human serum. Nature 422, 83–87 (2003). Medline doi:10.1038/nature01461

27. K. Meyer, Y. C. Kwon, S. Liu, C. H. Hagedorn, R. B. Ray, R. Ray, Interferon-α inducible

protein 6 impairs EGFR activation by CD81 and inhibits hepatitis C virus infection. Sci.

Rep. 5, 9012 (2015). Medline doi:10.1038/srep09012

28. T. Wang, C. Huang, A. Lopez-Coral, K. A. Slentz-Kesler, M. Xiao, E. J. Wherry, R. E.

Kaufman, K12/SECTM1, an interferon-γ regulated molecule, synergizes with CD28 to

costimulate human T cell proliferation. J. Leukoc. Biol. 91, 449–459 (2012). Medline

doi:10.1189/jlb.1011498

29. M. Lagha, J. P. Bothma, M. Levine, Mechanisms of transcriptional precision in animal

development. Trends Genet. 28, 409–416 (2012). Medline doi:10.1016/j.tig.2012.03.006

30. W. Bao, K. K. Kojima, O. Kohany, Repbase Update, a database of repetitive elements in

eukaryotic genomes. Mob. DNA 6, 11 (2015). Medline doi:10.1186/s13100-015-0041-9

31. S.-L. Ng, B. A. Friedman, S. Schmid, J. Gertz, R. M. Myers, B. R. TenOever, T. Maniatis,

IκB kinase epsilon (IKKε) regulates the balance between type I and type II interferon

responses. Proc. Natl. Acad. Sci. U.S.A. 108, 21170–21175 (2011). Medline

doi:10.1073/pnas.1119137109

32. R. E. Randall, S. Goodbourn, Interferons and viruses: An interplay between induction,

signalling, antiviral responses and virus countermeasures. J. Gen. Virol. 89, 1–47 (2008).

Medline doi:10.1099/vir.0.83391-0

Page 23: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

33. M. Sgarbanti, A. L. Remoli, G. Marsili, B. Ridolfi, A. Borsetti, E. Perrotti, R. Orsatti, R.

Ilari, L. Sernicola, E. Stellacci, B. Ensoli, A. Battistini, IRF-1 is required for full NF-κB

transcriptional activity at the human immunodeficiency virus type 1 long terminal repeat

enhancer. J. Virol. 82, 3632–3641 (2008). Medline doi:10.1128/JVI.00599-07

34. R. W. Meredith, J. E. Janečka, J. Gatesy, O. A. Ryder, C. A. Fisher, E. C. Teeling, A.

Goodbla, E. Eizirik, T. L. Simão, T. Stadler, D. L. Rabosky, R. L. Honeycutt, J. J. Flynn,

C. M. Ingram, C. Steiner, T. L. Williams, T. J. Robinson, A. Burk-Herrick, M.

Westerman, N. A. Ayoub, M. S. Springer, W. J. Murphy, Impacts of the Cretaceous

Terrestrial Revolution and KPg extinction on mammal diversification. Science 334, 521–

524 (2011). Medline doi:10.1126/science.1211028

35. H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform.

Bioinformatics 25, 1754–1760 (2009). Medline doi:10.1093/bioinformatics/btp324

36. T. Liu, J. A. Ortiz, L. Taing, C. A. Meyer, B. Lee, Y. Zhang, H. Shin, S. S. Wong, J. Ma, Y.

Lei, U. J. Pape, M. Poidinger, Y. Chen, K. Yeung, M. Brown, Y. Turpaz, X. S. Liu,

Cistrome: An integrative platform for transcriptional regulation studies. Genome Biol. 12,

R83 (2011). Medline doi:10.1186/gb-2011-12-8-r83

37. F. Ramírez, F. Dündar, S. Diehl, B. A. Grüning, T. Manke, deepTools: A flexible platform

for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014). Medline

doi:10.1093/nar/gku365

38. R. C. Edgar, MUSCLE: Multiple sequence alignment with high accuracy and high

throughput. Nucleic Acids Res. 32, 1792–1797 (2004). Medline doi:10.1093/nar/gkh340

39. A. Mathelier, X. Zhao, A. W. Zhang, F. Parcy, R. Worsley-Hunt, D. J. Arenillas, S.

Buchman, C. Y. Chen, A. Chou, H. Ienasescu, J. Lim, C. Shyr, G. Tan, M. Zhou, B.

Lenhard, A. Sandelin, W. W. Wasserman, JASPAR 2014: An extensively expanded and

updated open-access database of transcription factor binding profiles. Nucleic Acids Res.

42, D142–D147 (2014). Medline doi:10.1093/nar/gkt997

40. C. E. Grant, T. L. Bailey, W. S. Noble, FIMO: Scanning for occurrences of a given motif.

Bioinformatics 27, 1017–1018 (2011). Medline doi:10.1093/bioinformatics/btr064

Page 24: Supplementary Materials for...Mar 02, 2016  · Other supplementary material for this manuscript includes the following: Tables S1 to S6 (Excel format) 2 ... Illumina FASTQ reads were

41. M. Bulmer, K. H. Wolfe, P. M. Sharp, Synonymous nucleotide substitution rates in

mammalian genes: Implications for the molecular clock and the relationship of

mammalian orders. Proc. Natl. Acad. Sci. U.S.A. 88, 5974–5978 (1991). Medline

doi:10.1073/pnas.88.14.5974

42. J. K. Pace 2nd, C. Gilbert, M. S. Clark, C. Feschotte, Repeated horizontal transfer of a DNA

transposon in mammals and other tetrapods. Proc. Natl. Acad. Sci. U.S.A. 105, 17023–

17028 (2008). Medline doi:10.1073/pnas.0806548105

43. D. Kim, B. Langmead, S. L. Salzberg, HISAT: A fast spliced aligner with low memory

requirements. Nat. Methods 12, 357–360 (2015). Medline doi:10.1038/nmeth.3317

44. M. Pertea, G. M. Pertea, C. M. Antonescu, T. C. Chang, J. T. Mendell, S. L. Salzberg,

StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat.

Biotechnol. 33, 290–295 (2015). Medline doi:10.1038/nbt.3122

45. C. Trapnell, D. G. Hendrickson, M. Sauvageau, L. Goff, J. L. Rinn, L. Pachter, Differential

analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol. 31,

46–53 (2013). Medline doi:10.1038/nbt.2450

46. F. A. Ran, P. D. Hsu, J. Wright, V. Agarwala, D. A. Scott, F. Zhang, Genome engineering

using the CRISPR-Cas9 system. Nat. Protoc. 8, 2281–2308 (2013). Medline

doi:10.1038/nprot.2013.143

47. A. Adey, J. N. Burton, J. O. Kitzman, J. B. Hiatt, A. P. Lewis, B. K. Martin, R. Qiu, C. Lee,

J. Shendure, The haplotype-resolved genome and epigenome of the aneuploid HeLa

cancer cell line. Nature 500, 207–211 (2013). Medline doi:10.1038/nature12064

48. M. J. M. Bertrand, K. Doiron, K. Labbé, R. G. Korneluk, P. A. Barker, M. Saleh, Cellular

inhibitors of apoptosis cIAP1 and cIAP2 are required for innate immunity signaling by

the pattern recognition receptors NOD1 and NOD2. Immunity 30, 789–801 (2009).

Medline doi:10.1016/j.immuni.2009.04.011

49. M. T. Romanish, W. M. Lock, L. N. van de Lagemaat, C. A. Dunn, D. L. Mager, Repeated

recruitment of LTR retrotransposons as promoters by the anti-apoptotic locus NAIP

during mammalian evolution. PLOS Genet. 3, e10 (2007). Medline

doi:10.1371/journal.pgen.0030010