Beyond Secondary Structure: Primary-Sequence Determinants License Pri-miRNA Hairpins for Processing

Vincent C. Auyeung,1,2,3,4 Igor Ulitsky,1,2,3 Sean E. McGeary,1,2,3 and David P. Bartel1,2,3,*
1 Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
2 Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
3 Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
4 Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA 02139, USA
*Correspondence: [email protected]
http://dx.doi.org/10.1016/j.cell.2013.01.031
Cell 152, 844–858, February 14, 2013 ©2013 Elsevier Inc.

SUMMARY

To use microRNAs to downregulate mRNA targets, cells must first process these ~22 nt RNAs from primary transcripts (pri-miRNAs). These transcripts form RNA hairpins important for processing, but additional determinants must distinguish pri-miRNAs from the many other hairpin-containing transcripts expressed in each cell. Illustrating the complexity of this recognition, we show that most Caenorhabditis elegans pri-miRNAs lack determinants required for processing in human cells. To find these determinants, we generated many variants of four human pri-miRNAs, sequenced millions that retained function, and compared them with the starting variants. Our results confirmed the importance of pairing in the stem and revealed three primary-sequence determinants, including an SRp20-binding motif (CNNC) found downstream of most pri-miRNA hairpins in bilaterian animals, but not in nematodes. Adding this and other determinants to C. elegans pri-miRNAs imparted efficient processing in human cells, thereby confirming the importance of primary-sequence determinants for distinguishing pri-miRNAs from other hairpin-containing transcripts.

INTRODUCTION

MicroRNAs (miRNAs) are ~22 nt RNAs that pair to messenger RNAs (mRNAs) to direct posttranscriptional repression (Bartel, 2004). MicroRNAs are processed from hairpin-containing primary transcripts (pri-miRNAs). In the canonical processing pathway of animals, pri-miRNAs are cleaved by the Microprocessor, a protein complex containing an RNase III enzyme, Drosha, and its cofactor, DGCR8/Pasha (Lee et al., 2003; Denli et al., 2004; Gregory et al., 2004; Han et al., 2004; Landthaler et al., 2004). The liberated portion of the hairpin (the pre-miRNA) is then cleaved by the RNase III enzyme Dicer (Grishok et al., 2001; Hutvágner et al., 2001), leaving two ~22 nt strands that pair to each other with ~2 nt 3′ overhangs (Lee et al., 2003; Lim et al., 2003b). One strand of each duplex is loaded into an Argonaute protein to form the core of the silencing complex, and the other strand is discarded (Khvorova et al., 2003; Schwarz et al., 2003; Liu et al., 2004). Noncanonical pathways also contribute to the miRNA repertoire through the processing of mirtrons (Okamura et al., 2007; Ruby et al., 2007) or other pri-miRNAs that bypass Drosha cleavage (Babiarz et al., 2008) and through one pre-miRNA that bypasses Dicer cleavage (Cheloufi et al., 2010; Cifuentes et al., 2010).

A long-standing mystery has been how pri-miRNAs are distinguished from the many other hairpin-containing transcripts for processing as Microprocessor substrates. Determinants of Dicer cleavage are better understood (Zhang et al., 2004; Macrae et al., 2006; Park et al., 2011), as illustrated by both the design (Brummelkamp et al., 2002; Paddison et al., 2002) and prediction (Chung et al., 2011) of Dicer substrates that bypass Drosha processing. For Microprocessor recognition, sequences within 40 nt upstream and 40 nt downstream of the pre-miRNA hairpins are required for ectopic miRNA expression (Chen et al., 2004), which is consistent with (1) the observation that these flanking sequences tend to pair to each other to extend the stem another turn of the helix beyond the cleavage site (Lim et al., 2003b) and (2) a requirement for both this extension and a lack of pairing immediately following it for processing (Han et al., 2006). However, many cellular transcripts have paired regions flanked by single-stranded RNA (ssRNA), and most of these are not Microprocessor substrates. Indeed, attempts to predict canonical miRNA hairpins from genomic sequence yield many thousands of false-positive predictions, which must be eliminated using additional criteria, such as analysis of conservation or experimental evaluation (Lim et al., 2003a, 2003b; Bentwich et al., 2005; Berezikov et al., 2006; Chiang et al., 2010). This illustrates a large gap in our understanding of how the Microprocessor distinguishes between authentic substrates and other transcribed hairpins.

Here, we report that transcripts that enter the miRNA pathway in C. elegans failed to do so in human cells. Thus, the definition of a pri-miRNA in one species differs from that in another.


To find features that define human pri-miRNAs, we generated more than 10^11 variants of four pri-miRNAs and sequenced millions that were cleaved by the human Microprocessor. Comparison of cleaved and initial variants revealed important sequence and structural features. These features were evolutionarily conserved in non-nematode lineages and sufficient to increase the processing efficiency of C. elegans hairpins in human cells.

RESULTS

Unknown Features Specify Human Pri-miRNAs

To examine whether miRNA processing features are shared across animals, we ectopically expressed a panel of C. elegans, D. melanogaster, and human pri-miRNAs in human cells and compared the yields of mature miRNA. Despite variability in the degree of overexpression, presumably reflecting differences in efficiency at various steps of the pathway (Fellmann et al., 2011; Feng et al., 2011), most human miRNAs were efficiently expressed (Figure 1A), as expected (Chiang et al., 2010). Four of nine Drosophila miRNAs also fell within the range observed for human miRNAs. However, the tested C. elegans miRNAs were less efficiently expressed (Figure 1A, p = 1.4 × 10^-5, Wilcoxon rank-sum test). Similar results were observed in Drosophila S2 cells (p = 0.024). Thus, most nematode pri-miRNAs lack determinants required for efficient processing in human or insect cells.

To isolate the processing defect, we probed for processing intermediates. Consistent with the sequencing results, cel-lin-4 was processed, with detectable pre-miRNA and mature miRNA (Figure 1B). For other C. elegans miRNAs, neither pre-miRNA nor mature miRNA was detected, despite the presence of primary transcripts (Figure 1B; Figure S1B, available online), suggesting that these C. elegans pri-miRNAs were not productively recognized as Microprocessor substrates. To assay directly for Microprocessor binding, we examined binding to catalytically deficient Drosha and DGCR8. Whereas human pri-mir-122 bound the Microprocessor somewhat better than did the reference pri-miRNA (human pri-mir-125a), all seven tested C. elegans pri-miRNAs bound worse (Figure 1C). Thus, most C. elegans pri-miRNAs are missing some of the determinants needed for efficient recognition and processing by the human Microprocessor.

Known features of C. elegans and human pri-miRNAs appear largely similar, as illustrated by the accuracy of an algorithm trained on C. elegans pri-miRNAs in predicting most miRNA genes conserved in mammals and fish (Lim et al., 2003a). Nonetheless, the poor specificity of this algorithm when predicting nonconserved miRNAs suggests that unknown features help define authentic pri-miRNAs. To look for clues regarding these unknown features, we analyzed the conservation of sequence immediately flanking human pre-miRNAs. Residues extending 13 nt upstream of the 5p Drosha cleavage site (i.e., the site corresponding to the 5′ end of the pre-miRNA) and 11 nt downstream of the 3p Drosha cleavage site were conserved above background, consistent with the importance of the ~11 bp basal stem for pri-miRNA processing (Figure 1D). However, the signal beyond the basal stem tailed off rapidly (particularly in the upstream flanking region), suggesting that any determinants in the flanking regions might be either at variable distances from the hairpin or present in only subsets of miRNAs, making them difficult to identify using alignments.

Functional Substrates from Large Libraries of Pri-miRNA Variants

To identify features important for Microprocessor recognition and cleavage, we generated more than 10^11 pri-miRNA variants, sequenced millions that retained function, and compared these sequences to those of the initial variants (Figure 2A). This approach resembled classical in vitro selection approaches (Wilson and Szostak, 1999), except we did not perform multiple rounds of selection. Because the starting and the selected pools underwent the same number of transcription, reverse-transcription, and amplification steps, any differences between the two pools were subject to neither the compounding effects of multiple rounds nor the confounding effects of amplification biases. Moreover, as with previous analyses of selection results using high-throughput sequencing (Zykovich et al., 2009; Pitt and Ferré-D'Amaré, 2010; Slattery et al., 2011), sequencing depth reduced the influence of stochastic sampling. Thus, compared to the results of classical approaches, enrichment or depletion of a residue was a more direct reflection of its contribution to biochemical specificity.

Four pools of variants were constructed, each based on a different human pri-miRNA (mir-125a, mir-16-1, mir-30a, and mir-223). Residues more than 8 nt upstream of the 5p Drosha cleavage site and more than 8 nt downstream of the 3p cleavage site were varied, whereas the remaining hairpin residues were not. At each variable position, 79% of the molecules had the wild-type residue, and the remainder had one of the other three alternatives. As done for self-cleaving ribozymes (Pan and Uhlenbeck, 1992), each variant was circularized so that all of its variable nucleotides resided in a single cleavage product (Figure 2A), thereby enabling a full analysis of sequence interdependencies.
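To get a feel for what 79% wild-type doping implies for a single molecule, the sketch below treats each variable position as an independent draw, so the number of substitutions per molecule follows a binomial distribution. The count of 78 variable positions and the equal split of the remaining 21% among the three alternative residues are assumptions made for illustration, not values stated in the text.

```python
from math import comb

WT_FRACTION = 0.79   # from the text: fraction of molecules with the wild-type residue
N_VARIABLE = 78      # hypothetical number of variable positions (assumption)


def substitution_pmf(k: int, n: int = N_VARIABLE, p_wt: float = WT_FRACTION) -> float:
    """Probability that one molecule carries exactly k non-wild-type residues."""
    p_mut = 1.0 - p_wt
    return comb(n, k) * p_mut ** k * p_wt ** (n - k)


def expected_substitutions(n: int = N_VARIABLE, p_wt: float = WT_FRACTION) -> float:
    """Mean number of substitutions per molecule (binomial mean)."""
    return n * (1.0 - p_wt)


def alternative_fraction(p_wt: float = WT_FRACTION) -> float:
    """Fraction for each non-wild-type residue, assuming an equal three-way split."""
    return (1.0 - p_wt) / 3.0


if __name__ == "__main__":
    print("expected substitutions per molecule:", expected_substitutions())
    print("P(fully wild-type molecule):", substitution_pmf(0))
    print("fraction per alternative residue:", alternative_fraction())
```

Under these assumptions almost every molecule carries many substitutions at once, which is why retaining all variable nucleotides in one cleavage product matters for analyzing interdependencies.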

In vitro cleavage reactions were conducted in Microprocessor lysate, i.e., whole-cell lysate from HEK293T cells overexpressing Drosha and DGCR8 to enhance cleavage activity (Figure 2B). At a time in which the lysate cleaved linear and circularized pri-mir-125a to near completion, many pri-mir-125a variants remained uncleaved (Figure 2C), which indicated that some substitutions in the basal stem and flanking regions attenuated Microprocessor cleavage in vitro.

[Figure 1 graphic omitted. Panel A: normalized hairpin reads for human, fly, and nematode pri-miRNAs in HEK293 cells and for human and nematode pri-miRNAs in S2 cells. Panel B: RNA blot of pre-miRNA and mature miRNA for lanes labeled cel-mir-40, hsa-mir-1-1, cel-mir-50, cel-lsy-6, and cel-lin-4, with relative pri-miRNA levels of 1.00, 0.63, 1.12, 1.77, and 2.45 listed above the lanes. Panel C: competitive Microprocessor binding (nitrocellulose filtration; Drosha-TN and DGCR8) of query pri-miRNAs against the hsa-mir-125a reference; bound ratios, in lane order as extracted: hsa-mir-122, 1.65; cel-lsy-6, 0.21; cel-mir-44, 0.13; cel-mir-46, 0.07; cel-mir-50, 0.09; cel-mir-59, 0.08; cel-mir-60, 0.09; cel-mir-235, 0.33 (input lanes set to 1.00). Panel D: average BLS versus 5p position, pre-miRNA position, and 3p position, with the basal stem marked.]

Figure 1. Existence of Unknown Features Specifying Human Pri-miRNAs

(A) Processing of human, fly, and nematode pri-miRNAs in human cells and Drosophila cells. Cells were transfected with plasmids expressing the indicated pri-miRNA hairpins with ~100 flanking genomic nucleotides on each side of each hairpin (Figure S1A), and total RNA was pooled for small-RNA sequencing. Plotted are small-RNA reads derived from the indicated pri-miRNAs.

(B) Accumulation of pri-miRNA, pre-miRNA, and miRNA after expressing the indicated pri-miRNAs in HEK293T cells. Pre-miRNA and mature species were measured by RNA blot of total RNA from cells transfected with plasmids expressing the indicated pri-miRNA (full gel images, including in vitro-transcribed cognate positive controls, in Figure S1B). Relative pri-miRNA levels (indicated above the lanes) are from ribonuclease protection assays, normalized to the signals for neomycin phosphotransferase mRNA also expressed from each expression plasmid.

(C) Relative binding of C. elegans and human pri-miRNAs to the Microprocessor. In the competitive binding assay (top, schematic), radiolabeled query pri-miRNA was mixed with the radiolabeled shorter reference pri-miRNA (human mir-125a) and incubated in excess over catalytically impaired Drosha (Drosha-TN) and DGCR8. Bound RNA was filtered on nitrocellulose and eluted for analysis on a denaturing gel. Phosphorimaging (bottom) indicated the relative amounts of input (−) and bound (+) RNAs. Numbers below each lane indicate the ratio of bound query to bound reference pri-miRNAs, normalized to their input ratio.

(D) Nucleotide conservation of human pri-miRNAs conserved to mouse, reported as the average branch-length score (BLS) at each position. Positions are numbered based on the inferred Drosha cleavage site (inset); negative indices are upstream of the 5p Drosha cleavage site, indices with "P" count from the 5′ end of the pre-miRNA, and positive indices are downstream of the 3p Drosha cleavage site.

See also Figure S1.

[Figure 2 graphic omitted. Panel A: selection schematic (circular pri-miRNA variant pool; Drosha/DGCR8 cleavage separating functional from nonfunctional variants; splint-ligated product; library for paired-end sequencing). Panel B: gel of 293T transfections (mock versus Drosha plus DGCR8). Panel C: gel of pri-mir-125a topologies (WT circ., WT linear, pool circ.). Panel D: information content (bits) for A, C, G, and U versus 5p position and 3p position for hsa-mir-16-1, hsa-mir-30a, hsa-mir-223, and hsa-mir-125a, with invariant residues and the stem-loop marked in the inset.]

Figure 2. Selection for Functional Pri-miRNA Variants

(A) Schematic of the selection. Pri-miRNAs with variable residues (red) flanking the Drosha cleavage site were circularized by ligation and incubated in Microprocessor lysate. Cleaved variants were gel purified, ligated to adaptors, reverse transcribed, and amplified for high-throughput sequencing.

(B) Cleavage of let-7a in HEK293T whole-cell lysate (mock) and Microprocessor lysate (whole-cell lysate from HEK293T cells transfected with plasmids expressing Drosha and DGCR8). Incubations were 1.5 hr. Body-labeled reactants and products were resolved on a denaturing polyacrylamide gel and visualized by phosphorimaging.

(C) Cleavage of linear and circular mir-125a (WT linear and WT circ., respectively) and a pool of circular mir-125a variants (pool). RNAs were incubated for 5 min in Microprocessor lysate and analyzed as in (B). The linear RNA was 5′ end labeled; other RNAs were body labeled.

(D) Enrichment and depletion at variable residues in functional pri-miRNA variants. At each varied position (inset, red inner line), information content was calculated for each residue (green, cyan, black, and red for A, C, G, and U, respectively).

See also Figure S2.

Cleaved variants were purified and sequenced (Figure 2A). At each variant position, the odds of each nucleotide in the cleaved pool were compared to the odds of that nucleotide in the starting pool. These odds ratios were used to calculate the information content of each nucleotide possibility at each variant position: the greater the information content, the more favorable the influence on activity, with positive values indicating beneficial influences and negative values disruptive ones. An advantage of plotting information content is that it reports the relative influence of each nucleotide possibility irrespective of whether it was the wild-type possibility. Because molecular manipulations and computational filtering both selected for cleavage at the wild-type site, nucleotide changes that altered the cleavage site were not distinguished from those that abolished cleavage.
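The comparison of odds between the cleaved and starting pools can be sketched as follows. This is a minimal illustration that reports a log2 odds ratio per nucleotide at one position; it is not the information-content formula used for Figure 2D, which is not spelled out in this section. The function names, the pseudocount, and the toy pools are assumptions.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGU"


def nucleotide_odds(sequences, position, pseudocount=0.5):
    """Odds p / (1 - p) of each nucleotide at one position of a pool.

    The pseudocount (an assumption of this sketch) keeps the odds finite
    when a nucleotide is absent from, or fixed in, the pool.
    """
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts[nt] for nt in NUCLEOTIDES) + 4 * pseudocount
    odds = {}
    for nt in NUCLEOTIDES:
        p = (counts[nt] + pseudocount) / total
        odds[nt] = p / (1.0 - p)
    return odds


def log2_odds_ratios(selected, starting, position, pseudocount=0.5):
    """log2 of (odds in the cleaved pool / odds in the starting pool)."""
    sel = nucleotide_odds(selected, position, pseudocount)
    ini = nucleotide_odds(starting, position, pseudocount)
    return {nt: math.log2(sel[nt] / ini[nt]) for nt in NUCLEOTIDES}


if __name__ == "__main__":
    # Toy single-position pools: wild-type A at 79% in the starting pool.
    starting = ["A"] * 79 + ["C"] * 7 + ["G"] * 7 + ["U"] * 7
    selected = ["A"] * 60 + ["C"] * 5 + ["G"] * 5 + ["U"] * 30
    for nt, value in log2_odds_ratios(selected, starting, 0).items():
        print(nt, round(value, 2))
```

Positive values mark nucleotides enriched among cleaved variants and negative values mark depleted ones, regardless of which nucleotide was wild type.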

Some positions had substantial enrichment of one or more nucleotide possibilities, with corresponding depletion of others (Figure 2D). When tested in vitro, the results of changing specific residues closely matched those predicted from analysis of sequenced variants (Figures S2A and S2B). Moreover, the in vitro results predicted the direction and sometimes the magnitude of the effects observed in HEK293T cells (Figure S2C).

Importance of an 11 bp Basal Stem Flanked by at Least Nine Unstructured Nucleotides

For all four miRNAs, some of the varied residues with the greatest influence fell within the basal stem (Figure 2D). Covariation matrices listing the odds ratio of each pair of nucleotide identities showed preference for Watson-Crick geometry at each basal pair, with the G:U wobble the most frequently preferred non-Watson-Crick alternative (Figures 3A and S3A). For example, the most favored alternatives to the wild-type C:G pair at positions −11 and +9 of mir-125a were the G:C and U:A pairs, and to a lesser extent, the A:U, G:U, and U:G pairs (Figure 3A). In fact, Watson-Crick pairing was strongly preferred even if it did not occur in the wild-type sequence. For example, the wild-type A:C pair at positions −12 and +10 of mir-30a was disfavored compared to the four Watson-Crick possibilities (Figure 3A), and the bulged A at position +10 of mir-223 was preferentially incorporated into an alternative continuous helix (Figures S3A and S3B). Extending these methods to systematically evaluate all pairing possibilities involving all varied positions uncovered no evidence for Watson-Crick pairing outside the basal stem (Figure S3C).
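A covariation matrix of the kind described above can be sketched by tabulating the joint nucleotide identities at two positions in each pool and taking the log2 odds ratio for each of the 16 combinations. The function name, the pseudocount, and the toy pools below are assumptions for illustration; the sketch is not the exact procedure behind Figure 3A.

```python
import math
from collections import Counter
from itertools import product

PAIRS = [a + b for a, b in product("ACGU", repeat=2)]  # "AA", "AC", ..., "UU"


def pair_log_odds(selected, starting, i, j, pseudocount=0.5):
    """log2 odds ratio (selected vs. starting) for each identity pair at (i, j).

    Keys are two-letter strings: the nucleotide at position i followed by
    the nucleotide at position j. The pseudocount is an assumption that
    keeps the odds finite for unobserved combinations.
    """
    def odds(sequences):
        counts = Counter(seq[i] + seq[j] for seq in sequences)
        total = sum(counts[p] for p in PAIRS) + pseudocount * len(PAIRS)
        result = {}
        for p in PAIRS:
            f = (counts[p] + pseudocount) / total
            result[p] = f / (1.0 - f)
        return result

    sel, ini = odds(selected), odds(starting)
    return {p: math.log2(sel[p] / ini[p]) for p in PAIRS}


if __name__ == "__main__":
    # Toy two-nucleotide "sequences": uniform starting pool, and a selected
    # pool dominated by Watson-Crick pairs with a few G:U wobbles.
    starting = PAIRS * 10
    selected = ["CG"] * 40 + ["GC"] * 40 + ["AU"] * 40 + ["UA"] * 40 + ["GU"] * 10
    matrix = pair_log_odds(selected, starting, 0, 1)
    for row in "ACGU":
        print(row, [round(matrix[row + col], 2) for col in "ACGU"])
```

Printing the result with the 5p nucleotide as the row and the 3p nucleotide as the column gives the same layout as the matrices in Figure 3A.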

Layered on the overall preference for Watson-Crick pairing were primary-sequence preferences specific to each basal pair. For example, at positions −11 and +9 the C:G pair was favored over the other Watson-Crick alternatives. The primary-sequence preference was most acute at the basal-most pair, where wobbles or mismatches involving G at −13 were favored over alternative Watson-Crick pairs (Figure 3A). We conclude that primary-sequence features supplement and sometimes supersede structural features important for basal-stem recognition.

The Microprocessor recognizes the junction between the miRNA hairpin and flanking ssRNA to position the active site approximately one helical turn (11 bp of A-form RNA) from the base of the duplex (Han et al., 2006; Yeom et al., 2006). To examine the preferred length of the basal stem, we calculated the relative cleavage efficiencies of different stem-length variants, normalizing to that of an 8 bp stem. Invariant mismatches within symmetric internal loops (e.g., the A:C mismatch at positions −6 and +4 of mir-30a) were assumed to be noncanonical pairs that stacked within the stem to contribute to its length, whereas mismatches at varied positions were assumed to disrupt further pairing and thereby terminate the inferred basal stem. For all four pri-miRNAs, an 11 bp basal stem was optimal (Figure 3B), consistent with the single-turn model. Indeed, an 11 bp basal stem was preferred for mir-223 even though the wild-type sequence was predicted to form a 12 bp stem (Figures 3A and S3A). For most pri-miRNAs, however, the efficiency of the 12-pair stem approached that of the 11-pair stem (Figure 3B). This tolerance of a twelfth pair hinted that other features, such as the G at position −13, help specify the precise site of cleavage.

The single-turn model also posits that the nucleotides immediately flanking the basal stem are unstructured (Han et al., 2006; Yeom et al., 2006). To test this, we used RNAfold (Hofacker and Stadler, 2006) to predict the minimum free-energy structure of each sequenced pri-miRNA variant. For those with predicted wild-type stem pairing, we recorded the number of nucleotides between the base of the stem and the most proximal two consecutive structured residues. Although an imperfect estimate of the size of the unstructured segments flanking the base of the helix, this metric correlated well with cleavage (Figure 3C). Predicted pairing was tolerated in one flank, provided that the other flank contained at least 5–7 unpaired bases, consistent with reports of some cleavage when only one flanking segment is present (Zeng and Cullen, 2005; Han et al., 2006). When summing the flanking unpaired bases from both sides, the optimum plateaued at ~9–18 nt (Figure 3D).
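The flanking metric described above can be sketched on a predicted structure in dot-bracket notation, the format RNAfold reports. The sketch walks outward from each end of the basal stem and counts nucleotides until it meets two consecutive structured residues. The function names and the 0-based stem indices are hypothetical, and an isolated single paired residue is counted as part of the unstructured run, which is one reading of the description rather than a confirmed detail of the analysis.

```python
def unstructured_run(structure: str, start: int, step: int) -> int:
    """Count nucleotides from `start`, moving by `step` (-1 or +1), until the
    first of two consecutive structured (non-'.') residues or the end of
    the molecule."""
    count = 0
    i = start
    while 0 <= i < len(structure):
        nxt = i + step
        here = structure[i] != "."
        following = 0 <= nxt < len(structure) and structure[nxt] != "."
        if here and following:
            break  # two consecutive structured residues end the run
        count += 1
        i = nxt
    return count


def flanking_unstructured(structure: str, stem_5p: int, stem_3p: int):
    """Return (5p, 3p) unstructured counts flanking the basal stem.

    stem_5p and stem_3p are the 0-based indices of the basal-most pair
    (hypothetical parameters chosen for this sketch).
    """
    five = unstructured_run(structure, stem_5p - 1, -1)
    three = unstructured_run(structure, stem_3p + 1, +1)
    return five, three


if __name__ == "__main__":
    # Toy structure: a 5 bp stem whose flanks also pair with each other.
    toy = "..((...(((((....)))))....)).."
    five, three = flanking_unstructured(toy, stem_5p=7, stem_3p=20)
    print("5p:", five, "3p:", three, "sum:", five + three)
```

Summing the two counts gives the quantity plotted on the horizontal axis of Figure 3D, under the assumptions stated above.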

A Basal UG Motif Enhances Processing

Among the nucleotides upstream of the stem-loop, the most striking enrichment was for a U at position −14 (Figure 2D). This U immediately preceded the position that, as mentioned above, displayed a strong primary-sequence preference for a G. The U and G at positions −14 and −13 contributed independently; variants with either a U or a G were enriched over variants with neither, and variants with both were even more enriched (Figure 4A). For mir-223, the UG at positions −14 and −13 was preferred (Figure 2D), even though wild-type mir-223 has a UG at positions −15 and −14, respectively. This basal UG motif was also enriched among variants of mir-125a selected for Microprocessor binding rather than cleavage (Figure S4B).

The basal UG was conserved in vertebrate orthologs of mir-16-1 and mir-30a (Figure 4B). Moreover, the motif was enriched in other mammalian pri-miRNAs, as illustrated by the sequence composition of human pri-miRNAs (Figure 4C). It was also enriched in pri-miRNAs of zebrafish (D. rerio) and tunicate (C. intestinalis) but only sporadically in more distantly related lineages, suggesting that its recognition emerged in a chordate ancestor (Figure 4D).

The Broadly Conserved CNNC Motif Enhances Processing

In mir-16-1, mir-30a, and mir-223 we observed a preference for two C residues, separated by two intervening nucleotides, beginning 17–18 nt downstream of the Drosha cleavage site (Figure 2D). The two C residues of this CNNC motif (N signifies any nucleotide) acted synergistically, in that variants that retained neither C residue were not disfavored much more than those that retained one (Figure 5A). The C residues enriched in the active variants were conserved in vertebrate orthologs of these three pri-miRNAs (Figure 5B).
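Locating a CNNC motif in the downstream flank can be sketched with a simple pattern scan. The sketch assumes the input is the sequence immediately downstream of the 3p Drosha cleavage site, so that its first nucleotide is position +1; the function names and the default 17–18 nt window (taken from the positions quoted above) are illustrative choices, not the authors' implementation.

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping motifs (e.g., in "CCCCC") are all reported.
CNNC = re.compile(r"(?=C[ACGU]{2}C)")


def cnnc_positions(flank_3p: str) -> List[int]:
    """1-based positions (relative to the 3p cleavage site) where a CNNC begins."""
    rna = flank_3p.upper().replace("T", "U")  # accept DNA-style input
    return [match.start() + 1 for match in CNNC.finditer(rna)]


def has_cnnc(flank_3p: str, window: Tuple[int, int] = (17, 18)) -> bool:
    """True if any CNNC begins within the inclusive window of positions."""
    low, high = window
    return any(low <= pos <= high for pos in cnnc_positions(flank_3p))


if __name__ == "__main__":
    # Toy flank: 16 A residues, then CAAC beginning at position +17.
    flank = "A" * 16 + "CAAC" + "A" * 5
    print(cnnc_positions(flank), has_cnnc(flank))
```

Widening the window parameter accommodates hairpins in which the motif sits slightly closer to or farther from the cleavage site.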

The mir-125a pri-miRNA also had four C residues in this vicinity (positions 16–21), which gave rise to a CNNC at position

[Figure 3 image. Panel A: wild-type basal stems and covariation matrices (log2 odds ratio) for pair 9 (positions +9/−11), pair 10 (+10/−12), and pair 11 (+11/−13) of hsa-mir-125a, hsa-mir-16-1, and hsa-mir-30a. Panel B: relative cleavage versus number of basal stem pairs (8–13), time points 1 and 2. Panel C: heat maps of 5p versus 3p unstructured nucleotides (0–12), colored by log2 odds ratio. Panel D: relative cleavage versus number of flanking unstructured nucleotides (0–20). Panels B–D show hsa-mir-125a, hsa-mir-16-1, hsa-mir-30a, and hsa-mir-223.]

Figure 3. Basal Stem Structure in Functional Pri-miRNA Variants

(A) Predicted basal secondary structures and covariation matrices for mir-125a, mir-16-1, and mir-30a. For each pair of positions, joint nucleotide distributions were tabulated from sequences of the initial and selected pools, and the log odds ratio was calculated. Favored and disfavored pairs are colored red and blue, respectively, with color intensity (key) and values indicating magnitudes.

(B) Relative cleavage of variants with different stem lengths. The number of contiguous Watson-Crick pairs was counted, and the relative cleavage was calculated, normalized to the 8 bp stem. For selections with two time points, results are shown for both (key).

(C) Enrichment for unstructured nucleotides flanking the basal stem. Predicted folds of variant sequences were generated, and the subset of sequences with wild-type basal stem pairing were classified based on the distance to the nearest consecutive structured nucleotides upstream of position −13 and the nearest consecutive structured nucleotides downstream of position +11. Enrichment (red) and depletion (blue) of unstructured lengths among the selected variants are colored (key), with black indicating that sequencing data were insufficient to calculate enrichment.

(D) Relative cleavage of variants with differing numbers of total unstructured nucleotides flanking the basal stem. Upstream and downstream unstructured lengths predicted in (C) were summed, and the relative cleavage was calculated, normalized to zero unstructured nucleotides. For selections with two time points, results are shown for both (key).

See also Figure S3.
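The covariation analysis described in the legend of Figure 3A can be illustrated with a minimal sketch. It is not the authors' pipeline (which is described in the Extended Experimental Procedures); the function name `pair_log_odds`, the pseudocount, and the exact definition of the odds ratio (odds of a nucleotide pair in the selected pool divided by its odds in the initial pool) are assumptions made for illustration.

```python
import math
from collections import Counter

NUCS = "ACGU"

def pair_log_odds(initial, selected, i, j, pseudocount=0.5):
    """Return {(x, y): log2 odds ratio} for nucleotide x at index i and y at index j.

    Assumption: odds = p / (1 - p), where p is the pseudocount-smoothed
    frequency of the joint pair (x, y) within a pool.
    """
    def odds(pool):
        counts = Counter((s[i], s[j]) for s in pool)
        total = len(pool) + pseudocount * 16
        result = {}
        for x in NUCS:
            for y in NUCS:
                p = (counts[(x, y)] + pseudocount) / total
                result[(x, y)] = p / (1 - p)
        return result

    o_init, o_sel = odds(initial), odds(selected)
    return {pair: math.log2(o_sel[pair] / o_init[pair]) for pair in o_init}
```

Positive values correspond to pairs favored in the selection (red in the figure) and negative values to disfavored pairs (blue).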


[Figure 4 image. Panel A: relative cleavage for variants with no (−14) motif, U(−14) only, G(−13) only, or UG(−14), for hsa-mir-125a, hsa-mir-16-1, hsa-mir-30a, and hsa-mir-223 (time points 1 and 2). Panel B: PhyloP vertebrate conservation at 5p positions −17 to −11 for hsa-mir-125a (chr19:52,196,503–52,196,510, +; TGTTGCCA), hsa-mir-16-1 (chr13:50,623,097–50,623,104, −; CAATGTCA), hsa-mir-30a (chr6:72,113,329–72,113,336, −; TGTTGACA), and hsa-mir-223 (chrX:65,238,719–65,238,726, +; CCTGCAGT). Panel C: nucleotide frequency at positions −19 to −9. Panel D: percentage of miRNAs with a positioned UG at positions −22 to −2 relative to the Drosha cleavage site, with the human UG position marked, for H. sapiens, D. rerio, C. intestinalis (chordates); D. melanogaster, A. gambiae, D. pulex (arthropods); C. elegans, C. briggsae, P. pacificus (nematodes); C. teleta, L. gigantea, S. mediterranea (lophotrochozoans); and N. vectensis.]

Figure 4. The Basal UG Motif

(A) Relative cleavage of variants with a full UG motif, a partial motif, and no motif. Values were normalized to those of variants with no motif, showing results from two time points, if available (key).

(B) PhyloP conservation across 30 vertebrate species in the region of the basal UG motif (red letters) for the four selected miRNAs. Bars extending beyond the scale of the graph are truncated (pink). Nucleotides predicted to be paired in the wild-type basal stem are shaded.

(C) Frequencies of A, C, G, and U (green, cyan, black, and red, respectively) at the indicated positions of human pri-miRNAs conserved to mouse. Analysis was of 204 pri-miRNAs, each representing a unique paralogous family (Table S2).

(D) Enrichment for the UG dinucleotide in the pri-miRNAs of representative animals with sequenced genomes. UG occurrences were tabulated for the upstream regions of pri-miRNAs aligned on the predicted Drosha cleavage site (Table S2). Species with statistically significant enrichment at position −14 are indicated (asterisks, empirical p value < 10⁻³).

See also Figure S4.

16 and the possibility of creating a CNNC at positions 17 or 18

(by changing either A20 or A18, respectively, to a C). However,

the CNNC at position 16 was not preferred in the selection, nor

were either of the single-nucleotide changes that could create

a CNNC (Figures 2D and 5A). Moreover, the position 16 CNNC

was not conserved in vertebrate orthologs (Figure 5B). These

results indicate that unidentified features present in mir-16-1,

mir-30a, and mir-223, but not mir-125a, are required for the

CNNC to increase processing efficiency.


For the three pri-miRNAs in which the CNNC motif was effec-

tive, its position fell in a small window 17–18 nt downstream of

the Drosha cleavage site. In variants in which neither wild-type

C was present, alternative CNNC motifs were strongly enriched

1–2 nt downstream (Figure S5A), which further indicated that

a CNNC motif within a small range of positions can contribute

to pri-miRNA recognition.

Of the 64 possible dinucleotide motifs with zero to three inter-

vening nucleotides, CNNC was the one most highly enriched


[Figure 5 image. Panel A: relative cleavage for variants with neither C, one C only, or the full CNNC, for hsa-mir-16-1 (C18, C21), hsa-mir-30a (C17, C20), hsa-mir-223 (C18, C21), and hsa-mir-125a (C16, C19) (time points 1 and 2). Panel B: PhyloP vertebrate conservation at 3p positions 15–23 for hsa-mir-125a (chr19:52,196,597–52,196,605, +; ACCACACAC), hsa-mir-16-1 (chr13:50,623,097–50,623,105, −; ACTCTACAG), hsa-mir-30a (chr6:72,113,234–72,113,242, −; GACTTCAAG), and hsa-mir-223 (chrX:65,238,816–65,238,824, +; TACCAGCTC). Panel C: signal/background versus position (1–28) for CNNC and other spaced dinucleotide motifs in human miRNAs conserved to mouse, D. melanogaster miRNAs, S. mediterranea miRNAs, and C. elegans miRNAs. Panel D: percentage of miRNAs with a positioned CNNC at positions 1–29 downstream of the Drosha cleavage site, with the human CNNC window marked, for the same species as in Figure 4D.]

Figure 5. The Downstream CNNC Motif

(A) Relative cleavage of variants with a full CNNC motif, a partial motif, and no motif. Values were normalized to those of variants with no motif, showing results from two time points, if available (key).

(B) PhyloP conservation across 30 vertebrate species in the region of the downstream CNNC motif (blue letters) for the four selected pri-miRNAs. Bars extending beyond the scale of the graph are truncated (pink).

(C) CNNC enrichment compared to that of 63 other spaced dinucleotide motifs. Occurrences of each motif were tabulated for the downstream regions of pri-miRNAs aligned on the predicted Drosha cleavage site (Table S2). Background expectation was based on the nucleotide composition of pri-miRNA downstream regions in each species.

(D) Enrichment of the CNNC motif in the pri-miRNAs of representative bilaterian animals (Table S2). Species with statistically significant enrichment at positions 16, 17, or 18 are indicated (asterisk, empirical p value < 10⁻⁴).

See also Figure S5.


[Figure 6 image. Panel A: schematic of site-specific crosslinking of the hsa-mir-30a substrate (365 nm crosslinking, binding to streptavidin-coated beads, washing, elution by RNase T1, RNA-protein complex for mass spectrometry). Panel B: SDS gel of crosslinked complexes with and without 365 nm UV (markers 14–191 kDa); proteins identified by mass spectrometry: SRp20, TUT4b, 9G8, SKIV2L, ILF2. Panel C: immunoprecipitation lanes (input, mouse IgG, anti-FLAG, anti-SRp20, anti-9G8). Panel D: number of crosslink sites for SRp20 and SRp75 versus 5p position (−41 to −1), pre-miRNA position (P1–P41), and 3p position (1–41), with the human CNNC window marked. Panel E: relative cleavage of wild-type and CNNC-mutant hsa-mir-16-1 supplemented with EGFP or SRp20.]

Figure 6. Binding and Activity of SRp20 at the CNNC Motif

(A) Site-specific crosslinking approach used to identify CNNC-binding proteins. The mir-30a crosslinking substrate contained a photoreactive base in the CNNC motif (4-thiouridine, U–S), a 3′ biotin (Bio), and for some applications, a ³²P-labeled phosphate (red p). This substrate was incubated in Microprocessor lysate and irradiated with 365 nm UV light. Crosslinked complexes were captured on streptavidin-coated beads and eluted by RNase T1 digestion.

(B) Proteins within crosslinked RNA-protein complexes. Crosslinked complexes prepared as in (A) were separated on an SDS gel. For each CNNC-crosslinked band, proteins are listed that were identified by mass spectrometry and have known or inferred RNA-binding activity.

(C) Immunoprecipitation of proteins crosslinked to the CNNC motif. After crosslinking as in (A), complexes were enriched using monoclonal antibodies against either FLAG (the tag of the overexpressed Drosha and DGCR8), SRp20, or 9G8 and then resolved on an SDS gel. Input was run on a different region of the same gel for reference.

(D) SRp20 binding downstream of mouse pri-miRNA hairpins in vivo. Sites were obtained by reanalysis of crosslinking data for SRp20 and SRp75 in mouse cells (Anko et al., 2012). Positions are numbered as in Figure 1D. Expected sites of crosslinks to any of the motif nucleotides in the region of motif enrichment (Figure 5D) are shaded (gray).

(E) Enhancement of in vitro pri-miRNA cleavage by SRp20. Wild-type pri-mir-16-1 or pri-mir-16-1 with mutated CNNC were incubated for 3 min with immunopurified Microprocessor, supplemented with either FLAG-EGFP or 3X-FLAG-SRp20 purified from HEK293T cells. Reactants and products were resolved on denaturing polyacrylamide gels and quantified by phosphorimaging relative to a buffer-only control (geometric mean ± standard error, n = 3).

See also Figure S6.

downstream of the cleavage sites of human pri-miRNAs (Fig-

ure 5C). Moreover, enrichment was limited to a small range of

positions 16–18 nt downstream of the site, peaking at positions

17 and 18, which matched the positions of the motif within

mir-16-1, mir-30a, and mir-223. These results suggest that

the CNNC motif enhances processing of many human pri-

miRNAs.
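The motif space surveyed here (16 dinucleotides at four spacings, giving 64 motifs) and the position-specific tabulation can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and it assumes that each input string is the sequence immediately downstream of the Drosha cleavage site, so that string index 0 corresponds to position 1.

```python
from itertools import product

NUCS = "ACGU"

def spaced_motifs(max_gap=3):
    """All (first, gap, second) motifs: 16 dinucleotides x 4 spacings = 64."""
    return [(a, gap, b)
            for a, b in product(NUCS, repeat=2)
            for gap in range(max_gap + 1)]

def positional_counts(downstream_seqs, motif, window=30):
    """Count sequences in which the motif starts at each downstream position.

    counts[k] refers to position k + 1 (assumed numbering: position 1 is the
    first nucleotide downstream of the cleavage site).
    """
    first, gap, second = motif
    counts = [0] * window
    for seq in downstream_seqs:
        # The second nucleotide sits gap + 1 residues after the first.
        for pos in range(min(window, len(seq) - gap - 1)):
            if seq[pos] == first and seq[pos + gap + 1] == second:
                counts[pos] += 1
    return counts
```

In this representation CNNC is the motif `("C", 2, "C")`, and a peak in its counts at positions 16 to 18 relative to a composition-based background would correspond to the enrichment described above.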

Similar analyses of nonmammalian pri-miRNAs indicated

strong, position-specific enrichment of the CNNC motif in

chordates, arthropods, and lophotrochozoans, but not in sea

anemone (Nematostella vectensis) (Figures 5C and 5D), suggest-

ing that its recognition emerged with the divergence of bilater-

ians. Interestingly, enrichment was also absent in nematodes

(Figures 5C and 5D), suggesting an isolated loss in the nematode

branch of the ecdysozoans.


Consistent with the results in extracts, mutation of the basal

UG and downstream CNNC motifs each reduced accumulation

of mature miR-16 and miR-30a in HEK293T cells, with mutation

of both reducing accumulation ~4–8-fold relative to wild-type

(Figures S5B and S5C). Furthermore, one or both motifs contrib-

uted to the accumulation of each of the additional pri-miRNAs

tested in cell culture (hsa-mir-28, hsa-mir-129-2, and hsa-mir-

193b; Figures S5D–S5F).

SRp20 Binds the CNNC Motif and Enhances Processing

To learn how the CNNC motif is recognized, we used site-

specific crosslinking (Wyatt et al., 1992). Proteins that cross-

linked to pri-mir-30a RNA with a photoreactive nucleotide

(4-thiouridine) placed within the CNNC motif were identified by

mass spectrometry (Figure 6A). To guide gel-purification of


crosslinked proteins, we performed the procedure in parallel

with a radiolabeled pri-miRNA designed to label only proteins

that crosslinked in the vicinity of the CNNC (Figures 6A and

6B). The two strongest candidates were SRp20/SRSF3 and

9G8/SRSF7, closely related proteins implicated in splicing

regulation (Zahler et al., 1993; Cavaloc et al., 1994), mRNA

export (Huang and Steitz, 2001), and translation initiation

(Bedard et al., 2007; Swartz et al., 2007). These proteins

both have an RNA-recognition motif (RRM) conserved across

bilaterian animals, which recognizes degenerate motifs closely

related to the CNNC motif (Heinrichs and Baker, 1995; Cavaloc

et al., 1999; Schaal and Maniatis, 1999). NMR studies of this

RRM in complex with RNA indicate that the C residues, particu-

larly the first C of the CNNC, are bound in a base-specific

manner, with minimal preferences for the two intervening

bases (Hargous et al., 2006). Immunopurification of SRp20 and

9G8 confirmed that these two proteins (particularly SRp20)

were the ones that most efficiently crosslinked in our assay

(Figure 6C).

To evaluate SRp20 binding in vivo, we analyzed a large data

set of SRp20 crosslinking sites in P19 cells (Anko et al., 2012).

Although the published analyses of this data set focused on

sites within pre-mRNAs, we found that many SRp20 sites

resided in pri-miRNAs, and, more importantly, that these

sites overlapped the region of CNNC enrichment (Figure 6D).

This analysis extended our results from in vitro binding to

in vivo binding and from one pri-miRNA to many. Some of

the crosslinking sites in the CNNC-enriched region were in

pri-miRNAs that lacked a CNNC motif, suggesting that SRp20

(and presumably its paralog, 9G8) might play a role even

more general than that implied by CNNC conservation and

enrichment.

The requirement of SRp20 for cell viability (Jumaa et al., 1999;

Jia et al., 2010) confounded attempts to test its function by

depleting the protein in cell culture. Therefore, we tested its

function in vitro, supplementing immunopurified Microprocessor

complex with either immunopurified recombinant SRp20 (Fig-

ure S6) or an analogously purified control protein (EGFP).

SRp20 enhanced mir-16-1 processing in a CNNC-dependent

manner (Figure 6E). Taken together, our results indicate that

for many bilaterian miRNAs the CNNC motif is enriched and

preferentially conserved because it helps recruit SRp20

(or its homologs), which enhances pri-miRNA recognition and

processing.

Loop and Apical Stem Elements Can Enhance Processing

To examine whether additional processing features reside in

the loop and apical stem, we extended our approach to

those regions (Figure S7A). Pairing at the apical portion of the

stem contributed to pri-miRNA recognition and processing for

mir-125a and mir-30a, but not for mir-16-1 or mir-223 (Fig-

ure S7B), consistent with differing conclusions drawn from

studies of different miRNAs (Zeng et al., 2005; Han et al.,

2006). Primary-sequence preferences were weaker than those

observed for basal and flanking residues (Figure S7C). The

best candidate for a loop-binding motif was observed only in

mir-30a, in which the wild-type UGUG at positions P24–27

was both preferred in the selection (Figure S7D) and conserved

in vertebrate orthologs (Figure S7E). Human and zebrafish

miRNAs were enriched for UGU or GUG in this region of the

loop (empirical p < 10⁻⁵ for each species) (Figure S7F), thereby

confirming it as the third primary-sequence motif identified in

our study (Figure 7A).

Rescue of C. elegans miRNA Expression in Human Cells

The primary-sequence motifs important for mammalian miRNAs

were not enriched in the nematode clade, suggesting

that their absence might account for the failure of C. elegans

pri-miRNAs to be processed in human cells. To test this idea,

we added the basal UG and the downstream CNNC motifs to

cel-mir-44 in the context of the mir-1 bicistronic vector (Fig-

ure 7B). Before adding the motifs, we disrupted the predicted

pairing between positions −14 and +12 and substituted the

G:C pair at positions −13 and +11 (construct mir44.1). These

changes, which were expected to simultaneously enhance

processing by shortening the basal stem to its optimal

length and inhibit processing by replacing the fortuitous G at

position −13, had a marginal net effect on production of mature

miR-44 in human cells (Figure 7B). Adding a basal UG

enhanced production of mature miR-44 by 5-fold (8-fold over

the wild-type), primarily from restoring the G at −13 (Figure 7B).

Adding a CNNC 17 nt downstream of the cleavage site

(mir44.4) enhanced production another 8-fold, yielding a

64-fold net increase over wild-type (Figure 7B). Similarly, con-

verting the wild-type, asymmetrically bulged stem of cel-mir-

50 to a regular, 11-pair stem and adding the UG and CNNC

motifs enhanced expression of mature miR-50 by 30-fold (Fig-

ure S7G), while adding the motifs to cel-mir-40 enhanced

expression of mature miR-40 by 5-fold (Figure S7H). We

conclude that primary-sequence motifs discovered in this study

help human cells to distinguish pri-miRNA hairpins from other

hairpins and that the absence of these motifs in C. elegans

pri-miRNAs helps to explain why human cells do not regard

these transcripts as pri-miRNAs.

DISCUSSION

Secondary structure is inadequate on its own to specify pri-

miRNA hairpins: primary-sequence features, including the basal

UG, the CNNC, and the apical GUG motifs, also contribute to

efficient processing in human cells (Figure 7A). Complicating

the story (and perhaps explaining why these primary-sequence

features had not been observed earlier), different pri-miRNAs

differentially benefit from the different motifs (Figure 7C). Among

human pri-miRNAs, these motifs were nonetheless highly en-

riched, with 79% of the conserved human miRNAs containing

at least one of the three motifs (Figure 7D).

The motifs were not enriched in C. elegans pri-miRNAs

(Figure 7E) and, when added to the C. elegans pri-miRNAs,

conferred more efficient processing in mammalian cells (Figures

7B, S7G, and S7H). These experiments also showed

the benefit of disrupting pairing normally present at positions

−14 and +12 of the C. elegans miRNAs. The presence of

pairing that is inhibitory to mammalian processing suggests

that measurement from the base of the helix might also differ


[Figure 7 image. Panel A: summary diagram of a pri-miRNA hairpin showing the basal stem structure, basal UG, 5p and 3p cleavage sites, apical stem structure, loop UGUG, and downstream CNNC. Panel B: bicistronic vector (CMV promoter, query pri-miRNA, control pri-miRNA hsa-mir-1-1, TK pA), RNA blots for cel-miR-44-3p and hsa-miR-1, miR-44 relative expression for constructs wt and 1–4, and predicted secondary structures of the mir-44 constructs. Panel C: average information content (bits/nt) of basal stem, UG, CNNC, other basal positions, apical stem, loop GUG, and other loop positions for hsa-mir-30a, hsa-mir-125a, hsa-mir-16-1, and hsa-mir-223. Panels D and E: diagrams of the fractions of human and C. elegans pri-miRNAs with UG(−14), CNNC(16–17), loop GUG(P22–P24), or no motif, compared with chance.]

Constructs shown in panel B:

Construct   Basal stem     UG(−14)   CNNC
mir44.wt    WT             AG        None
mir44.5     WT             AG        CNNC(+17)
mir44.1     −14 mismatch   AC        None
mir44.2     −14 mismatch   CG        None
mir44.3     −14 mismatch   UG        None
mir44.4     −14 mismatch   UG        CNNC(+17)

Figure 7. Structural and Primary-Sequence Features Important for Human Pri-miRNA Processing

(A) Summary of human pri-miRNA determinants identified or confirmed in this study.

(B) Processing enhancement from adding human pri-miRNA features to C. elegans mir-44. Changes that introduced the listed features were incorporated

into mir-44 within the bicistronic expression vector (top). Secondary structures are shown for mutations predicted to affect the wild-type basal stem (bottom;

Drosha cleavage sites, purple arrowheads). After transfection into HEK293T cells, accumulation of miR-44-3p was assessed on RNA blots (middle), with

the graph plotting increased miR-44-3p expression normalized to that of the hsa-miR-1 control (geometric mean ± standard error, n = 3). Adding a CNNC to the

(legend continued on next page)


in nematodes. Thus, despite the many broadly conserved

features of miRNAs, some primary-sequence features and

some secondary-structure features differ in mammals and

nematodes.

About a fifth of human pri-miRNAs lack all three newly identi-

fied primary-sequence determinants (Figure 7D). These are

attractive subjects for further study, in that the approach imple-

mented here presumably would identify additional unique

determinants used by these pri-miRNAs. Other determinants

probably also exist at the Microprocessor cleavage site and

nearby stem regions, which were inaccessible to our approach

as implemented. Indeed, point mutations that disrupt pairing

in the middle of the stem dramatically impair processing (Gott-

wein et al., 2006; Duan et al., 2007; Jazdzewski et al., 2008;

Sun et al., 2009), and the SR-domain splicing factor SF2/ASF

is reported to enhance the processing of mir-7-1 by binding

a motif in the stem near the cleavage site (Wu et al., 2010).

Hinting at the possibility of additional primary-sequence prefer-

ences within the stem are results from both bacterial RNase III

and fungal homologs (Rnt1 and Pac1), which prefer specific

base-pair identities near the cleavage site (Lamontagne and

Elela, 2004).

The emerging picture is that pri-miRNA recognition is

a modular phenomenon in which each module contributes

modestly, and each pri-miRNA depends on individual modules

to varying degrees. Our results quantify the relative importance

of each known module for each pri-miRNA (Figure 7C). Pairing

within the basal stem was crucial, as expected (Lim et al.,

2003b; Han et al., 2006). In addition, all four miRNAs made use

of the basal UG motif, which provided information content

per nucleotide resembling that provided by the basal-stem

nucleotides. For the three miRNAs that used the CNNC

SRp20-binding site, its importance was also comparable

to that of the basal stem nucleotides. Compared to the nucleo-

tides within these motifs, other flanking nucleotides contributed

very little.

Apical and terminal loop elements were less important than

the basal motifs (Figure 7C). We detected significant contribu-

tions only in mir-125a, in which the apical stem nucleotides

were as important as the basal stem nucleotides, and in

mir-30a, in which the loop UGUG motif contributed some

information, albeit less than any of the three other features.

Together, the features described here explained 61%–78% of

the information content in the selected sequences. The remain-

ing information content was diffusely distributed among the

other partially randomized positions and might have mostly

reflected avoidance of detrimental alternative structures.
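One way to picture an information-content measure of this kind is as the relative entropy, in bits, between the nucleotide frequencies of the selected and initial pools at each position, averaged over the positions that make up a feature. The sketch below assumes that definition; the calculation actually used for Figure 7C is described in the Extended Experimental Procedures and may differ. Function names and the pseudocount are hypothetical.

```python
import math
from collections import Counter

def position_information(initial, selected, pos, pseudocount=0.5):
    """Relative entropy (bits) of the selected pool versus the initial pool at one position."""
    def freqs(pool):
        counts = Counter(seq[pos] for seq in pool)
        total = len(pool) + 4 * pseudocount
        return {n: (counts[n] + pseudocount) / total for n in "ACGU"}

    p, q = freqs(selected), freqs(initial)
    return sum(p[n] * math.log2(p[n] / q[n]) for n in "ACGU")

def average_information(initial, selected, positions):
    """Mean information per nucleotide over the positions of one feature."""
    values = [position_information(initial, selected, i) for i in positions]
    return sum(values) / len(values)
```

Under this definition, a position that is unconstrained in the selection contributes about 0 bits, and a position at which a uniformly randomized pool becomes fixed on one nucleotide contributes close to 2 bits.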

wild-type sequence (construct mir44.5) enhanced processing ≥20-fold (geometric mean of triplicate experiment), a lower bound set by the wild-type background.

(C) Contributions of individual features to in vitro processing measured as average information content per nucleotide. If available, results from two time points are shown.

(D) Enrichment of primary-sequence motifs in human pri-miRNAs conserved to mouse (Table S2). Pri-miRNAs were classified based on whether they had the basal UG, the apical GUG or UGU, or the downstream CNNC motif (left). Expectations by chance (right) were estimated based on the nucleotide composition of upstream, pre-miRNA, and downstream regions of human pri-miRNAs for the basal UG, apical GUG or UGU, and CNNC motifs, respectively.

(E) A search for human motifs in C. elegans pri-miRNAs (Table S2). Pri-miRNAs were analyzed as in (D); the smaller diagrams reflect the smaller number of analyzed pri-miRNAs.

See also Figure S7.

Knowledge of biogenesis features will aid in interpreting human mutations. For example, reduced miR-16 expression associated with chronic lymphocytic leukemia (CLL) is typically due to deletions spanning the intron containing mir-15a and mir-16-1 (Calin et al., 2002). However, 2 of 75 CLL patients studied had tumors that retained the pri-miRNA hairpins and instead carried a germline C > T single-nucleotide polymorphism (SNP) downstream of the mir-16-1 hairpin (Calin et al., 2005). This SNP lowers overexpression of miR-16 in HEK293 cells, and in both patients heterozygosity for the SNP was lost in the leukemic cells (Calin et al., 2005). This SNP corresponds to the first C in the mir-16-1 CNNC, which explains why it lowers miR-16 accumulation and leads to CLL: it affects pri-miRNA processing by disrupting SRp20 recruitment. Discovery of additional features for pri-miRNA recognition and processing might lead to improved diagnostic and therapeutic tools in cancer and other diseases in which miRNAs are dysregulated.

EXPERIMENTAL PROCEDURES

Ectopic Pri-miRNA Expression

Plasmids were derived from pcDNA3.2/V5-DEST and pMT-DEST (Invitrogen) for expression in HEK293 and S2 cells, respectively. Query pri-miRNA sequences and the human pri-mir-1-1 sequence were cloned such that the query pri-miRNAs were transcriptionally fused upstream of mir-1-1. HEK293 and S2 cells were transfected using Lipofectamine 2000 and Cellfectin (Invitrogen), respectively. After 36–48 hr, total RNA was extracted, and miRNA expression was assayed by RNA blots, ribonuclease protection assays (Invitrogen), and high-throughput sequencing (Chiang et al., 2010). For additional details including the data analysis pipeline, see Extended Experimental Procedures.

Binding and Cleavage Assays

To assay binding, we radiolabeled and mixed T7-transcribed competitor and reference pri-miRNA substrates in an equimolar ratio, then incubated them with limiting amounts of immunopurified catalytically impaired Microprocessor (Lee and Kim, 2007; Han et al., 2009). RNA-protein complexes were filtered on Immobilon-NC nitrocellulose discs (Whatman), and RNA extracted from the filter was resolved on 5% polyacrylamide gels. To assay cleavage, we incubated labeled substrates with Microprocessor lysate, which was prepared from cells overexpressing Drosha and DGCR8 (Lee and Kim, 2007). After extraction using Tri-Reagent (Ambion), substrates and products were resolved on denaturing 5% polyacrylamide gels. For additional details, see Extended Experimental Procedures.

Synthesis and Selection of Pri-miRNA Variants

Templates for T7 transcription were assembled from oligonucleotides (IDT) synthesized using nucleoside phosphoramidite mixtures designed to introduce variability at specified positions (Table S1). Sequences encoding the

HDV self-cleaving ribozyme were appended so that ribozyme cleavage would


generate transcripts with defined 3′ ends. Template pools were transcribed

using T7 RNA polymerase, and after treatment with TurboDNase (Ambion)

RNA was purified on denaturing polyacrylamide gels. After dephosphorylation

of 5′ and 3′ ends using calf intestinal phosphatase (NEB) and T4 polynucleotide

kinase (T4 PNK, NEB), followed by 5′ phosphorylation using T4 PNK,

transcripts were circularized using T4 RNA ligase 1 (NEB) and gel purified.

RNA pools were incubated with Microprocessor lysate, and after gel purifica-

tion, cleavage products were ligated to oligonucleotide adaptors, reverse

transcribed, amplified, and Illumina sequenced (75 nt paired-end reads).

In parallel, the initial pool of RNA was also reverse transcribed, amplified,

and sequenced. Selections for examining binding or apical stem-loops

were similar, except transcripts were not circularized. For additional

details including the data analysis pipeline, see Extended Experimental

Procedures.
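Because both the initial pool and the cleaved products are sequenced, a relative cleavage value for a class of variants (for example, variants with a given number of basal stem pairs) can be thought of as that class's enrichment in the selected pool, normalized to a reference class. The sketch below illustrates this reading; the function name and the count dictionaries are hypothetical, and the authors' actual analysis pipeline is described in the Extended Experimental Procedures.

```python
def relative_cleavage(initial_counts, selected_counts, reference):
    """Enrichment of each variant class in the selected pool, normalized to a reference class.

    initial_counts / selected_counts map a class label (e.g., stem length)
    to the number of reads in that class (hypothetical inputs).
    """
    n_init = sum(initial_counts.values())
    n_sel = sum(selected_counts.values())

    def enrichment(label):
        return (selected_counts[label] / n_sel) / (initial_counts[label] / n_init)

    ref = enrichment(reference)
    # Classes absent from the selected pool are skipped rather than reported as zero.
    return {label: enrichment(label) / ref
            for label in initial_counts
            if selected_counts.get(label)}
```

For instance, with stem length as the class label and 8 as the reference, the returned values would be on the same scale as the y axis of a plot normalized to the 8 bp stem.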

Motif Enrichment

Enrichment of a motif within pri-miRNAs of a species was evaluated by

comparing to 100,000 cohorts of miRNAs in which the upstream, down-

stream, and pre-miRNA sequences were independently shuffled, preserving

dinucleotide frequencies. The numbers of miRNAs that contained a match

to the motif in the actual and shuffled cohorts were used to compute an empir-

ical p value. A list of the representative pri-miRNAs used for analyses is

provided (Table S2). For additional details, see Extended Experimental

Procedures.

Site-Specific Crosslinking

The mir-30a pri-miRNA crosslinking substrate was assembled using T4 RNA ligase 2 (NEB) and a DNA splint to join an in vitro-transcribed 5′ fragment to a synthetic 3′ fragment containing a 3′-terminal biotin and a 4-thiouridine within the CNNC motif (Dharmacon). This crosslinking substrate was incubated in Microprocessor lysate and exposed to 1000 mJ of 365 nm UV light in a Stratalinker (Stratagene). For purification of RNA-protein complexes for mass spectrometry, complexes were captured on streptavidin-coated magnetic beads (Invitrogen), washed, and eluted with RNase T1 (Ambion), which cleaves after G. Eluted complexes either were separated on SDS gels and analyzed by HPLC/tandem mass spectrometry or were immunoprecipitated and analyzed by SDS gel. For additional details, see Extended Experimental Procedures.

ACCESSION NUMBERS

The Short Read Archive accession number for the sequencing data reported

in this paper is SRA051323.

SUPPLEMENTAL INFORMATION

Supplemental Information includes seven figures, two tables, and Extended

Experimental Procedures and can be found with this article online at

http://dx.doi.org/10.1016/j.cell.2013.01.031.

ACKNOWLEDGMENTS

We thank D. Shechner, C. Jan, O. Rissland, D. Weinberg, J. Ruby, J. Nam, and

V.N. Kim for valuable discussions; O. Rissland for comments on this manu-

script; L. Schoenfeld and J. Lassar for technical assistance; J. Stevenin for

9G8 antibody; V.N. Kim and T. Tuschl for plasmids; the Whitehead Institute

Genome Technology Core for sequencing; and E. Spooner for mass

spectrometry. This work was supported by NIH grants GM067031 and

T32GM007753. D.B. is an Investigator of the Howard Hughes Medical

Institute.

Received: April 10, 2012

Revised: October 28, 2012

Accepted: January 14, 2013

Published: February 14, 2013


REFERENCES

Anko, M.L., Muller-McNicoll, M., Brandl, H., Curk, T., Gorup, C., Henry, I., Ule,

J., and Neugebauer, K.M. (2012). The RNA-binding landscapes of two SR

proteins reveal unique functions and binding to diverse RNA classes. Genome

Biol. 13, R17.

Babiarz, J.E., Ruby, J.G., Wang, Y., Bartel, D.P., and Blelloch, R. (2008).

Mouse ES cells express endogenous shRNAs, siRNAs, and other Micropro-

cessor-independent, Dicer-dependent small RNAs. Genes Dev. 22, 2773–

2785.

Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and

function. Cell 116, 281–297.

Bedard, K.M., Daijogo, S., and Semler, B.L. (2007). A nucleo-cytoplasmic SR

protein functions in viral IRES-mediated translation initiation. EMBO J. 26,

459–467.

Bentwich, I., Avniel, A., Karov, Y., Aharonov, R., Gilad, S., Barad, O., Barzi-

lai, A., Einat, P., Einav, U., Meiri, E., et al. (2005). Identification of hundreds

of conserved and nonconserved human microRNAs. Nat. Genet. 37,

766–770.

Berezikov, E., van Tetering, G., Verheul, M., van de Belt, J., van Laake, L., Vos,

J., Verloop, R., van de Wetering, M., Guryev, V., Takada, S., et al. (2006). Many

novel mammalian microRNA candidates identified by extensive cloning and

RAKE analysis. Genome Res. 16, 1289–1298.

Brummelkamp, T.R., Bernards, R., and Agami, R. (2002). A system for

stable expression of short interfering RNAs in mammalian cells. Science

296, 550–553.

Calin, G.A., Dumitru, C.D., Shimizu, M., Bichi, R., Zupo, S., Noch, E., Aldler, H.,

Rattan, S., Keating, M., Rai, K., et al. (2002). Frequent deletions and down-regulation

of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic

leukemia. Proc. Natl. Acad. Sci. USA 99, 15524–15529.

Calin, G.A., Ferracin, M., Cimmino, A., Di Leva, G., Shimizu, M., Wojcik, S.E.,

Iorio, M.V., Visone, R., Sever, N.I., Fabbri, M., et al. (2005). A microRNA signa-

ture associated with prognosis and progression in chronic lymphocytic

leukemia. N. Engl. J. Med. 353, 1793–1801.

Cavaloc, Y., Popielarz, M., Fuchs, J.P., Gattoni, R., and Stevenin, J. (1994).

Characterization and cloning of the human splicing factor 9G8: a novel

35 kDa factor of the serine/arginine protein family. EMBO J. 13, 2639–

2649.

Cavaloc, Y., Bourgeois, C.F., Kister, L., and Stevenin, J. (1999). The splicing

factors 9G8 and SRp20 transactivate splicing through different and specific

enhancers. RNA 5, 468–483.

Cheloufi, S., Dos Santos, C.O., Chong, M.M., and Hannon, G.J. (2010).

A dicer-independent miRNA biogenesis pathway that requires Ago catalysis.

Nature 465, 584–589.

Chen, C.Z., Li, L., Lodish, H.F., and Bartel, D.P. (2004). MicroRNAs modulate

hematopoietic lineage differentiation. Science 303, 83–86.

Chiang, H.R., Schoenfeld, L.W., Ruby, J.G., Auyeung, V.C., Spies, N., Baek,

D., Johnston, W.K., Russ, C., Luo, S., Babiarz, J.E., et al. (2010). Mammalian

microRNAs: experimental evaluation of novel and previously annotated genes.

Genes Dev. 24, 992–1009.

Chung, W.J., Agius, P., Westholm, J.O., Chen, M., Okamura, K., Robine, N.,

Leslie, C.S., and Lai, E.C. (2011). Computational and experimental identifica-

tion of mirtrons in Drosophila melanogaster and Caenorhabditis elegans.

Genome Res. 21, 286–300.

Cifuentes, D., Xue, H., Taylor, D.W., Patnode, H., Mishima, Y., Cheloufi, S., Ma,

E., Mane, S., Hannon, G.J., Lawson, N.D., et al. (2010). A novel miRNA pro-

cessing pathway independent of Dicer requires Argonaute2 catalytic activity.

Science 328, 1694–1698.

Denli, A.M., Tops, B.B., Plasterk, R.H., Ketting, R.F., and Hannon, G.J. (2004).

Processing of primary microRNAs by the Microprocessor complex. Nature

432, 231–235.


Duan, R., Pak, C., and Jin, P. (2007). Single nucleotide polymorphism associ-

ated with mature miR-125a alters the processing of pri-miRNA. Hum. Mol.

Genet. 16, 1124–1131.

Fellmann, C., Zuber, J., McJunkin, K., Chang, K., Malone, C.D., Dickins, R.A.,

Xu, Q., Hengartner, M.O., Elledge, S.J., Hannon, G.J., and Lowe, S.W. (2011).

Functional identification of optimized RNAi triggers using a massively parallel

sensor assay. Mol. Cell 41, 733–746.

Feng, Y., Zhang, X., Song, Q., Li, T., and Zeng, Y. (2011). Drosha processing

controls the specificity and efficiency of global microRNA expression.

Biochim. Biophys. Acta 1809, 700–707.

Gottwein, E., Cai, X., and Cullen, B.R. (2006). A novel assay for viral microRNA

function identifies a single nucleotide polymorphism that affects Drosha

processing. J. Virol. 80, 5321–5326.

Gregory, R.I., Yan, K.P., Amuthan, G., Chendrimada, T., Doratotaj, B., Cooch,

N., and Shiekhattar, R. (2004). The Microprocessor complex mediates the

genesis of microRNAs. Nature 432, 235–240.

Grishok, A., Pasquinelli, A.E., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D.L.,

Fire, A., Ruvkun, G., and Mello, C.C. (2001). Genes and mechanisms related to

RNA interference regulate expression of the small temporal RNAs that control

C. elegans developmental timing. Cell 106, 23–34.

Han, J., Lee, Y., Yeom, K.H., Kim, Y.K., Jin, H., and Kim, V.N. (2004). The

Drosha-DGCR8 complex in primary microRNA processing. Genes Dev. 18,

3016–3027.

Han, J., Lee, Y., Yeom, K.H., Nam, J.W., Heo, I., Rhee, J.K., Sohn, S.Y., Cho,

Y., Zhang, B.T., and Kim, V.N. (2006). Molecular basis for the recognition of

primary microRNAs by the Drosha-DGCR8 complex. Cell 125, 887–901.

Han, J., Pedersen, J.S., Kwon, S.C., Belair, C.D., Kim, Y.K., Yeom, K.H., Yang,

W.Y., Haussler, D., Blelloch, R., and Kim, V.N. (2009). Posttranscriptional

crossregulation between Drosha and DGCR8. Cell 136, 75–84.

Hargous, Y., Hautbergue, G.M., Tintaru, A.M., Skrisovska, L., Golovanov, A.P.,

Stevenin, J., Lian, L.Y., Wilson, S.A., and Allain, F.H. (2006). Molecular basis of

RNA recognition and TAP binding by the SR proteins SRp20 and 9G8. EMBO

J. 25, 5126–5137.

Heinrichs, V., and Baker, B.S. (1995). The Drosophila SR protein RBP1 contributes

to the regulation of doublesex alternative splicing by recognizing RBP1

RNA target sequences. EMBO J. 14, 3987–4000.

Hofacker, I.L., and Stadler, P.F. (2006). Memory efficient folding algorithms for

circular RNA secondary structures. Bioinformatics 22, 1172–1176.

Huang, Y., and Steitz, J.A. (2001). Splicing factors SRp20 and 9G8 promote the

nucleocytoplasmic export of mRNA. Mol. Cell 7, 899–905.

Hutvagner, G., McLachlan, J., Pasquinelli, A.E., Balint, E., Tuschl, T., and

Zamore, P.D. (2001). A cellular function for the RNA-interference enzyme

Dicer in the maturation of the let-7 small temporal RNA. Science 293,

834–838.

Jazdzewski, K., Murray, E.L., Franssila, K., Jarzab, B., Schoenberg, D.R., and

de la Chapelle, A. (2008). Common SNP in pre-miR-146a decreases mature

miR expression and predisposes to papillary thyroid carcinoma. Proc. Natl.

Acad. Sci. USA 105, 7269–7274.

Jia, R., Li, C., McCoy, J.P., Deng, C.X., and Zheng, Z.M. (2010). SRp20 is

a proto-oncogene critical for cell proliferation and tumor induction and

maintenance. Int. J. Biol. Sci. 6, 806–826.

Jumaa, H., Wei, G., and Nielsen, P.J. (1999). Blastocyst formation is blocked in

mouse embryos lacking the splicing factor SRp20. Curr. Biol. 9, 899–902.

Khvorova, A., Reynolds, A., and Jayasena, S.D. (2003). Functional siRNAs

and miRNAs exhibit strand bias. Cell 115, 209–216.

Lamontagne, B., and Elela, S.A. (2004). Evaluation of the RNA determinants

for bacterial and yeast RNase III binding and cleavage. J. Biol. Chem. 279,

2231–2241.

Landthaler, M., Yalcin, A., and Tuschl, T. (2004). The human DiGeorge

syndrome critical region gene 8 and its D. melanogaster homolog are required

for miRNA biogenesis. Curr. Biol. 14, 2162–2167.

Lee, Y., and Kim, V.N. (2007). In vitro and in vivo assays for the activity of

Drosha complex. Methods Enzymol. 427, 89–106.

Lee, Y., Ahn, C., Han, J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P.,

Radmark, O., Kim, S., and Kim, V.N. (2003). The nuclear RNase III Drosha

initiates microRNA processing. Nature 425, 415–419.

Lim, L.P., Glasner, M.E., Yekta, S., Burge, C.B., and Bartel, D.P. (2003a).

Vertebrate microRNA genes. Science 299, 1540.

Lim, L.P., Lau, N.C., Weinstein, E.G., Abdelhakim, A., Yekta, S., Rhoades,

M.W., Burge, C.B., and Bartel, D.P. (2003b). The microRNAs of Caenorhabditis

elegans. Genes Dev. 17, 991–1008.

Liu, J., Carmell, M.A., Rivas, F.V., Marsden, C.G., Thomson, J.M., Song, J.J.,

Hammond, S.M., Joshua-Tor, L., and Hannon, G.J. (2004). Argonaute2 is the

catalytic engine of mammalian RNAi. Science 305, 1437–1441.

Macrae, I.J., Zhou, K., Li, F., Repic, A., Brooks, A.N., Cande, W.Z., Adams,

P.D., and Doudna, J.A. (2006). Structural basis for double-stranded RNA

processing by Dicer. Science 311, 195–198.

Okamura, K., Hagen, J.W., Duan, H., Tyler, D.M., and Lai, E.C. (2007). The

mirtron pathway generates microRNA-class regulatory RNAs in Drosophila.

Cell 130, 89–100.

Paddison, P.J., Caudy, A.A., Bernstein, E., Hannon, G.J., and Conklin, D.S.

(2002). Short hairpin RNAs (shRNAs) induce sequence-specific silencing in

mammalian cells. Genes Dev. 16, 948–958.

Pan, T., and Uhlenbeck, O.C. (1992). In vitro selection of RNAs that undergo

autolytic cleavage with Pb2+. Biochemistry 31, 3887–3895.

Park, J.E., Heo, I., Tian, Y., Simanshu, D.K., Chang, H., Jee, D., Patel, D.J., and

Kim, V.N. (2011). Dicer recognizes the 5′ end of RNA for efficient and accurate

processing. Nature 475, 201–205.

Pitt, J.N., and Ferre-D’Amare, A.R. (2010). Rapid construction of empirical

RNA fitness landscapes. Science 330, 376–379.

Ruby, J.G., Jan, C.H., and Bartel, D.P. (2007). Intronic microRNA precursors

that bypass Drosha processing. Nature 448, 83–86.

Schaal, T.D., and Maniatis, T. (1999). Selection and characterization of pre-

mRNA splicing enhancers: identification of novel SR protein-specific enhancer

sequences. Mol. Cell. Biol. 19, 1705–1719.

Schwarz, D.S., Hutvagner, G., Du, T., Xu, Z., Aronin, N., and Zamore, P.D.

(2003). Asymmetry in the assembly of the RNAi enzyme complex. Cell 115,

199–208.

Slattery, M., Riley, T., Liu, P., Abe, N., Gomez-Alcala, P., Dror, I., Zhou, T.,

Rohs, R., Honig, B., Bussemaker, H.J., and Mann, R.S. (2011). Cofactor

binding evokes latent differences in DNA binding specificity between Hox

proteins. Cell 147, 1270–1282.

Sun, G., Yan, J., Noltner, K., Feng, J., Li, H., Sarkis, D.A., Sommer, S.S., and

Rossi, J.J. (2009). SNPs in human miRNA genes affect biogenesis and func-

tion. RNA 15, 1640–1651.

Swartz, J.E., Bor, Y.C., Misawa, Y., Rekosh, D., and Hammarskjold, M.L.

(2007). The shuttling SR protein 9G8 plays a role in translation of unspliced

mRNA containing a constitutive transport element. J. Biol. Chem. 282,

19844–19853.

Wilson, D.S., and Szostak, J.W. (1999). In vitro selection of functional nucleic

acids. Annu. Rev. Biochem. 68, 611–647.

Wu, H., Sun, S., Tu, K., Gao, Y., Xie, B., Krainer, A.R., and Zhu, J. (2010).

A splicing-independent function of SF2/ASF in microRNA processing. Mol.

Cell 38, 67–77.

Wyatt, J.R., Sontheimer, E.J., and Steitz, J.A. (1992). Site-specific cross-linking

of mammalian U5 snRNP to the 5′ splice site before the first step of

pre-mRNA splicing. Genes Dev. 6(12B), 2542–2553.

Yeom, K.H., Lee, Y., Han, J., Suh, M.R., and Kim, V.N. (2006). Characterization

of DGCR8/Pasha, the essential cofactor for Drosha in primary miRNA processing.

Nucleic Acids Res. 34, 4622–4629.

Zahler, A.M., Neugebauer, K.M., Stolk, J.A., and Roth, M.B. (1993). Human

SR proteins and isolation of a cDNA encoding SRp75. Mol. Cell. Biol. 13,

4023–4028.


Zeng, Y., and Cullen, B.R. (2005). Efficient processing of primary microRNA

hairpins by Drosha requires flanking nonstructured RNA sequences. J. Biol.

Chem. 280, 27595–27603.

Zeng, Y., Yi, R., and Cullen, B.R. (2005). Recognition and cleavage of primary

microRNA precursors by the nuclear processing enzyme Drosha. EMBO J. 24,

138–148.


Zhang, H., Kolb, F.A., Jaskiewicz, L., Westhof, E., and Filipowicz, W. (2004).

Single processing center models for human Dicer and bacterial RNase III.

Cell 118, 57–68.

Zykovich, A., Korf, I., and Segal, D.J. (2009). Bind-n-Seq: high-throughput

analysis of in vitro protein-DNA interactions using massively parallel

sequencing. Nucleic Acids Res. 37, e151.


Supplemental Information

EXTENDED EXPERIMENTAL PROCEDURES

Ectopic Pri-miRNA Expression in HEK293 Cells and S2 Cells

A genomic fragment corresponding to the human mir-1-1 hairpin and flanking sequences was amplified and cloned into both pcDNA3.2/V5-DEST (Invitrogen) and pMT-DEST (Invitrogen) expression plasmids downstream of the attR recombination sites. Query pri-miRNA sequences were recombined into these plasmids at the attR sites using the Gateway system (Invitrogen). Expression plasmids and pMAX-GFP were co-transfected into HEK293 cells using Lipofectamine 2000 (Invitrogen) and co-transfected into S2 cells using Cellfectin (Invitrogen) according to the manufacturer’s instructions. After 36–48 hr, total RNA was collected by addition of Tri-Reagent (Ambion) according to the manufacturer’s instructions. RNA blots for detecting mature and pre-miRNAs were as described. Ribonuclease protection assays were performed with the RPA III kit (Invitrogen) according to the manufacturer’s instructions.

For detection of expression by sequencing, total RNA from individual transfections was combined and libraries for small-RNA sequencing prepared as described (Chiang et al., 2010). Sequencing reads were mapped to a miRNA hairpin collection composed of the miRBase-annotated hairpins of miRNAs endogenously expressed in the cell line and the miRBase-annotated hairpins of the transfected miRNAs. Reads were included if they perfectly matched a hairpin in this library and excluded if they matched more than one hairpin corresponding to a transfected miRNA. Read counts were normalized to the total reads matching a set of endogenous hairpins that had no transfected counterparts. For each expressed pri-miRNA hairpin, the number of reads reported is the number obtained after subtracting the number observed in a normalized, mock-transfected control library.
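The normalization and background subtraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and arguments are hypothetical, and scaling the mock library so that its endogenous reference total matches that of the transfected library is one plausible reading of "normalized", not a confirmed detail of the original pipeline.

```python
def background_subtracted_reads(sample_reads, sample_ref_total,
                                mock_reads, mock_ref_total):
    """Subtract mock-transfection background from a hairpin's read count.

    sample_reads / mock_reads: reads matching the transfected hairpin in the
    transfected and mock-transfected libraries.
    sample_ref_total / mock_ref_total: total reads matching the endogenous
    reference hairpins (those with no transfected counterpart) in each library.
    """
    # Assumption: the mock library is rescaled to the sequencing depth of the
    # transfected library, using the endogenous reference hairpins as the yardstick.
    scale = sample_ref_total / mock_ref_total
    return sample_reads - mock_reads * scale


# Example: the mock library is half as deep, so its 20 reads count as 40.
print(background_subtracted_reads(1000, 50000, 20, 25000))
```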

Microprocessor Lysate

Microprocessor lysate was prepared as described (Lee and Kim, 2007), with minor modifications. HEK293T cells were transfected with a mixture of pCK-Drosha-FLAG (Lee and Kim, 2007), pFLAG-HA-DGCR8 (Landthaler et al., 2004), and a transfection-control plasmid pMAX-GFP (Amaxa) using Lipofectamine 2000 (Invitrogen) according to the manufacturer’s instructions. After 72 hr, cells were harvested by rinsing the monolayer in phosphate buffered saline (PBS, 137 mM NaCl, 2.7 mM KCl, 1.5 mM KH2PO4, 8 mM Na2HPO4, [pH 7.4]). Cells were pelleted, resuspended in sonication buffer (100 mM KCl, 0.2 mM EDTA, 20 mM Tris-Cl pH 8.0, and 0.7 µl/ml 2-mercaptoethanol) supplemented with mini-EDTA Free Protease Inhibitor tablets (Roche), and sonicated. After clearing by centrifugation, cell lysis was confirmed by the liberation of GFP into the supernatant. The supernatant was distributed into single-use aliquots and stored in liquid-nitrogen vapor phase. Pri-miRNA cleavage assays were carried out as described (Lee and Kim, 2007) unless otherwise noted.

Competitive Binding and Cleavage Assays

The competitive binding assay was based on that of Bartel et al. (1991). T7-transcribed ~200 nt pri-miRNA substrates were gel-purified, treated with calf intestinal phosphatase (NEB), extracted in Tri-Reagent (Invitrogen), and 5′ end-labeled using T4 Polynucleotide Kinase (NEB) and γ-[32P]-ATP. The mir-125a reference substrate was prepared in the same way, except it was 10–25 nt shorter to enable separation on denaturing gels. Complexes containing Drosha-TN and DGCR8 were immunopurified from Microprocessor lysate in which DroshaTN-FLAG replaced the wild-type Drosha plasmid as described (Lee and Kim, 2007; Han et al., 2009). Competitor and reference RNAs were mixed and incubated with Drosha-TN–DGCR8 for 15–30 min [final concentrations, 250 nM each RNA, 100 mM KCl, 1 mM MgCl2, 0.2 mM EDTA, 20 mM Tris-Cl (pH 8.0), 0.7 µl/ml 2-mercaptoethanol and 300 ng/ml yeast total RNA (Ambion)]. RNA-protein complexes were filtered on Immobilon-NC nitrocellulose discs (Millipore) and washed with at least 10 reaction volumes of sonication buffer. RNA was eluted from the membrane by incubating in elution buffer (300 mM NaCl, 8 M urea, and 25 mM EDTA) for 10 min at 85°C, ethanol precipitated, and resolved on denaturing 5% polyacrylamide gels.

For competitive cleavage, 5′ end-labeled query and reference pri-miRNA substrates were mixed and incubated with Microprocessor lysate [final concentrations, 50 nM each RNA, 100 mM KCl, 1 mM MgCl2, 0.2 mM EDTA, 20 mM Tris-Cl (pH 8.0), 0.7 µl/ml 2-mercaptoethanol, 300 ng/ml yeast total RNA, 10 nM Microprocessor complex (concentration estimated exploiting the single-turnover behavior of the Microprocessor when cleaving linear pri-miR-125a)]. After incubation for 30 s at 37°C the reaction was stopped by addition of Tri-Reagent (Ambion) with mixing. Extracted RNA was precipitated with isopropanol, then resuspended and resolved on a denaturing 5% polyacrylamide gel.

Synthesis and Selection of Pri-miRNA Variants

Templates for T7 RNA polymerase transcription were assembled from oligonucleotides (IDT) that were synthesized using nucleoside phosphoramidite mixtures to introduce variability at specified positions (Table S1). For body labeling, transcription reactions included α-[32P]-UTP.

Transcripts for producing circular pri-miRNA variants ended with a minimal HDV ribozyme (Schurer et al., 2002) that co-transcriptionally self-cleaved at a defined position to produce homogeneous 3′ ends. After treatment with TurboDNase (Ambion), these transcripts were gel-purified, treated with calf intestinal phosphatase (NEB) to remove the 5′ triphosphate, extracted with Tri-Reagent, precipitated with isopropanol, and treated with T4 polynucleotide kinase (NEB) to remove the 2′-3′ cyclic phosphate as described (Guo et al., 2010). After ethanol precipitation, they were 5′ phosphorylated with T4 polynucleotide kinase, diluted, and circularized using T4 RNA ligase 1 (NEB). Circular pri-miRNAs were purified from linear species on denaturing polyacrylamide gels.


Pools of variants were incubated in Microprocessor lysate, and at one or two time points (for circularized pri-miRNA variants, 1 min for mir-125a, 1 and 4 min for mir-16-1, 1 and 5 min for mir-30a, and 3 and 15 min for mir-223; for apical stem and loop variants, 5 s and 15 s for mir-125a, 15 s and 2 min for mir-16-1, 30 s and 2 min for mir-30a, and 30 s and 2 min for mir-223) reactions were stopped by addition of Tri-Reagent (Ambion) with mixing, and cleaved products were purified on denaturing gels. Cleavage products of circularized pri-miRNA variants were ligated to oligonucleotide adaptors containing barcode sequences using T4 DNA ligase (NEB) and DNA splints (Table S1), reverse transcribed, and amplified. To represent the initial pools, a sample of phosphorylated, uncircularized RNA was reverse transcribed and amplified. For the apical stem-loop variants, pre-miRNA cleavage products of linear pri-miRNA variants were reverse transcribed, amplified and sequenced. To represent the initial pools of these variants, a sample from each pool was reverse transcribed and amplified.

High-Throughput Sequencing and Analysis

Amplicons from the initial pools and the cleaved products were pooled for Illumina paired-end sequencing (75 nt reads per end) for circularized substrate selections, and Illumina single-read sequencing (54 nt reads) for apical stem-loop selections. Sequencing reads were divided into experimental groups according to constant sequences specific to each pri-miRNA and barcodes indicating time points. After filtering for sequencing quality, discarding any sequences that had an error rate ≥ 0.1 (phred score ≤ 10) at any variant position, the sequencing error averaged < 0.001 per variant position (average phred score > 30). Sequences in which the length of a partially randomized region differed from that of the wild-type were also discarded, thereby eliminating many sequences with insertions or deletions. Libraries were collapsed so that sequences that appeared multiple times with the same barcode were considered just once in the analysis (although in retrospect this precaution was not required because there was no group of dominant, multi-copy sequences that would have biased the analyses). Analyses were also restricted to products cleaved at the wild-type processing sites, which were inferred from the dominant reads in small-RNA sequencing data (Landgraf et al., 2007; Bar et al., 2008; Chiang et al., 2010; Witten et al., 2010), except for miR-16-1* and miR-223, which appear to undergo post-cleavage 3′-end trimming (Han et al., 2011).
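As a rough illustration of the read filtering and collapsing steps described above, the following Python sketch applies the quality threshold, the length check, and per-barcode collapsing. The record layout (barcode, randomized region, per-base phred scores) and all function names are assumptions made for the example; they are not taken from the original analysis pipeline.

```python
def phred_to_error(q):
    """Convert a phred quality score to an error probability (Q10 -> 0.1)."""
    return 10 ** (-q / 10)


def passes_quality(quals, variant_positions, discard_at_or_below=10):
    """Keep a read only if every variant position has phred > 10 (error < 0.1)."""
    return all(quals[i] > discard_at_or_below for i in variant_positions)


def filter_and_collapse(reads, variant_positions, wild_type_length):
    """Filter reads and collapse duplicates that share a barcode.

    reads: iterable of (barcode, randomized_region, quals) tuples, where
    quals holds one phred score per base of randomized_region (hypothetical layout).
    """
    kept = set()
    for barcode, region, quals in reads:
        if len(region) != wild_type_length:
            continue  # likely insertion or deletion
        if not passes_quality(quals, variant_positions):
            continue  # low-confidence base call at a variant position
        kept.add((barcode, region))  # identical sequences per barcode count once
    return sorted(kept)


reads = [
    ("t1", "ACGU", [30, 30, 30, 30]),
    ("t1", "ACGU", [35, 35, 35, 35]),  # duplicate within barcode t1
    ("t1", "ACG", [30, 30, 30]),       # wrong length
    ("t1", "AAGU", [30, 10, 30, 30]),  # phred 10 at a variant position
    ("t2", "ACGU", [30, 30, 30, 30]),
]
print(filter_and_collapse(reads, variant_positions=[1, 2], wild_type_length=4))
```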

To calculate the information content at each position, we used the data from the initial sequences and the product sequences to calculate the relative cleavage of each base versus that of the other three bases. For example, for the A residue, the three relative cleavage values are given below, where P(N) is estimated by the frequency of a base in the initial pool, and P(N|cleavage) is estimated by the frequency of that base in the product sequences.

$$\frac{P(\text{cleavage} \mid C)}{P(\text{cleavage} \mid A)} = \frac{P(C \mid \text{cleavage})/P(C)}{P(A \mid \text{cleavage})/P(A)}$$

$$\frac{P(\text{cleavage} \mid G)}{P(\text{cleavage} \mid A)} = \frac{P(G \mid \text{cleavage})/P(G)}{P(A \mid \text{cleavage})/P(A)}$$

$$\frac{P(\text{cleavage} \mid U)}{P(\text{cleavage} \mid A)} = \frac{P(U \mid \text{cleavage})/P(U)}{P(A \mid \text{cleavage})/P(A)}$$

We then used Bayes’ Theorem (Pitman, 1993) to infer the nucleotide composition that would have resulted after selection from a pool of variants in which there was an equal probability of an A, C, G, or U at this position. For example, the formula to infer the frequency of A at a particular position after selection from such a pool was

$$P_{\text{inferred A}} = P(A \mid \text{cleavage}) = \left(1 + \frac{P(\text{cleavage} \mid C)}{P(\text{cleavage} \mid A)} + \frac{P(\text{cleavage} \mid G)}{P(\text{cleavage} \mid A)} + \frac{P(\text{cleavage} \mid U)}{P(\text{cleavage} \mid A)}\right)^{-1}$$

The inferred post-selection distribution was then used to calculate information content scores for each nucleotide at each position. For example, the information content for A at a particular position was calculated as

$$I_A = P_{\text{inferred A}} \times \left[\log_2\left(P_{\text{inferred A}}\right) + 2\right]$$

If results from two time points were available, information content values were averaged.
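The calculation above can be illustrated with a short Python sketch. It relies on the fact that the enrichment P(N|cleavage)/P(N) is proportional to P(cleavage|N), so normalizing the four enrichments gives the same inferred frequency as the formula with the three relative cleavage values. The function names and the dictionary-based inputs are assumptions for the example, not the authors' code.

```python
import math

BASES = "ACGU"


def inferred_frequencies(initial, selected):
    """Infer post-selection base frequencies for an equal-probability starting pool.

    initial[b]  : frequency of base b in the initial pool, P(b)
    selected[b] : frequency of base b among cleaved products, P(b|cleavage)
    """
    # Enrichment of each base is proportional to P(cleavage|b).
    enrichment = {b: selected[b] / initial[b] for b in BASES}
    total = sum(enrichment.values())
    # Equivalent to 1 / (1 + sum of the three relative cleavage values).
    return {b: enrichment[b] / total for b in BASES}


def information_content(p_inferred):
    """Per-nucleotide information, I_b = P_b * (log2(P_b) + 2), in bits."""
    return {
        b: (p * (math.log2(p) + 2) if p > 0 else 0.0)
        for b, p in p_inferred.items()
    }


# A position biased toward A in the initial pool but unaffected by selection
# carries no information once the starting bias is removed.
pool = {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1}
flat = inferred_frequencies(pool, pool)
print(information_content(flat))
```

With two time points, the per-nucleotide values from each time point would simply be averaged, as stated in the text.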

For evaluating motifs, we calculated a relative cleavage value based on the frequencies of the motif in the reference and selected pools [$P(\text{motif}_i)$ and $P(\text{motif}_i \mid \text{cleavage})$, respectively], and the frequencies of a reference motif in the reference and selected pools [$P(\text{motif}_{\text{ref}})$ and $P(\text{motif}_{\text{ref}} \mid \text{cleavage})$, respectively].

$$\text{Relative cleavage} = \frac{P(\text{motif}_i \mid \text{cleavage})/P(\text{motif}_i)}{P(\text{motif}_{\text{ref}} \mid \text{cleavage})/P(\text{motif}_{\text{ref}})}$$

We also used an odds ratio score to calculate the enrichment for particular motifs by using the frequency of the motif in the reference and selected pools [$P(\text{motif}_i)$ and $P(\text{motif}_i \mid \text{cleavage})$, respectively].

$$\text{Odds ratio} = \frac{P(\text{motif}_i \mid \text{cleavage})/P(\text{motif}_i)}{\left[1 - P(\text{motif}_i \mid \text{cleavage})\right]/\left[1 - P(\text{motif}_i)\right]}$$

If two time points were available, the geometric mean of the ratios was reported, unless noted otherwise.
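A minimal Python sketch of the two motif scores and the geometric mean over time points follows. Function and argument names are illustrative assumptions; the inputs are motif frequencies in the reference (initial) and selected (cleaved) pools.

```python
import math


def relative_cleavage(p_motif_sel, p_motif_ref_pool, p_refmotif_sel, p_refmotif_ref_pool):
    """Enrichment of a motif divided by the enrichment of a reference motif."""
    return (p_motif_sel / p_motif_ref_pool) / (p_refmotif_sel / p_refmotif_ref_pool)


def odds_ratio(p_sel, p_ref_pool):
    """[P(motif|cleavage)/P(motif)] / {[1 - P(motif|cleavage)]/[1 - P(motif)]}."""
    return (p_sel / p_ref_pool) / ((1 - p_sel) / (1 - p_ref_pool))


def geometric_mean(values):
    """Geometric mean, used to combine ratios from two time points."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# A motif present in 25% of the initial pool and 50% of cleaved products.
print(odds_ratio(0.5, 0.25))
# Combining ratios from two time points.
print(geometric_mean([2.0, 8.0]))
```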

To screen specifically for Watson–Crick pairing between all possible combinations of randomized positions, we used a scoring metric to compare the geometric average of odds ratios for Watson–Crick pairing to that of odds ratios for non-Watson–Crick pairs.

$$\text{Pairing score} = \frac{\left(\prod_{\text{Watson–Crick}} \text{Odds ratio}\right)^{1/4}}{\left(\prod_{\text{non-Watson–Crick}} \text{Odds ratio}\right)^{1/12}}$$
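The pairing score can be sketched in Python as follows, reading the score as the ratio of the geometric mean of the four Watson–Crick odds ratios to the geometric mean of the twelve non-Watson–Crick odds ratios. That reading of the formula, the treatment of G–U as non-Watson–Crick, and the input format (a dictionary of odds ratios keyed by the base identities at the two positions) are assumptions made for this illustration.

```python
import math
from itertools import product

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
ALL_PAIRS = list(product("ACGU", repeat=2))  # 16 possible base combinations


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def pairing_score(odds_ratios):
    """odds_ratios maps (base_i, base_j) to the odds ratio of that combination."""
    wc = [odds_ratios[p] for p in ALL_PAIRS if p in WATSON_CRICK]
    non_wc = [odds_ratios[p] for p in ALL_PAIRS if p not in WATSON_CRICK]
    # 4 Watson-Crick combinations versus 12 others, matching the 1/4 and 1/12 exponents.
    return geometric_mean(wc) / geometric_mean(non_wc)


# Two positions where every Watson-Crick combination is enriched twofold and
# every other combination is depleted twofold.
example = {p: (2.0 if p in WATSON_CRICK else 0.5) for p in ALL_PAIRS}
print(pairing_score(example))
```

A score near 1 would indicate no preference for pairing between the two positions; note that any combination with an odds ratio of zero would need special handling before taking logarithms.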

Pri-miRNA Collections and Positional Enrichments of Sequence Motifs

A list of representative pri-miRNAs used for analyses is provided (Table S2). Because of the large number of questionable annotations in miRBase (Chiang et al., 2010), analysis of human pri-miRNAs was restricted to those of miRNAs conserved in mouse. Coordinates of miRNA loci in miRBase version 17 (Kozomara and Griffiths-Jones, 2011) were used to extract the sequences of each annotated hairpin and 200 genomic bases flanking each side. miRBase hairpin sequences and flanking genomic sequences (20 nt on each side) were folded using RNAfold (Hofacker and Stadler, 2006). The Microprocessor cleavage site was inferred using the predicted structures and the mature sequences annotated in miRBase. In cases in which the 3′ overhang was shorter than 2 nt, the 3′ product was extended to generate a 2 nt overhang. Only hairpins for which the predicted folding and the annotated mature sequences could be reconciled or extended to form a 2 nt 3′ overhang were carried forward for analysis. For hairpins in miRBase-annotated miRNA families, a single representative was chosen to represent the family in each species. For human, D. melanogaster, and C. elegans, the family member with the most conserved pre-miRNA sequence (as determined by average branch-length score of pre-miRNA nucleotides) was chosen. For other species, the representative was chosen at random.

Whole-genome alignments and phylogenetic trees were obtained from the UCSC genome browser (Fujita et al., 2011). Conserva-

tion of a base was evaluated by its branch-length score, defined as the ratio between the total branch length of the species that

contained the same base as the reference sequence and the total branch length of the species that had an aligned base at that

position.
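To make the branch-length score concrete, here is a small Python sketch on a toy tree. It assumes that the "total branch length" of a set of species means the summed length of the subtree connecting them, and it represents the tree as a hypothetical child-to-(parent, length) mapping; neither detail is specified in the text, and the toy tree and its branch lengths are invented for the example.

```python
def root_path(node, parent):
    """Edges (named by their child node) on the path from node up to the root."""
    edges = set()
    while node in parent:
        edges.add(node)
        node = parent[node][0]
    return edges


def connecting_length(species, parent):
    """Total length of the subtree that connects the given species."""
    paths = [root_path(s, parent) for s in species]
    if len(paths) < 2:
        return 0.0
    union = set().union(*paths)
    shared = set.intersection(*paths)  # edges above the common ancestor
    return sum(parent[edge][1] for edge in union - shared)


def branch_length_score(reference_base, aligned_bases, parent):
    """aligned_bases maps each species with an aligned base (reference included) to that base."""
    same = [s for s, b in aligned_bases.items() if b == reference_base]
    total = connecting_length(list(aligned_bases), parent)
    return connecting_length(same, parent) / total if total > 0 else 0.0


# Toy tree: ((human:0.2, mouse:0.3):0.1, dog:0.4)
toy_tree = {
    "human": ("HM", 0.2),
    "mouse": ("HM", 0.3),
    "HM": ("root", 0.1),
    "dog": ("root", 0.4),
}
score = branch_length_score("C", {"human": "C", "mouse": "C", "dog": "U"}, toy_tree)
print(score)
```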

Enrichment of a motif at a set of positions relative to the cleavage site was computed by generating 100,000 cohorts of pri-miRNAs

in which the upstream, downstream, and pre-miRNA sequences were independently shuffled, preserving dinucleotide frequencies.

The numbers of miRNAs that contained a match to the motif in the actual and shuffled cohorts were used to compute an empirical

p value.
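The shuffling test can be outlined in Python as below. This sketch departs from the procedure in the text in ways that should be kept in mind: it handles a single sequence region rather than shuffling upstream, downstream, and pre-miRNA segments independently, and its default shuffler preserves only mononucleotide composition, whereas the analysis described above preserved dinucleotide frequencies (a dinucleotide-preserving shuffler could be passed in through the `shuffle` argument). The add-one correction in the p value is also an assumption, as are all function names.

```python
import random
import re

CNNC = re.compile(r"C[ACGU]{2}C")


def count_with_motif(seqs, pattern, start, end):
    """Number of sequences with a motif match inside the window [start, end)."""
    return sum(1 for s in seqs if pattern.search(s[start:end]))


def mononucleotide_shuffle(seq, rng):
    """Placeholder shuffler: preserves base composition only, not dinucleotides."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def empirical_p_value(seqs, pattern, start, end, n_cohorts=1000,
                      shuffle=mononucleotide_shuffle, seed=0):
    """Fraction of shuffled cohorts with at least as many motif-containing sequences."""
    rng = random.Random(seed)
    observed = count_with_motif(seqs, pattern, start, end)
    at_least = 0
    for _ in range(n_cohorts):
        cohort = [shuffle(s, rng) for s in seqs]
        if count_with_motif(cohort, pattern, start, end) >= observed:
            at_least += 1
    # Add-one correction (an assumption) keeps the estimate above zero.
    return (at_least + 1) / (n_cohorts + 1)


downstream = ["GGACAUCAAGG", "AAACUUCAAAA", "GGGGAAAAGGG"]
print(count_with_motif(downstream, CNNC, 0, 11))
print(empirical_p_value(downstream, CNNC, 0, 11, n_cohorts=200))
```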

Analysis of Crosslinked Complexes

The mir-30a pri-miRNA crosslinking substrate was assembled using T4 RNA ligase 2 (NEB) and a DNA splint to join an in vitro-transcribed 5′ fragment to a synthetic 3′ fragment containing a 3′-terminal biotin and a 4-thiouridine within the CNNC motif (Dharmacon). This crosslinking substrate was incubated in Microprocessor lysate and exposed to 1000 mJ of 365 nm UV light in a Stratalinker (Stratagene). For purification of RNA–protein complexes for mass spectrometry, complexes were captured on streptavidin-coated magnetic beads (Invitrogen) and washed twice in Laemmli buffer (4% SDS, 20% glycerol, 125 mM Tris-Cl pH 6.8) and twice in urea buffer (8 M urea, 300 mM NaCl, 25 mM EDTA), then eluted with RNase T1 (Ambion). The eluted complexes were separated on SDS gels, and the corresponding gel slices excised. The complexes were reduced, alkylated and digested with trypsin. After extraction and concentration, peptides were analyzed by HPLC/tandem mass spectrometry using a Waters NanoAcquity UPLC system and a ThermoFisher LTQ linear ion trap mass spectrometer operated in a data-dependent manner. Peptides were identified using SEQUEST and data analyzed with Scaffold (Proteome Software).

For immunoprecipitation, eluted RNA–protein complexes were incubated for 1 hr with antibody [either anti-FLAG M2 (Sigma), polyclonal mouse IgG (Millipore), anti-SRp20 (Invitrogen), or anti-9G8 (gift of J. Stevenin)], followed by incubation with protein-G agarose beads (Sigma). After washing the beads three times in at least ten packed-bead volumes of sonication buffer, complexes were separated on SDS gels.

Reanalysis of iCLIP Data

Sequencing reads from iCLIP of SRp20 (SRSF3) and SRp75 (SRSF4) were from ArrayExpress (accession ERP000815) (Anko et al., 2012). Although that study did not find enrichment for SRp20-binding sites in miRNAs, the miRNA annotations examined did not extend beyond the miRNA hairpins (i.e., the pre-miRNAs and their basal stems) and thus did not include downstream regions containing the CNNC motif. After adaptor stripping, reads were mapped to the mouse genome using Bowtie (Langmead et al., 2009), allowing for two mismatches and considering only uniquely mapped reads. Each position immediately 5′ to an iCLIP read was considered a crosslink site, and the number of crosslink sites was tallied for each relative distance from the mouse pre-miRNAs confirmed or identified in Chiang et al. (2010). For pri-miRNAs with more than one site, the site supported by the most reads was the one plotted in Figure 6D (distributing fractions of a count to each site in cases in which multiple sites were tied for the most reads).
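The tallying rule described above, including the fractional counts for tied sites, can be sketched in Python. The coordinate convention (1-based, inclusive read coordinates) and the input layout (for each pri-miRNA, a mapping from relative position to the number of supporting reads) are assumptions for the example; the function names are hypothetical.

```python
from collections import defaultdict


def crosslink_position(read_start, read_end, strand):
    """Position immediately 5' of a mapped read (1-based, inclusive coordinates assumed)."""
    return read_start - 1 if strand == "+" else read_end + 1


def tally_top_sites(sites_by_pri_mirna):
    """Count, per relative position, the best-supported crosslink site of each pri-miRNA.

    sites_by_pri_mirna maps a pri-miRNA name to {relative_position: read_count}.
    Ties for the most reads share one count equally.
    """
    tally = defaultdict(float)
    for sites in sites_by_pri_mirna.values():
        if not sites:
            continue
        top = max(sites.values())
        tied = [pos for pos, n in sites.items() if n == top]
        for pos in tied:
            tally[pos] += 1 / len(tied)
    return dict(tally)


example_sites = {
    "mir-a": {17: 5, 20: 2},  # position 17 is the clear top site
    "mir-b": {17: 3, 18: 3},  # tie: each position receives half a count
}
print(tally_top_sites(example_sites))
```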


Cleavage Assays with Purified SRp20

SRp20 cDNA with an N-terminal 3X-FLAG tag was cloned into the pcDNA3.2/V5-DEST vector (Invitrogen). HEK293T cells were transiently transfected with either the SRp20 construct or an analogous construct expressing N-terminally FLAG-tagged EGFP, using Lipofectamine 2000 (Invitrogen) according to the manufacturer’s instructions. After 48 hr cells were lysed in sonication buffer. The tagged proteins were immunoprecipitated using ANTI-FLAG M2 magnetic beads (Sigma), washed three times in sonication buffer, and eluted with 150 ng/µl 3X-FLAG peptide (Sigma), then dialyzed against 1000 volumes of sonication buffer using dialysis membrane with a 3 kDa cutoff (Pierce). For cleavage reactions, the Microprocessor complex was first immunoprecipitated from Microprocessor lysate. Biotinylated anti-FLAG-M2 antibody (1:500 dilution, Sigma) was incubated in lysate for 2.5 hr, then precipitated with streptavidin-coated magnetic beads (Invitrogen) for 30 min. Beads were washed twice in sonication buffer, then incubated at 37°C with 5′-labeled pri-mir-16-1 substrates and either SRp20 or EGFP immunopurified from HEK293T cells, at a final volume of 40% beads and 40% either SRp20, EGFP or sonication buffer. Final concentrations were 2 nM pri-miRNA, 100 mM KCl, 1 mM MgCl2, 0.2 mM EDTA, 20 mM Tris-Cl (pH 8.0), 0.7 µl/ml 2-mercaptoethanol, 300 ng/ml yeast total RNA (Ambion) and 0.5% SUPERaseIn RNase Inhibitor (Ambion). After 3 min, reactions were stopped in Tri-reagent (Ambion), and products separated on 10% denaturing polyacrylamide gels.

SUPPLEMENTAL REFERENCES

Anko, M.L., Muller-McNicoll, M., Brandl, H., Curk, T., Gorup, C., Henry, I., Ule, J., and Neugebauer, K.M. (2012). The RNA-binding landscapes of two SR proteins reveal unique functions and binding to diverse RNA classes. Genome Biol. 13, R17.

Bar, M., Wyman, S.K., Fritz, B.R., Qi, J., Garg, K.S., Parkin, R.K., Kroh, E.M., Bendoraite, A., Mitchell, P.S., Nelson, A.M., et al. (2008). MicroRNA discovery and profiling in human embryonic stem cells by deep sequencing of small RNA libraries. Stem Cells 26, 2496–2505.

Bartel, D.P., Zapp, M.L., Green, M.R., and Szostak, J.W. (1991). HIV-1 Rev regulation involves recognition of non-Watson-Crick base pairs in viral RNA. Cell 67, 529–536.

Chiang, H.R., Schoenfeld, L.W., Ruby, J.G., Auyeung, V.C., Spies, N., Baek, D., Johnston, W.K., Russ, C., Luo, S., Babiarz, J.E., et al. (2010). Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev. 24, 992–1009.

Fujita, P.A., Rhead, B., Zweig, A.S., Hinrichs, A.S., Karolchik, D., Cline, M.S., Goldman, M., Barber, G.P., Clawson, H., Coelho, A., et al. (2011). The UCSC Genome Browser database: update 2011. Nucleic Acids Res. 39 (Database issue), D876–D882.

Guo, H., Ingolia, N.T., Weissman, J.S., and Bartel, D.P. (2010). Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466, 835–840.

Han, J., Pedersen, J.S., Kwon, S.C., Belair, C.D., Kim, Y.K., Yeom, K.H., Yang, W.Y., Haussler, D., Blelloch, R., and Kim, V.N. (2009). Posttranscriptional crossregulation between Drosha and DGCR8. Cell 136, 75–84.

Han, B.W., Hung, J.H., Weng, Z., Zamore, P.D., and Ameres, S.L. (2011). The 3′-to-5′ exoribonuclease Nibbler shapes the 3′ ends of microRNAs bound to Drosophila Argonaute1. Curr. Biol. 21, 1878–1887.

Hofacker, I.L., and Stadler, P.F. (2006). Memory efficient folding algorithms for circular RNA secondary structures. Bioinformatics 22, 1172–1176.

Kozomara, A., and Griffiths-Jones, S. (2011). miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 39, D152–D157.

Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice, A., Kamphorst, A.O., Landthaler, M., et al. (2007). A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129, 1401–1414.

Landthaler, M., Yalcin, A., and Tuschl, T. (2004). The human DiGeorge syndrome critical region gene 8 and its D. melanogaster homolog are required for miRNA biogenesis. Curr. Biol. 14, 2162–2167.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25.

Lee, Y., and Kim, V.N. (2007). In vitro and in vivo assays for the activity of Drosha complex. Methods Enzymol. 427, 89–106.

Moore, M.J. (1999). Joining RNA molecules with T4 DNA ligase. Methods Mol. Biol. 118, 11–19.

Pitman, J. (1993). Probability (New York: Springer-Verlag).

Schurer, H., Lang, K., Schuster, J., and Morl, M. (2002). A universal method to produce in vitro transcripts with homogeneous 3′ ends. Nucleic Acids Res. 30, e56.

Sontheimer, E.J. (1994). Site-specific RNA crosslinking with 4-thiouridine. Mol. Biol. Rep. 20, 35–44.

Witten, D., Tibshirani, R., Gu, S.G., Fire, A., and Lui, W.O. (2010). Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls. BMC Biol. 8, 58.

Zhang, X., and Zeng, Y. (2010). The terminal loop region controls microRNA processing by Drosha and Dicer. Nucleic Acids Res. 38, 7689–7697.


[Figure S1 image. Panel A: schematic of the bicistronic construct (CMV promoter, query pri-miRNA, control pri-miRNA hsa-mir-1-1, TK pA) and RNA blots of pre-mir-1-1, mature miR-1, U6 snRNA, and Glu tRNA for nematode, fly, and human query pri-miRNAs, with mir-1-1-only and mock transfections. Panel B: full-membrane blots (5% and 15% gel sections) for hsa-mir-1-1, cel-lin-4, cel-lsy-6, cel-mir-40, and cel-mir-50, each with 15 fmol control, Century and Decade marker, mock, and transfected lanes.]

Figure S1. Expression and Processing of Pri-miRNA Hairpins in HEK293 Cells, Related to Figure 1

(A) Expression of miR-1 and pre-mir-1-1 from bicistronic transcripts. HEK293 cells were individually transfected with plasmids bearing a human, D. melanogaster, or C. elegans pri-miRNA transcriptionally fused to human pri-mir-1-1. Mature miR-1 and pre-mir-1-1 derived from the transcriptional fusion were detected by RNA blot. (Results from vectors in which let-7 and mir-1 were the query pri-miRNAs are shown here but are not shown in Figure 1A because the corresponding mature miRNAs were indistinguishable from those of other transfected cells after total RNA was pooled for small-RNA sequencing.)

(B) Full membrane images for blots shown in Figure 1B. Total RNA was run on stacked polyacrylamide gels (5% top and 15% bottom) to resolve sizes from 20–1000 nt. Each blot included marker lanes (Century and Decade RNA markers, Ambion) and a positive-control lane with 15 fmol in vitro transcribed standard derived from the corresponding pri-miRNA (control).


[Figure S2 image. Panel A: predicted basal-stem structures of wild-type mir-125a (mir125.wt) and variants mir125.1 through mir125.6. Panel B: relative cleavage of wt and variants 1–6, showing competitive cleavage alongside the prediction from the mir-125a selection (1 mM Mg). Panel C: schematic of the bicistronic construct (CMV promoter, mir-125a mutant as query pri-miRNA, hsa-mir-1-1 as control pri-miRNA, TK pA), RNA blots probed for hsa-miR-125a and hsa-miR-1, and relative expression of wt and variants 1–6.]

Figure S2. Confirmation of hsa-mir-125a Selection Results In Vitro and in HEK293T Cells, Related to Figure 2

(A) Predicted basal stem structure of mir-125a variants tested in the experiment.

(B) Competitive cleavage of individual mir-125a variants, relative to wild-type mir-125a. Variants were mixed with wild-type mir-125a, which was longer at its 5′ end, and incubated in Microprocessor lysate. Cleavage products were separated on denaturing gels, and the ratio of wild-type and variant products quantified (blue, geometric mean ± standard error, n = 3), together with the relative cleavage inferred from the selection experiment (gray).

(C) Evaluation of mir-125a variants in HEK293T cells. Variants were transcriptionally fused to pri-mir-1-1 and expressed in HEK293T cells, as in Figure S1A. Accumulation of mature miR-125a was quantified by RNA blot and normalized to the level of mature miR-1 (geometric mean ± standard error, n = 3).
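The "geometric mean ± standard error" summaries quoted in panels (B) and (C) can be illustrated with a short Python sketch. The legend does not state how the error was computed, so the approach here (mean and standard error taken on log2-transformed ratios, then back-transformed) is an assumption, and the function name is hypothetical.

```python
import math
import statistics


def geometric_mean_with_se(ratios):
    """Return (geometric mean, lower bound, upper bound) for positive ratios.

    Assumption of this sketch: the standard error is computed on the log2
    scale, so the bounds are asymmetric around the geometric mean.
    """
    logs = [math.log2(r) for r in ratios]
    mean_log = statistics.fmean(logs)
    # Sample standard deviation of the logs divided by sqrt(n).
    sem_log = statistics.stdev(logs) / math.sqrt(len(logs))
    return 2 ** mean_log, 2 ** (mean_log - sem_log), 2 ** (mean_log + sem_log)
```

Working on the log scale matches how the relative cleavage and expression axes are drawn (twofold steps), which is why a geometric rather than arithmetic mean is the natural summary for ratios of this kind.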


[Figure S3 image. Panel A: wild-type and alternative basal-stem structures of hsa-mir-223 (positions +1 to +13 and –1 to –15 relative to the pre-miRNA), with 4 × 4 covariation matrices for the basal-stem pairs (rows and columns A, C, G, U; color scale: odds ratio (log2), −1 to 1). Panel B: relative cleavage versus number of basal stem pairs (9–13) for the hsa-mir-223 alternative pairs, at timepoints 1 and 2. Panel C: number of candidate pairs (1 to 1024, log scale) versus threshold score for hsa-mir-16-1, hsa-mir-30a, hsa-mir-223, and hsa-mir-125a, with the basal stem indicated.]

Figure S3. Analysis of mir-223 Basal Stem Structure, Related to Figure 3

(A) Wild-type (left) and alternative (right) basal stem structures for hsa-mir-223. In the predicted structure of the wild-type, the A at +10 is bulged, whereas in the predicted structure of some of the variants the pairing shifts to place nucleotide +10 within a contiguous helix. After sorting the selected variants based on whether or not their predicted secondary structures are consistent with shifted pairing, covariation matrices for both conformations were calculated as in Figure 3A.

(B) Relative cleavage of variants with different lengths of the alternative basal stem. Cleavage values were calculated as in Figure 3B and normalized to the 9 bp stem.

(C) Screen for Watson–Crick pairs involving any two varied positions. For each of the >3000 possible pairs, the degree of Watson–Crick preference was evaluated using a scoring metric that compared the average odds of Watson–Crick pairs to that of non-Watson–Crick alternatives. The number of Watson–Crick candidates is plotted as a function of threshold score, in which a pair is considered a Watson–Crick candidate if its score exceeds the threshold. The number of pairs corresponding to the basal stem is shown (dashed line). In each case, the highest-scoring pairs were those of the basal stem. In the case of mir-223, the highest-scoring pairs also included the alternative pairs that incorporated the bulged A at +10 into a contiguous helix. For each pri-miRNA, we inspected the next four highest-scoring pairs, and in each case, the covariation matrix did not appear consistent with Watson–Crick pairing (data not shown).
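The screen in (C) can be illustrated with a small Python sketch. The legend does not give the exact scoring metric, so the score used here (mean log2 odds of the four Watson–Crick combinations minus the mean of the twelve other combinations, with G–U treated as non-Watson–Crick) is an assumption, as are the function names and the dictionary layout.

```python
from itertools import product

NUCS = "ACGU"
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def watson_crick_score(log_odds):
    """Score one pair of positions from its 4 x 4 covariation matrix.

    log_odds maps (nt at first position, nt at second position) to a
    log2 odds ratio. The score is an assumed stand-in for the metric
    described in the legend, not the published formula.
    """
    combos = list(product(NUCS, repeat=2))
    wc = [log_odds[c] for c in combos if c in WATSON_CRICK]
    other = [log_odds[c] for c in combos if c not in WATSON_CRICK]
    return sum(wc) / len(wc) - sum(other) / len(other)


def count_candidates(scores, threshold):
    """Number of position pairs whose score exceeds the threshold."""
    return sum(1 for s in scores.values() if s > threshold)
```

Evaluating `count_candidates` over a range of thresholds gives the kind of curve described for panel (C): true stem pairs remain above the threshold after most other position pairs have dropped out.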


[Figure S4 image. Panel A: schematic of the binding selection (pool of linear pri-miRNA substrate variants, DroshaTN and DGCR8, nitrocellulose filtration separating functional from nonfunctional variants, library preparation for paired-end sequencing). Panel B: information content (bits) per position for the hsa-mir-125a binding selection and the hsa-mir-125a cleavage selection, with insets marking the invariant residues, the stem-loop, and the varied positions; key: A, C, G, U.]

Figure S4. Selection for Microprocessor-Binding Variants of hsa-mir-125a, Related to Figure 4

(A) Schematic of the in vitro selection. Linear variants of mir-125a were incubated with immunopurified DGCR8 and catalytically inactive Drosha (DroshaTN). Bound variants were recovered after nitrocellulose filtration, reverse-transcribed, and amplified for high-throughput sequencing.

(B) Information content after selection for Microprocessor binding. Information content after selection for cleavage (Figure 2D) is reproduced here for comparison. The nucleotides varied in the initial pools are shown for each selection (insets, red inner lines).
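For readers who want a concrete picture of a per-residue information plot like the one in (B), the following Python sketch shows one common formulation. The definition actually used is the one given with Figure 2D and is not restated in this section, so the formula below (each residue's contribution to the relative entropy between selected and initial frequencies) is an assumption, and the function name is hypothetical.

```python
import math


def per_residue_information(selected_freqs, initial_freqs):
    """Per-residue information (bits) at one varied position.

    selected_freqs and initial_freqs map each nucleotide to its frequency
    after and before selection. Assumes every initial frequency is > 0.
    A residue enriched by selection gets a positive value and a depleted
    residue gets a negative value.
    """
    info = {}
    for nt, p in selected_freqs.items():
        q = initial_freqs[nt]
        info[nt] = p * math.log2(p / q) if p > 0 else 0.0
    return info
```

Under this formulation the four values at a position sum to the relative entropy between the selected and initial pools, which is never negative even though individual residues can fall below zero.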


[Figure S5 image. Panel A: odds ratio of CNNC versus starting position (14–21) for hsa-mir-16-1, hsa-mir-30a, and hsa-mir-223, for all sequences and for sequences without the wild-type CNNC, at timepoints 1 and 2 (some positions marked NA). Panels B–F: schematic of the bicistronic construct (CMV promoter, query pri-miRNA mutant, hsa-mir-1-1 control pri-miRNA, TK pA), predicted secondary structures of mir30.wt, mir30.1, and mir30.7 with the hsa-miR-30a probe-binding site, RNA blots probed for the query miRNA, hsa-miR-1, U6, and tRNA-Glu, and relative expression of each variant. The constructs listed in the panels are reproduced below.]

Panel B constructs (hsa-mir-30a)

Construct   Basal Stem    UG(–14)   CNNC(+17)
mir30.wt    WT            UG        CNNC
mir30.1     –12 Paired    UG        CNNC
mir30.2     WT            AG        CNNC
mir30.3     WT            UG        CNNU
mir30.4     WT            UG        UNNC
mir30.5     WT            UG        UNNU
mir30.6     WT            AG        UNNU
mir30.7     7-pair stem   UG        CNNC

Panel C constructs (hsa-mir-16-1)

Construct    UG(–14)   CNNC
mir16-1.wt   UG        CNNC
mir16-1.1    AC        CNNC
mir16-1.2    UG        UNNU
mir16-1.3    AC        UNNU

Panel D constructs (hsa-mir-28)

Construct   UG(–14)   CNNC
mir28.wt    UG        CNNC
mir28.1     CA        CNNC
mir28.2     UG        ANNA
mir28.3     CA        ANNA

Panel E constructs (hsa-mir-129-2)

Construct     UG(–14)   CNNC
mir129-2.wt   UG        CNNC
mir129-2.1    AC        CNNC
mir129-2.2    UG        ANNA
mir129-2.3    AC        ANNA

Panel F constructs (hsa-mir-193b)

Construct    UG(–14)   CNNC
mir193b.wt   UG        CNNC
mir193b.1    CA        CNNC
mir193b.2    UG        UNNU
mir193b.3    CA        UNNU

Figure S5. Contribution of the CNNC Motif In Vitro and in HEK293T Cells, Related to Figure 5

(A) CNNC odds ratios at alternative positions. Odds ratios were calculated for CNNC dinucleotides starting at the indicated positions downstream of the Drosha cleavage site. Plotted are odds ratios for all sequences (left panels) and for sequences that lack both wild-type C residues (right panels).

(B) Contributions of the basal UG and downstream CNNC motifs to the accumulation of hsa-miR-30a in HEK293T cells. The listed variants of hsa-mir-30a were transcriptionally fused to hsa-mir-1-1 (top). Predicted secondary structures for variants with non-wild-type structure are shown (center), with the annotated Drosha cleavage sites (purple arrowheads). The accumulation of miR-30a was quantified by RNA blot, normalized to miR-1 (bottom, geometric mean ± standard error, n = 3).

(C) Contributions of the basal UG and downstream CNNC motifs to the accumulation of hsa-miR-16 in HEK293T cells, otherwise as in (B).

(D) Contributions of the basal UG and downstream CNNC motifs to the accumulation of hsa-miR-28 in HEK293T cells, otherwise as in (B).

(E) Contributions of the basal UG and downstream CNNC motifs to the accumulation of hsa-miR-129 in HEK293T cells, otherwise as in (B).

(F) Contributions of the basal UG and downstream CNNC motifs to the accumulation of hsa-miR-193b in HEK293T cells, otherwise as in (B).
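The position-specific odds ratios in panel (A) can be illustrated with a minimal Python sketch. It assumes that each sequence is a string whose first character is position +1 downstream of the Drosha cleavage site, and that the odds ratio compares the selected pool with the initial pool; the function names and this input layout are hypothetical, and no pseudocounts are applied, so a pool in which every sequence (or no initial sequence) carries the motif raises `ZeroDivisionError`.

```python
def has_cnnc(seq, pos):
    """True if a CNNC motif starts at 1-based position `pos` of `seq`."""
    i = pos - 1
    return 0 <= i and i + 3 < len(seq) and seq[i] == "C" and seq[i + 3] == "C"


def motif_odds(seqs, pos):
    """Odds of carrying CNNC at `pos` within one pool of sequences."""
    with_motif = sum(has_cnnc(s, pos) for s in seqs)
    without_motif = len(seqs) - with_motif
    return with_motif / without_motif


def cnnc_odds_ratio(selected, initial, pos):
    """Odds of CNNC at `pos` after selection relative to before selection."""
    return motif_odds(selected, pos) / motif_odds(initial, pos)
```

Restricting both pools to sequences that lack the wild-type C residues before calling `cnnc_odds_ratio` would correspond to the right-hand panels, where the motif can only arise at alternative positions.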


[Figure S6 image. Crosslinking gel with lanes for Microprocessor lysate and for the anti-FLAG immunoprecipitation of SRp20 (lysate, flow-through, elution); bands labeled SRp20 and 3XFLAG-SRp20.]

Figure S6. Immunopurified SRp20, Related to Figure 6

3X-FLAG-SRp20 was expressed in HEK293T cells, captured on anti-FLAG magnetic beads, and eluted with 3X-FLAG peptide. Binding activity of immunopurified SRp20 was measured by crosslinking to a mir-30a crosslinking substrate as in Figure 6A, except that the substrate contained only the mir-30a CNNC motif (pCU(4-S-U)CAAGGG). For comparison, crosslinked complexes were generated from Microprocessor lysate (left).


[Figure S7 image. Panel A: schematic of the selection (pool of linear pri-miRNA substrate variants, Drosha and DGCR8, functional versus nonfunctional variants, library for single-end sequencing). Panel B: relative cleavage versus number of apical stem pairs for hsa-mir-125a, hsa-mir-16-1, hsa-mir-223, and hsa-mir-30a at timepoints 1 and 2. Panel C: information content (bits) per position for hsa-mir-125a, hsa-mir-16-1, hsa-mir-30a, and hsa-mir-223; key: A, C, G, U. Also shown: odds ratio (WT base) at positions P20–P30 of hsa-mir-30a with the mature miRNA marked; PhyloP vertebrate conservation (scale –4 to 4) for chr6 72,113,290–72,113,300 (–), sequence AAGCTGTGAAG; percentage of miRNAs with a positioned UGU/GUG (0% to 20%) at positions P20–P31, with the human UGU/GUG window marked, for H. sapiens, D. rerio, D. melanogaster, A. gambii, D. pulex, C. elegans, C. briggsae, P. pacificus, C. teleta, L. gigantea, S. mediterranea, C. intestinalis, and N. vectensis. Panels G and H: schematic of the bicistronic construct (CMV promoter, mir-50 or mir-40 mutant as query pri-miRNA, hsa-mir-1-1 as control pri-miRNA, TK pA), predicted secondary structures (mir50.wt, mir50.2; mir40.wt, mir40.1, mir40.2) with probe-binding sites, RNA blots probed for cel-miR-50 or cel-miR-40 and hsa-miR-1, and relative expression of each variant. The constructs listed in panels G and H are reproduced below.]

Panel G constructs (cel-mir-50)

Construct   Basal Stem     UG(–14)   CNNC
mir50.wt    WT             UG        None
mir50.1     11-pair stem   AC        None
mir50.2     11-pair stem   UC        None
mir50.3     11-pair stem   UC        CNNC(+19)
mir50.4     11-pair stem   UC        CNNC(+18)
mir50.5     11-pair stem   UC        CNNC(+17)
mir50.6     11-pair stem   UC        CNNC(+16)
mir50.7     11-pair stem   UC        CNNC(+15)
mir50.8     11-pair stem   UG        None
mir50.9     11-pair stem   UG        CNNC(+18)

Panel H constructs (cel-mir-40)

Construct   Basal Stem     UG(–14)   CNNC
mir40.wt    WT             CC        None
mir40.1     (mutant)       UG        None
mir40.2     WT reverted    UG        None
mir40.3     WT reverted    UG        CNNC(+16)
mir40.4     WT reverted    UG        CNNC(+17)
mir40.5     WT reverted    UG        CNNC(+18)
mir40.6     WT reverted    UG        CNNC(+19)
mir40.7     WT reverted    UG        CNNC(+20)

Figure S7. Selection of Functional Variants with Changes in the Apical Stem Loop and Rescued Processing of C. elegans Pri-miRNAs in Human Cells, Related to Figure 7

(A) Schematic of the selection for functional pri-miRNA variants with changes in the apical stem and terminal loop. Linear pri-miRNA variants were incubated in Microprocessor lysate, and cleaved pre-miRNA variants were gel-purified, reverse transcribed, and amplified for high-throughput sequencing.

(B) Relative cleavage of variants with different apical stem lengths. The number of contiguous Watson–Crick pairs was counted and the relative cleavage calculated, normalized to that of the 15 bp stem. For each pri-miRNA, results are shown for both time points (key). For mir-125a, 22 bp above the 5p Drosha cleavage site was strongly preferred; longer stems were tolerated, whereas shorter stems were disfavored. Watson–Crick pairing throughout the apical stem was supported by analysis of covariation (data not shown). A 22-pair apical stem was also preferred, albeit more weakly, in mir-30a. By contrast, no preference for apical pairing was observed in the stems of mir-16-1 or mir-223. Indeed, lengthening of the mir-16-1 apical stem at the expense of loop size was detrimental, which was consistent with a previous report (Zhang and Zeng, 2010).

(C) Enrichment and depletion at variable residues in the apical stems and loops. At each varied position (inset, red inner line), information content was calculated for each residue (green, cyan, black, and red for A, C, G, and U, respectively), as in Figure 2D.

(D) Relative cleavage of mir-30a variants with the apical UGUG motif beginning at the indicated positions, normalized to variants without the motif. Nucleotides of the mature miRNA are shaded in yellow.

(E) Conservation of the region centered on the apical UGUG of mir-30a, otherwise as in Figure 4B.

(F) Enrichment for UGU or GUG trinucleotides in the terminal loops of metazoan pri-miRNAs (Table S2). For each species, pri-miRNA sequences were aligned on the predicted Drosha cleavage site and occurrences of loop UGU or GUG trinucleotides tabulated. Species with a statistically significant enrichment within the window are indicated (asterisk, empirical p value < 10⁻⁴). Although enrichment was observed in fish and insects, the lack of enrichment in several other representative species raises the question of whether the usage of this motif arose independently in multiple lineages or was ancestral and lost multiple times.

(G) Effects of adding human pri-miRNA features to C. elegans mir-50. Changes that introduced the listed features were incorporated into mir-50 within the bicistronic expression vector (left). Secondary structures are shown for changes that were predicted to affect the wild-type basal stem (middle; annotated Drosha cleavage sites, purple arrowheads). After transfection into HEK293T cells, accumulation of miR-50 was assessed on RNA blots, normalizing to the accumulation of the miR-1 control, and increased miR-50 expression is plotted (right; geometric mean ± standard error, n = 3).

(H) Effects of adding human pri-miRNA features to C. elegans mir-40, otherwise as in (G).
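The stem-length analysis described for panel (B) rests on counting contiguous Watson–Crick pairs and normalizing cleavage to a reference stem length. A minimal Python sketch of both steps follows. It is not the authors' pipeline: the representation of the stem as two pre-aligned arm strings, the exclusion of G–U wobble pairs, and the function names are assumptions made for illustration.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def contiguous_wc_pairs(five_prime_arm, three_prime_arm):
    """Count contiguous Watson-Crick pairs from the start of the stem.

    five_prime_arm is read 5' to 3' and three_prime_arm is read 3' to 5',
    so that characters at the same index face each other (assumed layout).
    Counting stops at the first non-Watson-Crick pair.
    """
    count = 0
    for a, b in zip(five_prime_arm, three_prime_arm):
        if (a, b) not in WATSON_CRICK:
            break
        count += 1
    return count


def normalize_to_reference(cleavage_by_length, reference_length=15):
    """Express cleavage for each stem length relative to a reference length."""
    ref = cleavage_by_length[reference_length]
    return {n: value / ref for n, value in cleavage_by_length.items()}
```

The default `reference_length=15` mirrors the normalization to the 15 bp stem stated in the legend; any other stem length present in the input could be used instead.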
