detecting rna modifications in the epitranscriptome ... · standing history, just like that of dna...

17
Perception of the field of RNA modifications had been in a near dormant state until a few years ago when inno- vations in detection techniques unlocked several tech- nology barriers, giving access to big data. On a par with genomics and transcriptomics in terms of volume and impact, these data quickly uncovered numerous new layers of information involved in the regulation of gene expression. The field of epigenetics is discovering an effect of RNA modifications that easily rivals that of modifications of proteins and DNA in terms of research activity and breakthrough results 1 . Consequently, the term ‘epitranscriptomics’ was coined 2–4 to comprise post-transcriptional alterations that do not affect the RNA sequence per se 5,6 . In quick succession, numerous completely unsuspected features were discovered that gave impetus to the field. In particular, the dynamic nature of modifications in many non-coding RNAs and in mRNAs gave rise to the idea that we are about to reveal a previously invisible code that resides in nucleic acids but is outside their sequence. The detection of RNA modifications has a long- standing history, just like that of DNA modifications, reaching back to the early days of molecular biol- ogy. Indeed, pseudouridine (Ψ), the most frequent post-transcriptional modification and therefore fre- quently called the fifth nucleoside 7 , was only discov- ered 2 years after the first modified DNA nucleoside, 5-methylcytosine (5 mC). Since then, a stream of newly detected modifications has provided 100–150 structures of naturally occurring and chemically distinct ribo- nucleosides 8 . The rapid speed of these advances is clearly because of technical progress in detection method- ologies, and there are no indications of an end to this development. Early discoveries relied on separation by chromatography and organic chemistry for structural elucidation, but were soon supplemented with electro- phoresis and mass spectrometry. This simultaneously drove progress in related areas such as sequencing and structural probing of nucleic acids. The richest source of modifications, both old and new, is transfer RNA (tRNA) with up to 25% of its nucleotides modified, which also features the largest chemical variety and complexity in its modifications 8,9 . Modifications range from simple methylations to sophis- ticated multistep transformations, including the incor- poration of a range of low-molecular-weight metabolites that are known from other pathways; a few examples are shown in FIG. 1a. The limited size and relative abundance of tRNA allowed the determination of a sequence context for these modifications, by chromatographic separation of small fragments after nuclease digestion or selective chemical cleavage 10 . Detection, identification and especi- ally determination of a sequence context of modifications in the much larger ribosomal RNAs (rRNAs) was conse- quently much harder and its progress was a consequence of the development of sequencing techniques using, for example, slab gel electrophoresis and capillary electro- phoresis 11 . In comparison with the high diversity and large number of modifications in these highly abundant non-coding RNAs, relatively few modifications were discovered (and most discoveries were serendipitous), in other, less-abundant species, including small nuclear RNAs (snRNAs), small nucleolar RNAs (snoRNAs) and microRNAs (miRNAs) 12–15 . Of note, mRNA was known for decades to contain various modifications such as N 6 -methyladenosine (m 6 A), 5-methylcytidine (m 5 C) and ribose methylations 15–18 . However, despite the obvious potential of mRNA modifications for regulating gene 1 Institute of Pharmacy and Biochemistry, Johannes Gutenberg University Mainz, Staudingerweg 5, 55128 Mainz, Germany. 2 IMoPA UMR7365 CNRS‑UL, BioPole Lorraine University, 9 avenue de la Foret de Haye, 54505 Vandoeuvre‑les‑ Nancy, France. mhelm@uni‑mainz.de; yuri.motorin@univ‑lorraine.fr doi:10.1038/nrg.2016.169 Published online 20 Feb 2017 RNA modifications Chemically altered (modified) nucleotides in mature RNA chain. Nucleotides The smallest structural units of a nucleic acid, composed of a nucleobase connected to a ribose (or deoxyribose) via a glycosidic bond, and a phosphate esterified to a hydroxyl group of the sugar moiety. The combination of only nucleobase and sugar is referred to as a nucleoside. Detecting RNA modifications in the epitranscriptome: predict and validate Mark Helm 1 and Yuri Motorin 2 Abstract | RNA modifications are emerging players in the field of post-transcriptional regulation of gene expression, and are attracting a comparable degree of research interest to DNA and histone modifications in the field of epigenetics. We now know of more than 150 RNA modifications and the true potential of a few of these is currently emerging as the consequence of a leap in detection technology, principally associated with high-throughput sequencing. This Review outlines the major developments in this field through a structured discussion of detection principles, lays out advantages and drawbacks of new high-throughput methods and presents conventional biophysical identification of modifications as meaningful ways for validation. STUDY DESIGNS NATURE REVIEWS | GENETICS VOLUME 18 | MAY 2017 | 275 REVIEWS ©2017MacmillanPublishersLimited,partofSpringerNature.Allrightsreserved.

Upload: others

Post on 14-Apr-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

  • Perception of the field of RNA modifications had been in a near dormant state until a few years ago when inno-vations in detection techniques unlocked several tech-nology barriers, giving access to big data. On a par with genomics and transcriptomics in terms of volume and impact, these data quickly uncovered numerous new layers of information involved in the regulation of gene expression. The field of epigenetics is discovering an effect of RNA modifications that easily rivals that of modifications of proteins and DNA in terms of research activity and breakthrough results1. Consequently, the term ‘epitranscriptomics’ was coined2–4 to comprise post-transcriptional alterations that do not affect the RNA sequence per se5,6. In quick succession, numerous completely unsuspected features were discovered that gave impetus to the field. In particular, the dynamic nature of modifications in many non-coding RNAs and in mRNAs gave rise to the idea that we are about to reveal a previously invisible code that resides in nucleic acids but is outside their sequence.

    The detection of RNA modifications has a long-standing history, just like that of DNA modifications, reaching back to the early days of molecular biol-ogy. Indeed, pseudo uridine (Ψ), the most frequent post- transcriptional modification and therefore fre-quently called the fifth nucleoside7, was only discov-ered 2 years after the first modified DNA nucleoside, 5-methylcytosine (5 mC). Since then, a stream of newly detected modifications has provided 100–150 structures of naturally occurring and chemically distinct ribo-nucleosides8. The rapid speed of these advances is clearly because of technical progress in detection method-ologies, and there are no indications of an end to this development. Early discoveries relied on separation by

    chromatography and organic chemistry for structural elucidation, but were soon supplemented with electro-phoresis and mass spectrometry. This simultaneously drove progress in related areas such as sequencing and structural probing of nucleic acids.

    The richest source of modifications, both old and new, is transfer RNA (tRNA) with up to 25% of its nucleotides modified, which also features the largest chemical variety and complexity in its modifications8,9. Modifications range from simple methylations to sophis-ticated multi step transformations, including the incor-poration of a range of low-molecular-weight metabolites that are known from other pathways; a few examples are shown in FIG. 1a. The limited size and relative abundance of tRNA allowed the determination of a sequence context for these modifications, by chromatographic separation of small fragments after nuclease digestion or selective chemical cleavage10. Detection, identification and especi-ally determination of a sequence context of modifications in the much larger ribosomal RNAs (rRNAs) was conse-quently much harder and its progress was a consequence of the development of sequencing techniques using, for example, slab gel electrophoresis and capillary electro-phoresis11. In comparison with the high diversity and large number of modifications in these highly abundant non-coding RNAs, relatively few modifications were discovered (and most discoveries were serendipitous), in other, less-abundant species, including small nuclear RNAs (snRNAs), small nucleo lar RNAs (snoRNAs) and microRNAs (mi RNAs)12–15. Of note, mRNA was known for decades to contain vari ous modifications such as N6-methyladenosine (m6A), 5-methylcytidine (m5C) and ribose methylations15–18. However, despite the obvious potential of mRNA modifications for regulating gene

    1Institute of Pharmacy and Biochemistry, Johannes Gutenberg University Mainz, Staudingerweg 5, 55128 Mainz, Germany.2IMoPA UMR7365 CNRS‑UL, BioPole Lorraine University, 9 avenue de la Foret de Haye, 54505 Vandoeuvre‑les‑Nancy, France.mhelm@uni‑mainz.de;yuri.motorin@univ‑lorraine.fr

    doi:10.1038/nrg.2016.169Published online 20 Feb 2017

    RNA modificationsChemically altered (modified) nucleotides in mature RNA chain.

    NucleotidesThe smallest structural units of a nucleic acid, composed of a nucleobase connected to a ribose (or deoxyribose) via a glycosidic bond, and a phosphate esterified to a hydroxyl group of the sugar moiety. The combination of only nucleobase and sugar is referred to as a nucleoside.

    Detecting RNA modifications in the epitranscriptome: predict and validateMark Helm1 and Yuri Motorin2

    Abstract | RNA modifications are emerging players in the field of post-transcriptional regulation of gene expression, and are attracting a comparable degree of research interest to DNA and histone modifications in the field of epigenetics. We now know of more than 150 RNA modifications and the true potential of a few of these is currently emerging as the consequence of a leap in detection technology, principally associated with high-throughput sequencing. This Review outlines the major developments in this field through a structured discussion of detection principles, lays out advantages and drawbacks of new high-throughput methods and presents conventional biophysical identification of modifications as meaningful ways for validation.

    S T U DY D E S I G N S

    NATURE REVIEWS | GENETICS VOLUME 18 | MAY 2017 | 275

    REVIEWS

    © 2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved.

    mailto:[email protected]:[email protected]://dx.doi.org/10.1038/nrg.2016.169

  • Reverse transcriptionThe synthesis of single-stranded DNA (known as complementary DNA (cDNA)) by reverse transcriptase (an RNA- dependent DNA polymerase) using single-stranded RNA as a template.

    RT signatureA specific pattern composed of misincorporation in cDNA and reverse transcription arrest of primer extension, induced by modified residues in an RNA chain.

    expression, advancement of the field stalled at this stage until recent mapping of these and other modifications in mRNA by high-throughput sequencing techniques led to a resurgence in the field19,20. This effect was com-pounded by the discovery of RNA demethylation activ-ity of, for example, the fat mass and obesity-associated (FTO) protein21, which introduced an idea of dynamics of RNA modification levels and further emphasized the potential for regulation of gene expression.

    What were the problems of old and why has the field been revolutionized by techniques based on high-throughput sequencing? This Review addresses these questions by discussing established technology as well as recent breakthroughs. Importantly, whereas biophysical methods such as chromatography and mass spectrometry used to be the principal means for detection and quantification of RNA modifications on a case-by-case basis, current studies report data sets of hundreds to thousands of sites, predicted from high- throughput sequencing data to contain RNA modifications. As different high-throughput detection techniques are error prone to varying degrees, and because the field is seeing an ever-increasing number of published data sets, there is a realistic danger that an important number of published modification sites might be fraught with high uncertainty, which may not only diminish their value in further investigations but also damage the field as a whole. To avoid damage to biological interpretations, we posit that sites predicted from big data be treated as candidate sites to avoid widespread over-interpretation of modification data,

    until such time as a data set has been validated with at least one additional independent method. To this end, we discuss principles, advantages and weaknesses of methods based on high-throughput sequencing, as well as the possibilities for validation of candidate sites by chemical and biophysical methods.

    Sequencing RT signatures of RNA modificationsTraces of modifications left in cDNA. During reverse transcription (RT), modifications may disturb the faith-ful generation of the inverse carbon copy that is cDNA. Confronted with a non-canonical substrate, the enzyme may still incorporate the complementary deoxyribo-nucleo side triphosphate (dNTP) that would conform to Watson–Crick pairing with an unmodified template, as is the case for pseudouridine, ribothymidine (T) or m5C, which are RT-silent, and leave no or very faint traces in cDNA. However, larger modifications or modifications on the Watson–Crick face tend to lead either to RT arrest or to misincorporation of a non-complementary dNTP, both of which contribute to the overall RT signature. The RT-arrest case leaves information in the form of abor-tive cDNA ending at or near the site of modification and the misincorporation type of event manifests as apparent mutations in the cDNA sequences22–24 (FIG. 1b). Depending on the nature of the modified nucleo-tide, the proportions of correct cDNA synthesis, mis-incorporation and abortive products vary. For example, inosine (I), which is derived from A by deamination, base-pairs with C and leads to incorporation of dCTP, rather than dTTP, into cDNA. The faithfully resulting

    Nature Reviews | Genetics

    5′ 5′3′

    Specificchemical

    RNAm5C

    Cov

    erag

    e

    Cov

    erag

    e

    Positions Positions

    Misincorporationand coverage drop

    Coveragedrop

    N

    N

    N

    N

    R

    O

    CH3

    NH2

    N

    N

    R

    NH2

    O

    CH3

    NHNH

    R

    O

    O

    N

    N

    N

    N

    R

    NHCH3

    m1A m1G

    ψ

    NH

    N

    N

    N

    R

    O

    I

    N+

    N

    N

    N

    R

    NH2

    CH3

    a Chemical structures b Reverse transcription signature c Specific chemicals

    5′ 5′3′

    3′

    5′ 5′3′

    3′

    m1A

    m1A

    m

    Full-size cDNA

    cDNA with mutation

    Abortive cDNA

    5′ 5′3′

    3′3′

    5′ 5′

    5′

    3′

    3′

    ψ

    Full-size cDNA

    Abortive cDNA3′

    ψ

    m6A m5C

    Figure 1 | Reverse transcription-based techniques for detection of modified nucleotides. a | Chemical structures of common modified residues discussed in this Review: 1‑methyladenosine (m1A), 1‑methylguanosine (m1G), N6‑methyladenosine (m6A), 5‑methylcytidine (m5C), pseudouridine (Ψ) and inosine (I). Modifications are highlighted in pink. b | Components of reverse transcription (RT) signatures depend on the type of RNA modification. Top: RT‑silent residue (m5C); middle and bottom: RT‑blocking residue (m1A). Exemplary coverage profiles and corresponding misincorporation sites are shown in the bottom graph. c | Induction or enhancement of a RT signature using modification-specific chemicals.

    R E V I E W S

    276 | MAY 2017 | VOLUME 18 www.nature.com/nrg

    © 2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved. ©

    2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved.

  • misincorporation signal, which can be distinguished from an actual genomically encoded mutation by com-parison with genomic DNA, was the first to be investi-gated in several transcriptome-wide studies25,26 because of its ease of detection.

    The characteristics of RT signatures consisting of misincorporation and abortive cDNA products are not well studied for the vast majority of modified residues. Our recent investigations uncovered the consequences of 1-methyladenosine (m1A) in template RNA27,28 for cDNA synthesis and their use in machine-learning-based detection of respective patterns in high-throughput sequencing data. As comprehensive studies including both misincorporation and abortive RT events are not yet available for the large majority of modifications, their application to the detection of naturally occurring modifications is limited29. However, the RT-blocking and misincorporation properties of certain modifications are frequently used for detection, for example, in m1A mapping after antibody-mediated enrichment30,31 or in structural probing applications whereby modifications such as m1A and 3-methylcytidine (m3C) are artificially created by treatment with alkylating agents32,33.

    Enhancing RT signatures by chemical treatment. An additional dimension of information in high- throughput sequencing data is accessible after treatment of the RNA template with reagents that specifically react with modifications in a way that changes their RT signature in terms of either RT arrest or misincorporation. For example, inosine-specific cyanoethy lation by acrylo-nitrile (ICE-seq)34,35 generates strong RT stops, allow-ing real A to I conversion sites to be distinguished from simple sequencing errors. More recently, the well-known reaction specificity for Ψ residues36 of N-cyclohexyl-N ʹ-(2-morpholinoethyl)carbodiimide methyl- p-toluenesulfonate (CMCT), a water-soluble diimide, was successfully applied in several independent studies for pseudouridine mapping in yeast and human tran-scriptomes37–40 (FIG. 1c). After CMCT treatment of RNA, significant amounts of RT-arresting adducts occur at G, U and Ψ residues, but whereas G and U adducts can be hydrolysed during an additional mild alkaline treatment, the relevant adduct of pseudouridine resists hydrolysis under these conditions and therefore can be detected as RT-arrest signals.

    Another approach is the recent adaptation of bisulfite-based detection of m5C in RNA41–47. In the course of this technique, which was adapted from the well- established detection of 5mC (REF. 48) and 5-hydroxy-methylcytosine (5hmC) (REF. 49) in DNA, modification of cytosine to m5C or hm5C (REF. 50) (and also probably to m4C and ac4C) protects these sites from the effects of bisulfite treatment, whereas unmodified cytosines are deaminated to uridines. Guanosines in the resulting cDNA thus stem from modified cytidines, which allows facile detection in high-throughput sequencing data.

    Several further reagents with specificities for modifi-cations (shown in TABLE 1), known from decades of research, await their application to transcriptome-wide analyses51. An approach that has been used for the

    validation of methylated RNA immunoprecipitation (MeRIP) data on m1A modifications in mRNA makes use of the Dimroth rearrangement undergone by m1A under alkaline conditions, which results in the forma-tion of m6A and thus an alteration of the RT signa-ture upon treatment31. An interesting new approach is the use of enzymes with demethylating activity that alter the RT signature by removing a defined subset of modifications, thereby enabling their detection by differential analysis52–54.

    Modification calling, drawbacks and pitfalls. The term ‘modification calling’, is used in the field for the identi-fication of a position that is, according to the method in question, a good modification candidate. High-throughput sequencing-based modification calling in an entire transcriptome must inevitably face problems rooted in statistics. Even extremely well-defined RT sig-natures like that of inosine become less predictive when confronted with a strong background noise. The larger the query transcriptome, the higher the chances are that noise produces a signal that resembles that of a positive instance and thus leads to a false-positive identifica-tion. This problem seems to be particularly apparent in pseudouridine calling: an analysis of four carbodiimide- based transcriptome analyses termed Pseudo-seq39, Ψ-seq37, PSI-seq38 and CeU-seq40 found disturbingly little overlap55, although further studies might still reveal that this could have been due to different biological settings such as stress conditions.

    RNA bisulfite sequencing also suffers from noise, in particular from incomplete deamination of unmodified cytidines, thus producing false positives. This is a con-sequence of the reaction conditions for deamination, which must be carried out at less alkaline and thus less stringent conditions for RNA than for DNA, because RNA is prone to phosphodiester backbone hydrolysis at high pH. DNA contaminations, which survive longer under these conditions, must therefore be rigorously excluded. Excessive deamination and the concurrent degradation of RNA lead to insufficient coverage especi-ally of low-abundance RNA species. This effect is exacer-bated because deamination of cytidines reduces the information content of a given sequence (from ACGU to AGU only), and thus a bisulfite sequencing read needs to be correspondingly longer than a conventional read to allow reliable mapping. As with pseudouridine, pre-diction of m5C modification sites in abundant tRNA and rRNA is robust46, but modification calling in low- abundance RNAs is fraught with high error. For exam-ple, the vast majority of an initially reported >10,000 m5C sites43 outside tRNA and rRNA lack reproduction or confirmation by orthogonal approaches56,57.

    Modification calling requires the setting of a thresh-old, for example, 80% mapped reads were required to show non-conversion in bisulfite treatment43. Primary candidates emerging from threshold-guided mapping are typically scored by P values (BOX 1). However, given their nature as a tool to reject random noise signals, P values cannot account for bias resulting from either the mapping procedure or the experimental design. If read

    R E V I E W S

    NATURE REVIEWS | GENETICS VOLUME 18 | MAY 2017 | 277

    © 2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved. ©

    2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved.

  • mapping is carried out with multiple mapping allowed and/or at low stringency, misaligned reads may affect modification calling especially in repetitive sequence regions, as reported by Carlile et al.39. False modifica-tion calling instances may also arise from various mani-festations of experimental bias, such as inaccessibility and low reactivity of strongly structured RNA regions to reagents such as bisulfite or CMCT, which may partly or entirely negate the detection principle. RNA process-ing reactions by nuclease cleavage may lead to signals resembling an RT arrest event. Another error source could be incomplete characterization of the selectivity of a given reagent, as for example, bisulfite treatment is known to yield signals for modifications other than m5C (REFS 42,50). These artefacts must be addressed by revised experimental design, which in turn neces-sitates intimate knowledge of the detection principle. As the number of studies profiling RNA modifications increases, it is becoming ever clearer that at least a sub-set of called modification candidates must be verified by independent methods to ensure reliability1.

    Enriching RNA populations for sequencingCellular RNA is composed predominantly of abun-dant rRNA and tRNA species (>90%), depending on cell type. This means that an enrichment step before

    high-throughput analysis is generally highly beneficial, as otherwise the overwhelming majority of sequen-cing reads typically map to well-known rRNA and/or tRNA species. The most popular approach is to isolate the so-called poly(A)+ fraction, which is composed of mRNA and other polyadenylated RNAs, by affinity purification with poly(dT) oligonucleotides. However, the resulting fraction still contains some rRNA contam-ination, and many non-coding RNA species (such as snRNAs, snoRNAs and mi RNAs) are excluded from the analysis. Alternatively, or in addition, rRNA depletion protocols are available that remove the vast majority of rRNA mol ecules using complementary DNA oligo-nucleotides, albeit leaving tRNA contaminations that still affect final results.

    Role and effect of antibodies against modifications. Beyond these widespread and commercially supported protocols for poly(A)+ enrichment and/or rRNA deple-tion, more specific preselection procedures have been developed for modified nucleotide mapping, which rely on affinity-based enrichment of sequences containing a given modification.

    Except for highly modified eukaryotic tRNA species, populations of cellular RNAs generally contain an esti-mated 0.1–1% of modified nucleotides. Thus, specific

    Table 1 | Small-molecule reagents that are specific for RNA modifications

    Reagent Modification detected

    Reaction type Detection Refs

    Reagents applied to detection by high-throughput sequencing

    Bisulfite m5C Deamination m5C remains unconverted

    42–44

    N-cyclohexyl-Nʹ-β-(4‑methylmorpholinium) ethylcarbodiimide p‑tosylate (CMCT)

    Pseudouridine N-acylation RT arrest 36–39, 145

    Acrylonitrile Inosine N-alkylation RT arrest 35

    Methylthiosulfonate‑biotin (MTS‑biotin)

    s4U Disulfide formation

    Affinity enrichment 146

    Reagents not yet applied to detection by high-throughput sequencing

    Acid Wybutosine Depurination AICS; RT arrest 147,148

    Sodium borohydride m7G Reduction and depurination

    AICS; RT arrest 149,150

    Mild alkali treatment Dihydrouridine Cycle opening RT arrest 151

    Hydrazine m3C Nucleophilic addition

    AICS; RT arrest 149

    Isothiocyanate derivatives and NHS derivatives

    Primary amines N-alkylation Tagging 152

    Iodoacetamido‑ compounds Thiolated nucleotides including s2U and s4U

    S-alkylation Tagging 153

    Methylvinylsulfonate Pseudouridine N-alkylation Mass spectrometry 145

    Acrylonitrile Pseudouridine N-alkylation Mass spectrometry 145

    N-acryloyl-3-aminophenylboronic acid (APB)

    Queuosine Affinity: chelation by cis-diols

    Retardation in gel electrophoresis

    154

    Acrylo-aminophenylmercuric chloride (APM)

    Thiolated nucleotides, including s2U and s4U

    Affinity: mercury−sulfur

    Retardation in gel electrophoresis

    155

    AICS, aniline‑induced chain scission; m1G, 1‑methylguanosine; m3C, 3‑methylcytidine; m5C, 5‑methylcytidine; m7G, 7‑methylguanosine; NHS, N‑hydroxy‑succinimide; RT, reverse transcription; s2U, 2‑thiouridine; s4U, 4-thiouridine.

    R E V I E W S

    278 | MAY 2017 | VOLUME 18 www.nature.com/nrg

    © 2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved. ©

    2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved.

  • HaptensSmall, otherwise immunosilent molecules that elicit an immune response only when exposed to the immune system in the form of a covalent conjugate to a large carrier such as a protein.

    enrichment of modified fragments is necessary to reduce sequencing requirements and to map modified nucleo-tides to particular RNA regions. Antibodies are highly useful tools that are known for their potential to recognize molecular structures with very high affinity and specifi-city. Applications to nucleic acids include analytics of DNA modifications58,59, and efforts to raise specific antibodies against modified nucleotides in RNA have a long-standing history60–64. That history was low key until commercial antibodies enabled the enrichment of m6A-containing mRNA fragments65,66, the mapping of which revealed marked preferential occurrence in sequence elements that are crucial for mRNA translation, such as stop codons, splice sites and 3ʹ untranslated regions (3ʹ UTRs). Further antibodies are currently available against m1A (REFS 30,31), m5C (REF. 67) and hm5C (REF. 68).

    However, even for those antibodies that are efficient against the synthetic haptens used in their generation (typically an IO4− oxidized ribonucleoside coupled to a neutral protein such as bovine serum albumin (BSA) or keyhole limpet haemocyanin (KLH)62), many studies reported that specificity and affinity was poor for RNA mol ecules containing the same modified residue67. Arguably, the structural context of an RNA chain, which includes in particular multiple negative charges of the backbone phosphates, might be sufficiently different from the single modified nucleoside used to raise the antibody that binding and recognition of the antigen is substantially impeded in the context of an RNA chain. Thus, among the various antibodies of varied specifi-city available on the market, including multiple different antibodies directed against m6A, only a few are highly specific69. Although some are by now better character-ized70, for example, m6A- and m5C-targeted antibodies, the absence of rigorous routine characterization is keenly felt in the field.

    Regardless of potential biases, antibodies have clearly revolutionized the field in numerous m6A-specific MeRIP66 applications, which were developed by ana logy to methylated DNA immunoprecipitation (MeDIP) from the DNA modification field71. MeRIP-based

    mapping of m6A, m5C and m1A residues has revealed the presence and (approximate) location of these resi-dues in mRNA and other poorly characterized RNA species (FIG. 2a).

    All antibody-based enrichment techniques enable mapping of specific RNA modifications to a given region (generally 100 nucleotides or so in length), potentially containing modified residues. More precise localization is generally based on the search for the specific consen-sus sequence for the respective modification enzyme (if known) or on various crosslinking techniques that allow the detection window to be narrowed.

    Covalent crosslinking with cognate proteins. As anti-body binding provides only non-covalent complexes with modified RNA, the stringency of washing steps and thus enrichment is limited. A significant improvement was obtained by an ultraviolet (UV)-induced crosslink-ing step after formation of non-covalent complexes between modified RNAs and an antibody. In addition, covalent crosslinking of specific antibodies to modified sites in RNA leaves specific RT signatures in the corre-sponding sequencing profile, thus allowing approximate or fully single-nucleotide resolution, depending on the degree to which the antibody has been characterized69. Several vari ants of this technique have been proposed. For example, combining MeRIP with individual nucleo-tide resolution crosslinking and immunoprecipita-tion (iCLIP) results in m6A iCLIP (miCLIP)69, which uses the natural reactivity of RNA nucleobases excited at UV 254 nm. In addition, combining MeRIP with photoactivatable ribonucleoside-enhanced crosslink-ing and immunoprecipitation (PAR-CLIP) results in photo-crosslinking-assisted m6A sequencing strategy (PA-m6A-seq)72, which involves the prior incorporation of 4-thiouridine (s4U) into the RNA in vivo to increase the efficiency and specificity of the crosslink at UV 365 nm (FIG. 2b).

    Approaches similar to antibody crosslinking have also been developed for other proteins interacting with modifi cations, particularly the enzymes that them-selves catalyse the RNA modifications. Of note, simple crosslinks of a given modification enzyme are likely to reflect an enzyme’s affinity to the RNA, but not necessar-ily enzymatic turnover. In particular, tRNA-modifying enzymes such as yeast Pus1 (REF. 73) display similar affin-ities for substrate and non-substrate tRNAs. For exam-ple, iCLIP experiments with DnmA, a Dictyostelium discoideum homologue of the DNA methyltransferase 2 (DNMT2) tRNA m5C-methyltransferase (m5C-MTase), led to the isolation of U2 snRNA containing a tRNA-like sequence; however, this sequence was subsequently found not to be modified in vivo, as interrogated by bisulfite sequencing57,74. Actual substrates of m5C-MTases were caught by trapping the covalent intermedi-ate that is an obligatory part of the catalytic mechanism75. In certain m5C-MTases, one of two cysteine residues in the active site is required for resolving this intermediate and thus for enzyme release. Mutation of this so-called ‘regeneration’ cysteine residue leads to a suicide reaction and formation of an irreversible covalent bond with the

    Box 1 | Statistics in modification calling

    The most widespread parameter to gauge the statistical relevance of a presumed modification signal is the P value. P values are a quantitative measure for the probability of the so‑called null hypothesis, thus a probability that a given candidate is not called as a result of random noise. Although a P value below a threshold of 0.05 is frequently accepted as indicative of statistical significance, the imprudent application of such a threshold on a routine basis has been met with substantial criticism141. P values can be calculated using different statistical models, and a choice between models including Fisher’s exact test142, binomial distribution136,143 and hypergeometric distribution38 would presumably be guided by insight into the molecular basis for modification calling. In some cases, P values were further refined by the application of a Bonferroni correction to account for the number of tested positions136,144. In other cases the Benjamini–Hochberg correction for multiple hypothesis testing has been applied to improve false‑discovery rates (FDRs)136,142,144. Experimental approaches to improve P values aim to reduce noise using technical and biological replicates or to increase signals via an increase in coverage, either by augmenting the sequencing depth or by enrichment of RNA sequences containing modifications. Spike‑in controls of synthetic modified RNAs allow estimations of signal strength37 or error rates43.

    R E V I E W S

    NATURE REVIEWS | GENETICS VOLUME 18 | MAY 2017 | 279

    © 2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved. ©

    2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved.

  • target cyti dine residue. This technique was successfully applied for characterization of RNA substrates of the human NSUN2 and NSUN4 proteins50,76,77 (FIG. 2c).

    As a related concept, alteration of the RNA substrate can also stall the enzymatic reaction and lead to stable enzyme–RNA complexes that are suitable for pulldown

    and mapping. When 5-azacytidine (5-azaC), an inhibi-tor known to form irreversible covalent intermediates similar to those described above, is added to cell culture growth medium it is randomly incorporated into cellular RNA. Following 5-azaC incorporation, immunoprecipi-tation of m5C-MTases NSUN2, NSUN6 and DNMT2

    Protein

    Nature Reviews | Genetics

    Specific‘clickable’chemical

    AbortivecDNAs

    3′5′5′

    3′

    5′3′

    5′3′

    5′3′

    5′3′RNA

    5′3′RNA

    5′3′RNA

    5′3′RNA

    5′3′

    5′3′

    5′3′

    N3

    N3

    Bi

    BiAlkyne-biotin

    Bi

    AbortivecDNAsAbortive

    cDNAs

    5′ 3′5′ 3′

    5′3′

    Antibody

    Antibody

    5′3′

    Librarypreparation

    andsequencing

    3′5′

    3′5′

    3′5′

    Target site

    5′3′RNA

    Target site

    Protein5′

    3′RNA

    Protein

    5′3′

    Protein

    5-azaCVariant

    5-azaC

    3′5′

    Librarypreparation

    andsequencing

    Librarypreparation

    andsequencing

    Primerextension

    Proteindigestion

    Suicidaltrap

    Suicidaltrap

    Primerextension

    Antibodyselection

    UV-lightcrosslink

    Randomfragmentation

    IsolationRNA

    fragments

    Antibodyselection

    Randomfragmentation

    Librarypreparation

    andsequencing

    Avidinpull-out

    ‘Click’reaction

    Randomfragmentation

    Specificmodification

    RNA RNA

    a MeRIP c Suicide enzyme trap d Clickable chemicalsb MeRIP-iCLIP

    N

    N

    N

    R

    NH2

    O

    NNH

    NH2

    SE

    NR

    O

    Figure 2 | Enrichment strategies in analysis of RNA modifications. a | Methylated RNA immunoprecipitation (MeRIP). RNA is fragmented into pieces of 100–150 nucleotides, modification-containing fragments are enriched by antibody (Ab) pulldown and they are converted into a  sequencing library. b   |  Combination of MeRIP and individual‑ nucleotide resolution crosslinking and immunoprecipitation (iCLIP) or photoactivatable- ribonucleoside-enhanced crosslinking and immuno-precipitation (PAR‑CLIP). After fragmentation, RNA is crosslinked at reactive residues (indicated by red bars) with a specific Ab at either 254 nm ultraviolet (UV) light and natural RNA (iCLIP) or at 365 nm UV light and s4U residues incorporated into RNA (PAR‑CLIP). The Ab is then removed and a reverse transcription (RT) profile reveals misincorporation and stops corresponding

    to crosslinked residues. c | An active site‑mutated modification enzyme makes a covalent link to its cognate target nucleotide in RNA. Protein is then removed by proteinase K digestion and the crosslink position is determined by primer extension and sequencing. In a related technique, inclusion of 5‑azacytidine (5‑azaC) in the RNA substrate causes the enzyme to make an irreversible covalent link to the target cytosine (boxed). d | Selection of modified RNAs (or fragments) using modification‑specific chemicals that are ‘clickable’ or biotinylated for affinity purification. The method uses a  erivative of a modification‑specific reagent containing biotin or a reactive group to which biotin can be selectively attached in a secondary reaction. Enrichment of biotin-modified fragments is effected with avidin beads, followed by library preparation and sequencing.

    R E V I E W S

    280 | MAY 2017 | VOLUME 18 www.nature.com/nrg

    © 2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved. ©

    2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved.

  • PulldownA technique for selective isolation of RNA (or protein) species from a complex biological sample. It generally relies on affinity capture with antibodies or another high-affinity interaction.

    Click chemistryA class of biocompatible (bio-orthogonal) reactions, generally involving azide and alkyne moieties.

    BarcodesShort specific sequences (often six to eight nucleotides) that are incorporated into sequencing adaptors and ligated to samples. When barcode sequences are shared within a sample but differ between samples, they allow unambiguous retrospective assignment of samples run in the same sequencing lane. When barcodes are degenerate and highly variable within a sample, they are known as unique molecular identifiers (UMIs) and are used to identify redundant, PCR-amplified reads that originate from the same reverse transcription event. Adaptors may combinatorially contain both sample-specific barcodes and UMIs.

    CircLigationCircularization of single-stranded cDNA templates having a 5ʹ phosphate and a 3ʹ hydroxyl group with CircLigase.

    and then high-throughput sequencing of the attached RNA has been used to identify RNAs modified by these MTases78–80 (FIG. 2c). Interestingly, the sequences obtained showed frequent nucleotide mis incorporations at the site of crosslink, which is a phenomenon known from CLIP experiments with s4U (REF. 81), as well as from crosslink-ing with certain m6A-targeted anti bodies69. These mis incorporation sites point out the neighbouring nucleotides around a modification site and thus pro-vide approximate single-nucleotide resolution. Of note, formation of similar covalent complexes between certain pseudo uridine synthases and 5-fluorouridine- containing RNA substrates was also observed in the past, but has not yet been explored for coupling to high-throughput sequencing82,83.

    Clickable or biotinylated specific chemicals. Small-molecule reagents present a viable alternative to anti-bodies for affinity-based enrichment of modified RNA fragments. In contrast to antibody-based methods, but similar to CLIP-type approaches, they selectively form covalent adducts with modified residues. For subsequent purification, an affinity tag such as biotin or a func-tional group for the conjugation of an affinity tag are required. In contrast to antibodies, which have a recog-nition domain for non-covalent interaction that is con-siderably larger than the modified residues themselves, small- molecule reagents achieve selective enrichment via differential chemical reactivity. This feature typically resides in the electronic property of the nucleobases, thus representing a fundamentally different concept for dis-crimination. Successful applications of small-molecule reagents include the tagging of nicotinamide residues at the 5ʹ end of bacterial RNA84 and an interesting improve-ment of the CMCT-based pseudouridine detection40, in which the use of an azide-containing CMCT derivative in so-called CeU-seq allowed subsequent attachment of an alkynylated biotin via click chemistry and a subsequent enrichment of pseudouridine-containing RNAs before high-throughput sequencing analysis (FIG. 2d). This study, which also included validations of several new sites in rRNA and mRNA by the site-specific cleavage and radioactive labelling followed by ligation-assisted extraction and thin-layer chromatography (SCARLET) method (see below), reported 4–5 times more sites than each of the other CMCT-based approaches.

    Sequencing library preparation methodsHigh-throughput sequencing using current technologies typically requires the attachment of particular sequences to the nucleic acid molecules destined to be sequenced. The protocol by which the addition of these so-called adaptors is implemented is generally called library preparation. In the case of RNA analysis, library prepa-ration consists of conversion into double-stranded DNA (dsDNA) using RT and second strand synthesis, con-comitant with attachment of appropriate adaptors, which may include primer binding sites for later amplification, as well as barcodes. Barcodes are short 6 or 8 nucleotide sequences that allow sample multi plexing: a mixture of several samples can be run in the same sequencing lane,

    then the obtained reads are sorted according to barcode and attributed to a given sample. Another common meaning for barcode, also known as unique molecular identifier (UMI)85, is a sequence of random nucleotides (designated, for example, N6 for six consecutive degen-erate positions) that aids the bioinformatic analysis by allowing multiple redundant copies of the same cDNA to be identified and eliminated.

    The choice of a library preparation protocol depends on the sequencing method used for RNA modification detection. RT priming requires either 3ʹ adaptor ligation to RNA or random N6 priming, depending on the length of RNA to be analysed (FIG. 3A). The recent application of thermostable group II intron reverse transcriptases (TGIRTs) that are capable of template switching for sequencing library preparations represents an attractive ‘ligation-free’ alternative to existing protocols and, in addition, was reported to show lower sequence bias86,87 (FIG. 3A).

    All RT-based techniques applied to long RNAs, with or without previous chemical modification, require single- base precision in mapping of the 3ʹ end of the cDNA (where the cDNA synthesis aborted opposite the modi-fication site) and thus only a few popular protocols used for gene expression analysis are suitable for this purpose (FIG. 3B). The first option is a direct single-stranded liga-tion of a 3ʹ-blocked DNA oligonucleotide (FIG. 3Ba) or 5ʹ-preadenylated and 3ʹ-blocked RNA oligonucleotide (FIG. 3Bb) to the cDNA 3ʹ end, using RNA ligase37. Another established approach is based on GG (or CC) tailing of the cDNA 3ʹ end, followed by ligation of a DNA duplex with a corresponding CC (or GG) overhang, for second strand synthesis27,28 (FIG. 3Bc). Another approach uses the so-called CircLigation protocol initially developed for ribo-some profiling88 (FIG. 3Bd). Finally, an alternative method is the templated extension of the cDNA 3ʹ end with a ran-dom (N6) terminal extension oligonucleotide89 (known as terminal-tagging oligonucleotide (TTO)) (FIG. 3Be). In all cases, these steps are followed by PCR amplification, typically including barcoding.

    Approaches that do not require single-base precision include MeRIP protocols, which ignore abortive cDNA, or approaches of the ribose methylation sequencing (RiboMethSeq) type90,91 (FIG. 3C), in which the precise definition of fragments resides in the 5ʹ and 3ʹ adaptor ligation protocols. Both published RiboMethSeq proto-cols are based on reduced cleavage of the RNA phos-phodiester bond that is 3ʹ adjacent to a 2ʹ-O-Me residue in RNA. Random alkaline cleavage produces a collection of fragments starting and ending at all RNA positions, except at position +1 to the 2ʹ-O-Me residue. Fragments are thus converted into the library, sequenced and a 5ʹ end (or cumulative 5ʹ and 3ʹ end) counting profile is used for detection of modification. In these cases, stand-ard library preparation protocols already developed for small RNA (typically miRNA) analysis are appropri-ate90, or non-conventional ligation techniques have been invented91. Ribose 2ʹ-O-methylation in yeast rRNA was quantified by both published RiboMethSeq approaches and compared with the data obtained by direct rRNA analysis by mass spectrometry92. Even though exact

    R E V I E W S

    NATURE REVIEWS | GENETICS VOLUME 18 | MAY 2017 | 281

    © 2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved. ©

    2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved.

  • Nature Reviews | Genetics

    5′

    Problem: precise mapping of RT stop

    5′3′

    5′3′

    5′

    5′AbortivecDNA

    AbortivecDNA

    Abortive cDNA

    AbortivecDNA

    5′

    NNNNNN

    Priming at the RNA 3′ end

    Template switch by TGIRT (Seq)

    5′ 3′App X

    X

    Primer ligation

    Direct primer extension

    Primerextension

    cDNA3′ end

    GG

    Ba Bb

    Bc BeBd

    NN-tailing

    CircLigation protocol

    Templated cDNAextension

    Barcodedamplicon

    Amplification and barcoding

    Amplificationand barcoding

    Amplificationand barcoding

    Amplification and barcoding PCR andbarcoding

    di-tagged cDNA

    Circular cDNA

    5′

    5′3′

    3′

    5′

    3′

    3′

    dsDNAligation

    CircLigation

    CC

    CCGG

    CCGG

    5′ NNNNNN3′

    5′3′

    TTOannealing

    5′

    5′

    5′3′

    3′NNNNNN5′5′

    5′

    3′

    3′

    5′

    5′AbortivecDNA

    N

    3′ 3′ SpC3

    Short RNAs (adapter ligation) Long RNAs (random priming)

    3′5′

    DNA oligonucleotidewith 3′ N overhang

    R2 RNAwith 3′ blocker

    N3′ 3′ SpC3

    N3′ SpC3

    DNA with3′ blocker3′ ddC

    R1 RNA with3′ blockerppA-5′3′ ddC

    P 5′

    Templatedextension

    Primer extensionSecond strandsynthesis

    5′- and 3′-ligation bias(es)

    5′

    3′5′

    5′3′3′

    D Biases in library preparation and sequencing

    3′ ddC 5′3′ ddC

    T4 RNAligase

    Thermostable 5′ AppDNA or

    RNA ligase

    ssDNA/RNA adapter ligation

    B Treatment at cDNA 3′ end

    A Common library preparation protocols for detection of RNA modifications C RiboMethSeq detection of 2′-O-Me

    5′3′

    5′3′

    5′3′

    5′3′

    5′3′

    5′3′

    5′3′

    5′3′

    Excluded fragments

    Nm

    Nm

    5′- a

    nd 3′-e

    ndco

    vera

    ge

    Positions

    Protection at N+1

    Met

    hSco

    re

    Positions

    Score calculation

    5′- and 3′-end counting

    Sequencing

    Library preparation

    End polishing

    Random fragmentation

    R E V I E W S

    282 | MAY 2017 | VOLUME 18 www.nature.com/nrg

    © 2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved. ©

    2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved.

  • Receiver operating characteristics curve(ROC curve). A graphical plot illustrating the performance of a binary classifier system as a function of varied thresholds for true positives and false positives. Plotting the true-positive rate (TPR) against the false-positive rate (FPR) at various threshold settings leads to the ROC curve, in which the area under the curve (AUC) is a measure of the performance. The true-positive rate is also known as sensitivity, the FPR can be calculated as 1 − specificity.

    Lipophilicity‘Fat-loving’ character of a compound, reflected in its ability to dissolve in nonpolar solvents.

    methylation ratios vary between the approaches used, the overall results are very coherent for fully and partially modified rRNA sites.

    A most important caveat with all high- throughput approaches is related to various biases that affect library preparation, sequencing and data analysis. At the library preparation step, possible biochemical biases depending on the sequence content must be consid-ered, including nucleotide biases such as AU-rich versus GC-rich regions of RNA. A fairly well understood exam-ple for the preference of enzymes for certain sequences or nucleotides are ligases, the efficiency of which is par-ticularly affected by the nature of nucleotides at 5ʹ and 3ʹ ends of the RNA fragment93–95. Sequence context is also known to affect the efficiency of random priming of the RT reaction using N6-DNA oligonucleotides, whereas the presence of modified nucleotides at the 5ʹ and/or 3ʹ end of the RNA considerably affects ligation efficiency96 and also the polynucleotide kinase (PNK)-catalysed 5ʹ- phosphorylation step required to convert free 5ʹ-OH into 5ʹ-phosphate before the ligation step97 (FIG. 3D).

    Bioinformatics and data miningBioinformatics has an essential role in mapping and pre-diction of modified nucleotides from high- throughput sequencing data. Particular approaches have been developed (and more will have to be developed in the future), depending on the nature of an RT signature in the sequencing profile or a specific chemical treat-ment35,42. Method development and adaptation is also required for mining data from enrichment techniques, which may differ substantially between, for example, antibody-based and small-molecule-based enrichment.

    As a note of caution, every method and every algo-rithm relies on threshold parameters for modification calling (BOX 1), which must ultimately be defined by the user and may thus contain a measure of ambiguity. Noise in the sequencing data creates fluctuations in every such parameter and is more likely to produce, for example, false positives with increasing numbers of positions to

    be evaluated. Given the huge numbers of such sites in a transcriptome, all high-throughput sequencing-based methods are liable to produce false positives as well as false negatives in significant numbers that will depend on the settings of the threshold parameters.

    Typical parameters for evaluating the performance of a method include accuracy, selectivity, specificity, false-positive and false-negative rates, positive and nega-tive predictive values, as well as the false-discovery rate (FDR) and the area under the curve (AUC) in a receiver operating characteristics curve (ROC curve). Assessment of these parameters requires representative test data sets that include known, and ideally, independently validated sites of RNA modification. Therefore, method develop-ment for data mining from transcriptome-wide RNA modification profiles begins and ends with databases. Until recently, only highly specialized databases were available with data sorted by RNA type (for example, tRNA, rRNA, snRNA or snoRNA), which considerably limits possibilities for automatic data mining (see Web links in Further information). The availability of large transcriptome-wide data sets for m5C, m6A and pseudo-uridine stimulated the creation of more general databases such as the RNA Modification Base (RMBase), which now includes all of this information98. The field would certainly profit from an organized collection and possibly curation of new data sets according to common stand-ards. The overall performance of a method thus depends not only on chemical treatment and library preparation steps but also to a considerable degree also on the per-formance of bioinformatics pipelines and the robustness of the training data set used for parameter optimization.

    Biophysical validation of modification sitesGiven the uncertainties and pitfalls outlined above and in BOX 1, confidence in a data set of modification candi-dates can and should be markedly improved by valida-tion using an independent method. Beyond an indirect confirmation of a candidate site, for example, through the absence of a corresponding signal in knockout organ-isms (such as those lacking the probable enzyme cata-lysing the addition of that mark), biophysical methods that provide the most confidence include those based on chromatography and/or mass spectrometry. These methods, which are mentioned above as the tools of trade before the advent of high-throughput sequencing, now function as powerful validation tools. Biophysical properties that distinguish modified nucleosides from canonical ones include molecular mass changes as ana-lysed by mass spectrometry, and altered lipophilicity as exploited through chromatographic separation.

    Mass spectrometry. The apparently most straightforward approach to the high-throughput and sequence-specific analysis of transcriptomes comes from adapting method-ologies that have been extensively used in proteomics, in which complete cleavage after defined residues, typically effected by treatment with the protease trypsin, gener-ates a peptide fragment library, which is subsequently separated by chromatography. The eluting fragments are subjected to mass spectrometry analysis, which can

    Figure 3 | Library preparation issues. A | To detect reverse transcription (RT) arrest signals, mapping of cDNA 3ʹ ends is a crucial step to include in the library preparation protocol (top). Initiation of RT primer extension for long and short RNAs. Short RNAs require ligation of a 3ʹ adaptor for RT priming, whereas for long RNAs random priming with a DNA oligonucleotide containing six degenerate nucleotides (NNNNNN) is appropriate. Alternatively, template‑switch with thermostable group II intron reverse transcriptase (TGIRT) provides ‘ligation‑free’ 3ʹ RNA tagging (bottom). B | Treatment of the cDNA 3ʹ end. Exact definition of cDNA 3ʹ ends can be achieved by five alternative techniques, namely by ligation of a single‑stranded DNA (ssDNA) (part Ba), 5ʹ AppRNA oligonucleotide (part Bb), homonucleotide tailing with terminal transferase (NN‑tailing) (part Bc), the CircLigation protocol using 5ʹ-phosphorylated DNA primer and CircLigase (part Bd), and the use of templated cDNA extension with a terminal-tagging oligonucleotide (TTO) blocked at its 3ʹ end (part Be). In all cases, library preparation protocols end with PCR amplification and appropriate barcoding for multiplexed sequencing. C | Ribose methylation sequencing (RiboMethSeq) approach for mapping of 2ʹ-O‑Me residues in RNAs. The approach is based on a bias in fragmentation, as 2ʹ-O‑Me residues protect adjacent phosphodiester bonds from alkaline cleavage. The major steps of the RiboMethSeq protocol show how this bias is exploited for detection and quantification. D | Possible sources of ligation biases at 5ʹ and 3ʹ extremities of modified RNA. Ligation efficiency and degree of phosphorylation (if used) depend on the type of canonical or modified nucleotides at either end. dsDNA, double-stranded DNA.

    R E V I E W S

    NATURE REVIEWS | GENETICS VOLUME 18 | MAY 2017 | 283

    © 2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved. ©

    2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved.

    http://mirlab.sysu.edu.cn/rmbase/

  • DNAzymesArtificially generated DNA sequences that contain catalytic activity, for example, RNase activity.

    detect modifications from fragmentation patterns and by comparison with the calculated mass of the unmodi-fied fragment99. This concept has been applied to RNA (FIG. 4a), but suffers from two major drawbacks. First, although nucleases such as RNase T1 and MC1 (REF. 100) allow the preparation of fragment libraries, the protocols, tools and data banks that are commonplace in proteomics

    Nature Reviews | Genetics

    5′ NMPs

    Nucleosides

    MS analysis

    5′ 3′RNA5′

    3′RNA

    3′5′ G

    G

    G

    3′

    G

    Digestion byRNase T1

    ESI-TOFMS analysis

    Oligonucleotideseparation

    by HPLC

    Digestion to5′ mononucleotides

    Molecular mass

    Retention time

    a RNase T1 fragment analysis b LC–MS or LC–MS/MS analysis

    Fragmentationprofile

    Molecularmass

    Removal of5′ phosphate

    Reverse-phase HPLC

    MS/MS analysis

    Figure 4 | Mass spectrometry approaches for validation of RNA modification. a | RNA fragments are prepared by complete digestion with RNase T1, which generates a catalogue of fragments ending with a G residue at the 3ʹ end. These fragments are separated by high‑performance liquid chromatography (HPLC), their molecular mass determined by mass spectrometry (MS) and the presence or absence of modified residues deduced by tandem mass spectrometry (MS/MS) fragmentation or comparison with the theoretical molecular mass of unmodified fragments. b | Liquid chromatography coupled to classic mass spectrometry (LC–MS) or tandem mass spectrometry (LC–MS/MS). Isolated RNA or RNA fragments are degraded into nucleosides, which are then separated by reverse‑phase HPLC and further analysed by mass spectrometry. Modified nucleosides are represented as large blue circles, unmodified nucleosides as large grey circles and 5ʹ phosphates as small grey circles. Parameters providing information for the identification of a given modification include retention time, molecular mass and fragmentation patterns from MS/MS. ESI‑TOF, electrospray ionization, time‑of‑flight; NMP, nucleoside monophosphate.

    are mostly lacking in the RNA field101. Essentially, because the high attention paid to RNA modifications is still a recent phenomenon, only a few laboratories have developed relevant expertise102. Second, RNA, as a poly-anion, is much harder to transfer into the vacuum that is mandatory for mass spectrometry than is a neutral molecule. As the jargon goes, RNA ‘flies’ less effectively than proteins (and less effectively even than the slightly less polar DNA) and, consequently, the amount of RNA sample required for a comprehensive or even partial analysis of an mRNA population exceeds that achiev-able by current experimental protocols by several orders of magnitude. Hence, this type of analysis is currently restricted to abundant RNA species103. As an excellent example of the state of the art, the tRNA transcriptome of Escherichia coli has recently been sequenced with >80% coverage by mass spectrometry104.

    A widespread method of detecting and quantifying RNA modifications involves complete enzymatic degra-dation of an RNA sample to nucleosides (FIG. 4b), which ‘fly’ much more easily than oligonucleotides in mass spectrometry analytics, which enables detection and quantification in the femtomol to attomol range. The nucleoside mixture is typically resolved on a reverse-phase chromatography column, from which the eluting species are transferred to a mass spectrometer for ana-lysis. Biophysical parameters that allow the identifica-tion of a given nucleoside include its chromatographic retention time, its molecular mass and its fragmentation pattern. Although numerous laboratories have developed expertise for precise quantification102,105–110, the method does not provide any sequence information. The wider sequence context must therefore be introduced by other means, typically involving oligonucleotides such as biotinylated DNA for hybridization-based affinity purification of selected RNA sequences27,111,112. In an alternative approach, nuclease protection was achieved by duplex formation with complementary DNA oligo-nucleotides (50–75 nucleotides in length), such that unhybridized RNA was then degraded by mung bean or other nucleases113, and the remaining duplexes sub-jected to nucleoside analysis by liquid chromatography– mass spectrometry (LC–MS)90,114. Other methods that have been applied to excision of defined fragments from larger RNAs include application to site-specific cleavage of RNase H and short DNA oligonucleotides (or DNA/2ʹ-O-Me-RNA chimeric oligonucleotides) or DNAzymes115 (FIG. 5a).

    Chromatography. Retention behaviour, which is exploited in LC–MS approaches, reflects a biophys-ical property that also forms the basis for thin-layer chromatography (TLC) approaches for the identification and quantification of RNA modifications. Methods have been developed for nucleosides as well as nucleo-tides116,117, but detection of minute amounts relies on a radioactive 32P label. Of the different solid phases, microcrystalline cellulose is most widely used in a TLC system that uses pairwise combinations of up to three different solvent systems117. The retention values of around 70 modified nucleotides have been mapped in

    R E V I E W S

    284 | MAY 2017 | VOLUME 18 www.nature.com/nrg

    © 2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved. ©

    2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved.

  • Thin-layer chromatography(TLC). A technique for separation carried out in a thin layer of adsorbent material, also known as a stationary phase. TLC can be one-dimensional (1D-TLC) if it involves one type of separation or two-dimensional (2D-TLC) if it involves separation in one dimension using one solvent, followed by separation in a second dimension using a second solvent that separates on the basis of distinct physicochemical properties.

    Splinted ligationA technique to biochemically join nucleic acid fragments by enzymatic ligation. Both fragments are brought in spatial proximity by hybridization to a so-called splint DNA, which is an oligodeoxynucleotide that is essentially a cDNA of the target RNA molecule.

    three dimensions118. As outlined in the following section, increasingly sophisticated methods have been developed to introduce some measure of sequence information within the labelling protocol, which typically comprises phosphorylation with [γ-32P]ATP.

    Pre- and post-labelling techniques. Pre-labelling des-ignates techniques in which the 32P label is introduced into the RNA before the modification event (FIG. 5b). In vitro, this is typically achieved via transcription by the T7 RNA polymerase in the presence of [α-32P]NTPs117. After modification, a protocol named nearest neighbour analysis, including digestion with RNase T2 and subse-quent TLC analysis, identifies the nucleotide to the 3ʹ of a detected modification and thus reveals a limited meas-ure of sequence information on the modification site. Labelling in vivo, or in cell culture, is typically carried out by growth in 32P-containing medium and leads to uniform labelling18,64. Analysis of a nucleotide digest of total RNA, of subfractionated RNA or of an isolated RNA species allows an assessment of all modifications present, as well as their relative quantification18,64,111, but does not provide sequence information.

    Due to the above-mentioned lack of sequence infor-mation from pre-labelling techniques, the validation of modification sites in cellular RNA preparations typically relies on post-labelling approaches, which are designed to selectively label a given position in a sequence con-text. Current methods in principle derive from the Stanley–Vassilenko approach119, which combines random hydro lysis to generate free 5ʹ-OH groups, post-labelling by 5ʹ-phosphorylation using [γ-32P]ATP, subsequent size fractionation, release of the labelled 5ʹ-terminal nucleotide- monophosphate and finally TLC analysis (FIG. 5c). This results in a powerful method, which was particularly successful in its application to tRNA sequen-cing120. However, because the random hydrolysis step necessitates pure and precisely size-fractionated RNA, the method was further developed to include hydro lysis that is sequence specific rather than random. Several molecular tools have been adapted to sequence-specific RNA cleavage in this context, including RNase H121,122 and DNAzymes123,124. The most recent and sophisticated devel-opment is the SCARLET variant (FIG. 5c). In the SCARLET method, the RNA of interest is first cleaved at a desired site with RNase H, for which cleavage is directed by a chimeric oligo nucleotide comprising DNA and 2ʹ-O-Me-RNA. The free 5ʹ extremity is then 32P-labelled and connected to the 3ʹ of a long DNA oligonucleotide by splinted ligation. The RNA part is subsequently degraded by RNases and the 32P-labelled residue is purified along with the DNA oligo-nucleotide, to which it is attached. The labelled nucleo-tide is then released by nuclease P1-mediated hydrolysis before TLC analysis125–127. In contrast to other methods, SCARLET involves two, rather than one, sequence- dependent enrichment steps (FIG. 5c), which is probably the reason for its successful application in complex mix-tures such as total RNA or mRNA preparations. In several such applications it was used to validate high-throughput data on m6A (REF. 125) and Ψ (REF. 40) residues in mRNAs and long non-coding RNAs (lncRNAs).

    Oligonucleotide-based validation techniques. Methods for validation of modification candidates that input sequence information via oligonucleotides are not limited to the sequence hydrolysis approaches outlined above. One technique uses a ligase, which is somewhat sensitive to the nature of 3ʹ and 5ʹ nucleotides, in splinted ligation reactions using specifically modified splint DNA128,129. This method has so far been developed for validation of m6A, 2ʹ-O-Me and pseudouridine.

    The presence of modifications blocking conventional base-pairing — such as m1A, 1-methylguanosine (m1G) and N2,N2-dimethylguanosine (m22G) — lowers the stability of duplexes formed with cDNA oligonucleo-tides. On the basis of this principle, a DNA-chip-based technique was developed that detects modifications in a microarray130.

    With some limitations, primer extension by RT can also be used as a validation method of candidates from high-throughput mapping of RT arrest signals. This applies especially if the conditions of the RT-based validation differ substantially from the original library preparation protocol, for example, in terms of enzyme, buffer or Mg2+ concentrations, or if the generation of the RT signal is based on a principle different from the original screening method. A pertinent example is modification calling after alkaline hydrolysis accord-ing to the RiboMethSeq method90,91, validated by RT at low concentrations of dNTPs. Abortive cDNA that is preferably generated at sites of 2ʹ-O-methylations can either be directly analysed by polyacrylamide gel electro-phoresis (PAGE)131 or in semi-quantitative PCR or qPCR assays132,133.

    Designing RNA modification mapping studiesDepending on the nature of the modified nucleotide to be studied, the exact steps of a mapping project may vary, but several essential points are always conserved. As a general rule, thorough evaluation of the pros and cons (TABLE 2) should include the potential for apply-ing different methods to the detection of the same modification.

    Know the reagent. The selectivity of an envisaged chem-ical reagent or antibody is often incompletely charac-terized, especially in the context of high-throughput sequencing. Indeed, the large numbers of potential modification sites in a typical transcriptome can ren-der a chemical reagent prone to produce false positives in large numbers. For example, a spike-in control in bisulfite sequencing of RNA was reportedly converted with 99.5–99.8% efficiency43, the remaining 0.2–0.5% giving rise to up to 105 incomplete deamination signals in a hypothetical transcriptome of 2 × 107 cytidine nucleo-tides. In such instances, multiple coverage combined with thorough statistical analysis is required to deter-mine the probability of multiple deamination hits on the same site, which would erroneously be called a modi-fication. In this case, a minimum coverage of 10 reads and a bisulfite conversion of 80% were set as threshold values for modification calling. Note that quantitative numbers for selectivity are not available for the majority

    R E V I E W S

    NATURE REVIEWS | GENETICS VOLUME 18 | MAY 2017 | 285

    © 2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved. ©

    2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved.

  • of reagents described as ‘selective’ for a modification in the literature (TABLE 1), because selectivity was typically determined by biochemical methods without numer-ical evaluation51. Both reactivity and selectivity of a chemical reagent towards modified nucleo tides vary as a function of diverse reaction parameters including

    temperature, solvent and, in particular, pH134. An opti-mization of these parameters by comparing modified versus unmodified RNA in a suitable readout is strongly advisable. A further point to consider is that not all sites of a structured RNA are equally accessible to a reagent, a feature exploited in so-called structural probing

    Nature Reviews | Genetics

    5′3′

    DNAzymecleavageRNase H

    cleavage

    5′3′

    5′3′

    Hybridoligonucleotide

    Hybridoligonucleotide

    [32P]orthophosphate

    5′3′

    5′3′

    5′3′

    5′ NMP

    SCARLET

    3′5′

    5′

    Splinted ligation

    5′ 3′5′

    DNA oligonucleotide

    5′ NMP

    5′3′

    5′ NMP

    a Specific RNA cleavage(s) for fragment isolation

    5′3′

    5′3′

    DNAzymecleavage

    5′3′

    5′3′

    LC–MS or LC–MS/MS analysis

    Nuclease P1cleavage

    RNase Hcleavage

    Mung beannucleasecleavage

    Digestion to5′ mononucleotides

    RNA digestion

    Purification ofligated product

    TLC separationand imaging

    TLC separationand imaging

    Digestion to5′ mononucleotides

    RNA extractionand purification

    In vivo celllabelling

    5′ labelling by PNK

    c Specific RNA cleavage(s) and 5′ analysisb Pre-labelling

    2D-TLC 1D-TLC

    R E V I E W S

    286 | MAY 2017 | VOLUME 18 www.nature.com/nrg

    © 2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved. ©

    2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved.

  • experiments32,33. Note that carrying out a reaction in denaturing solvent such as dimethylsulfoxide (DMSO) is liable to affect selectivity134,135.

    Similar considerations apply to antibodies, some of which were apparently used in good faith — that is, without an experimental verification of the supplier’s description of selectivity44. Testing for specificity, for example, with synthetic modified versus unmodified RNA66,68,69 should be considered before engaging in MeRIP-type experiments.

    Know your statistics. As for all other high-throughput approaches, RNA modification mapping is prone to generate multiple false-positive hits. Among the numer-ous recent publications revolving around modifi cation calling, few were designed to raise awareness about the actual or presumed fraction of wrongly called sites (known as the false-positive rate or fallout) and the non-detected true sites (known as the false-negative rate or miss rate). A full set of statistical parameters, includ-ing specificity, selectivity and also a ROC curve, should become standard to facilitate an assessment of a given method’s performance. For an improvement of P values, we refer to BOX 1.

    Read mapping, although it is a field in itself, should observe some basic concepts. Changes in mapping results upon variation of such parameters as multiple mapping and mismatch allowance should be critically surveyed — that is, the robustness of modification call-ing results against variation of these mapping parameters should be verified27,39. Multiple copies of reads originat-ing from the same RNA molecule, arising from amplifi-cation of the same cDNA, can be avoided by UMI-based concepts85 implemented in adaptors and primers.

    Development of new methods typically includes test-ing against a training set of known modification sites. As the performance characteristics of a method will be more significant with higher instance numbers, the training set should be conceived as large as possible. Analysis of at least three biological replicates should become an obli-gate requirement. In addition, higher read coverage will improve the signal-to-noise ratio, making modification

    sites in highly abundant tRNA and rRNA ‘too easy’ to serve as a valid confirmation for modification calls on less-abundant RNA species.

    We posit that entire sets issued from modification site calling should be considered and continue to be addressed as ‘candidates’ until some sites from less- abundant RNA species — that is, presumably of low coverage — can be validated by an orthogonal method40.

    Validate. Techniques used for the validation of selected sites from a modification calling based on high- throughput sequencing should use a different detection principle. The more the validation approach differs in its nature from the initial method, the greater the confidence in validation will be. For example, selected sites of A-to-I editing, detected by the transcriptome-wide ICE-seq34 were validated in amplicon-based applications of the original ICE method136. Such targeted amplification of cDNA reads from candidate sites significantly improves coverage and thus the signal-to-noise level. However, an MS-based approach or a SCARLET assay127 will be more convincing for validation of a polymerase-based modifi-cation calling than a PCR approach, because the pitfalls and error sources differ more strongly (TABLE 2). The obvious downside of orthogonal validation approaches is the amount of labour required and the rather limited sensitivity of most existing validation techniques, which severely limits the number of sites that can reasonably be validated. One of the most stringent validation con-cepts would include the re-synthesis of an RNA mol-ecule containing a candidate site, in both modified and unmodified form, to be used in spike-in experiments.

    Furthermore, control experiments with organisms that are functionally depleted of relevant modifica-tion enzymes offer the potential for validation of large numbers of sites. In these experiments, genome-editing techniques are valuable because the true knockouts they generate are preferable compared with RNA interfer-ence (RNAi)-based knockdowns, which are frequently incomplete and may lead to unknown levels of residual modification activity.

    Currently, the validation of all modification candi-dates called in a transcriptome-wide analysis is, and remains, difficult with no perfect solution present-ing itself. In the absence of truly orthogonal detection methods, variations of the same method are a good first step. For example, a piece of pioneering work in anti-body-based mapping relied on the overlap of two dif-ferent, independently raised antibodies66. Obviously, the ideal solution would involve numerous maximally differ-ent detection principles applicable to the transcriptome scale, with the overlap counting as validated. Several impressive efforts in this direction have been made, particularly in the mapping of m5C (REFS 50,78), but the limited overlap documented in these studies further emphasizes the need for more detection principles.

    ConclusionsWith the field of RNA modification having received a tremendous boost from innovations in detection technol-ogy, scientists are on the lookout for methods with even

    Figure 5 | Methods for validation of modified RNA nucleotides. a | Specific RNA fragments can be isolated after protection from mung bean nuclease (or other nuclease) digestion in a DNA–RNA duplex, by directed RNase H cleavage with complementary chimeric oligonucleotides composed of DNA/2ʹ-O‑Me‑RNA residues, or by the use of specific DNAzymes. Modifications in the resulting fragments can be detected and quantified by liquid chromatography coupled to tandem mass spectrometry (LC–MS/MS). b | Pre‑labelling of RNAs and their analysis. RNA is metabolically labelled with 32P, followed by isolation of individual RNA species, hydrolysis to 5ʹ nucleoside monophosphates (5ʹ NMPs) and two‑dimensional thin‑layer chromatography (2D‑TLC) analysis. c | Techniques used for site‑specific RNA cleavage by DNAzymes or RNase H include 5ʹ end radioactive labelling of fragments with [γ-32P]ATP and polynucleotide kinase (PNK). Labelled fragments are size fractionated and analysed by TLC after digestion to 5ʹ NMPs or submitted to the site-specific cleavage and radioactive labelling followed by ligation- assisted extraction and thin‑layer chromatography (SCARLET) protocol. This includes an additional sequence selection step during a splint ligation, followed by size fractionation, digestion and TLC analysis. Modified nucleosides are represented as large blue circles, unmodified nucleosides as large grey circles, radiolabelled phosphates as small red circles and unlabelled phosphates as small grey circles.

    R E V I E W S

    NATURE REVIEWS | GENETICS VOLUME 18 | MAY 2017 | 287

    © 2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved. ©

    2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved.

  • better performance. Apart from better detection efficacy and higher throughput, a feature that is currently missing in most of the detection technologies is the simultan eous detection of different modifications in the same bio-logi cal sample or, preferably, in a single RNA molecule.

    Suitable technologies would have to processively screen entire single molecules and would circumvent amplifi-cation events that erase traces of modifications. As they might reveal several modifications in the same molecule, such data would, among other things, be able to inform

    Table 2 | Pros and cons of techniques for detection and validation of RNA modifications

    Method Application Pros Cons

    RT signature from RT arrest and misincorporation

    Detection of m1A

    • Straightforward protocol• Precise single-nucleotide

    mapping of modification sites• Adaptable to different types

    of modification

    • Negative control requires knockout strain of known modification enzymes

    • Semi‑quantitative

    Chemical reagent and RT stop coupled to high-throughput sequencing

    Detection of Ψ and I

    • Single‑nucleotide resolution• Precise mapping of

    modification sites

    • Reagents available only for a few modified residues

    • False positives from RNA structure and/or sequence

    • Detects only one type of modification

    Antibody-based enrichment coupled to high-throughput sequencing

    Detection of m6A, m5C and m1A

    Enrichment by antibody reduces required sequencing depth

    • No single-nucleotide resolution• Poorly defined specificity of various

    commercial antibodies

    Antibody-based enrichment and ultraviolet-based crosslinking coupled to high-throughput sequencing

    Detection of m6A

    • Single‑nucleotide resolution• Enrichment by antibody

    reduces required sequencing depth

    • Poorly defined specificity of various commercial antibodies

    • Low crosslink efficiency

    RiboMethSeq Detection and quantification of 2ʹ-O‑Me

    • Single‑nucleotide resolution• Relative quantification of

    methylation rate• Nanogram amount of input

    RNA

    • False positives because of RNA structure

    • Requires high sequencing depth for analysis

    • Currently restricted to abundant RNA species

    Bisulfite RNA sequencing Detection and quantification of m5C

    • Single‑nucleotide resolution• Quantification is possible

    • Noise from incomplete deamination in structured RNA regions

    • Requires substantial amount of input material

    • Requires high sequencing depth for analysis

    • Mapping of reads is difficult because of degeneracy

    Mass spectrometry of RNase T1 digests

    Detection and validation of various modifications

    • Includes sequence information

    • Single‑nucleotide resolution

    • Requires substantial amount of input material

    • Requires highly specialized equipment

    • Requires highly developed expertise for data mining

    LC–MS and LC–MS/MS of nucleosides

    Quantification and validation of various modifications

    • Highly accurate quantification

    • Expertise is reasonably widespread

    • No sequence information• Requires specialized equipment

    SCARLET and related methods

    Validation and quantification of m6A and Ψ

    • Includes sequence information

    • Quantification possible• No specialized equipment

    • Single‑site query; no high throughput• Labour intensive

    RT stop at low dNTP concentrations

    Validation of 2ʹ-O‑Me

    • Includes sequence information

    • Experimentally simple, for example, PCR

    • Single‑site query; moderate throughput only

    • Indirect validation without physicochemical properties

    Hybridization efficiency to DNA microarrays

    Validation of various modifications

    • Includes sequence information

    • Multiple‑site queries, at least medium throughput

    • Indirect validation• Requires some technical expertise

    and equipment

    Ψ, pseudouridine; dNTP, deoxyribonucleoside triphosphate; I, inosine; LC–MS, liquid chromatography coupled to classic mass spectrometry; LC–MS/MS, liquid chromatography coupled to tandem mass spectrometry; m1A, 1‑methyladenosine; m5C, 5‑methylcytidine; m6A, N6‑methyladenosine; RiboMethSeq; ribose methylation sequencing; RT, reverse transcription; SCARLET, site‑specific cleavage and radioactive labelling followed by ligation‑assisted extraction and thin‑layer chromatography.

    R E V I E W S

    288 | MAY 2017 | VOLUME 18 www.nature.com/nrg

    © 2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved. ©

    2017

    Macmillan

    Publishers

    Limited,

    part

    of

    Springer

    Nature.

    All

    rights

    reserved.

  • on the combinatorial status of those different modifica-tions in a given molecule and thus yield highly interesting insights into the sequence and interdependence of mul-tiple modification events137, as well as interplay between different modification enzymes.

    Again, innovative sequencing technologies hold the most promise. Using the technology employed in single-molecule real-time (SMRT) sequencing, the molecular events during RT that eventually lead to the RT signature can be analysed by single molecule spectroscopy in nanoscale wells and can provide direct evidence of the efforts of the polymerase to bypass a modi-fication such as m6A (REFS 2,138). However, the presented data suggest that, although an influence of the modifica-tion is clearly visible, the inverse route, that is, methods to derive the presence of a particular modification from

    a specific signal pattern, may take some time in coming. Taking a cue from the DNA modifi cation field, in which nanopore technology can detect multiple and different modifications in a single chain139, a similar development in RNA sequencing seems promising, although currently is still in the development stage.

    As opposed to the surprising and sudden influx of high-throughput sequencing into the field in 2012 (REFS 65,66), today’s community is watching every devel-opment with anticipation. Not only do known modifi-cations appear in new places and RNA species, but the number of known chemically distinct modifications is also still growing140. It is therefore easy to state that modification mapping in the epitranscriptome has only begun and that the field will be busy and exciting for a long time.

    1. Frye, M., Jaffrey, S. R., Pan, T., Rechavi, G. & Suzuki, T. RNA modifications: what have we learned and where are we headed? Nat. Rev. Genet. 17, 365–372 (2016).

    2. Saletore, Y. et al. The birth of the epitranscriptome: deciphering the function of RNA modifications. Genome Biol. 13, 175 (2012).

    3. Schwartz, S. Cracking the epitranscriptome. RNA 22, 169–174 (2016).

    4. Witkin, K. L. et al. RNA editing, epitranscriptomics, and processing in cancer progression. Cancer Biol. Ther. 16, 21–27 (2015).

    5. Roundtree, I. A. & He, C. RNA epigenetics — chemical messages for posttranscriptional gene regulation. Curr. Opin. Chem. Biol. 30, 46–51 (2016).

    6. Liu, N. & Pan, T. RNA epigenetics. Transl Res. 165, 28–35 (2015).

    7. Spenkuch, F., Motorin, Y. & Helm, M. Pseudouridine: still mysterious, but never a fake (uridine)! RNA Biol. 11, 1540–1554 (2014).

    8. Machnicka, M. A. et al. MODOMICS: a database of RNA modification pathways — 2013 update. Nucleic Acids Res. 41, D262–D267 (2013).

    9. Helm, M. & Alfonzo, J. D. Posttranscriptional RNA modifications: playing metabolic games in a cell’s chemical Legoland. Chem. Biol. 21, 174–185 (2014).

    10. Holley, R. W. et al. Structure of a ribonucleic acid. Science 147, 1462–1465 (1965).

    11. Ban, E. & Song, E. J. Recent developments and applications of capillary electrophoresis with laser-induced fluorescence detection in biological samples. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 929, 180–186 (2013).

    12. Karijolich, J. & Yu, Y. T. Spliceosomal snRNA modifications and their function. RNA Biol. 7, 192–204 (2010).

    13. Zinshteyn, B. & Nishikura, K. Adenosine-to-inosine RNA editing. Wiley Interdiscip. Rev. Syst. Biol. Med. 1, 202–209 (2009).

    14. Yu, B. et al. Methylation as a crucial step in plant microRNA biogenesis. Science 307, 932–935 (2005).

    15. Motorin, Y. & Helm, M. RNA nucleotide methylation. Wiley Interdiscip. Rev. RNA 2, 611–631 (2011).

    16. Moss, B., Gershowitz, A., Stringer, J. R., Holland, L. E. & Wagner, E. K. 5ʹ-terminal and internal methylated nucleosides in herpes simplex virus type 1 mRNA. J. Virol. 23, 234–239 (1977).

    17. Nichols, J. L. & Welder, L. The modified nucleotide constituents of human prostatic cancer cell (MA-160) poly(A)-containing RNA. Biochim. Biophys. Acta 608, 1–18 (1980).

    18. Rottman, F. M., Desrosiers, R. C. & Friderici, K. Nucleotide methylation patterns in eukaryotic mRNA. Prog. Nucleic Acid. Res. Mol. Biol. 19, 21–38 (1976).

    19. Limbach, P. A. & Paulines, M. J. Going global: the new era of mapping modifications in RNA. Wiley Interdiscip. Rev. RNA 8, e1367 (2016).

    20. Song, C. X., Yi, C. & He, C. Mapping recently identified nucleotide variants in the genome and transcriptome. Nat. Biotechnol. 30, 1107–1116 (2012).

    21. Jia, G. et al. N6-methyladenosine in nuclear RNA is a major substrate of the obesity-associated FTO. Nat. Chem. Biol. 7, 885–887 (2011).This is the first demonstration of dynamics in mRNA modification by enzymatic demethylation.

    22. Ebhardt, H. A. et al. Meta-analysis of small RNA-sequencing errors reveals ubiquitous post-transcriptional RNA modifications. Nucleic Acids Res. 37, 2461–2470 (2009).

    23. Findeiss, S., Langenberger, D., Stadler, P. F. & Hoffmann, S. Traces of post-transcriptional RNA modifications in deep sequencing data. Biol. Chem. 392, 305–313 (2011).

    24. Ryvkin, P. et al. HAMR: high-throughput annotation of modified ribonucleotides. RNA 19, 1684–1692 (2013).

    25. Alon, S. et al. Systematic identification of edited microRNAs in the human brain. Genome Res. 22, 1533–1540 (2012).

    26. Dominissini, D., Moshitch-Moshkovitz, S., Amariglio, N. & Rechavi, G. Adenosine-to-inosine RNA editing meets cancer. Carcinogenesis 32, 1569–1577 (2011).

    27. Hauenschild, R. et al. The reverse transcription signature of N-1-methyladenosine in RNA-Seq is sequence dependent. Nucleic Acids Res. 43, 9950–9964 (2015).This is a combination of the cDNA signature concept with machine learning to predict modification sites.

    28. Tserovski, L. et al. High-throughput sequencing for 1-methyladenosine (m1A) mapping in RNA. Methods 107, 110–121 (2016).

    29. Gustafsson, C. & Persson, B. C. Identification of the rrmA gene encoding the 23S rRNA m1G745 methyltransferase in Escherichia coli and characterization of an m1G745-deficient mutant. J. Bacteriol. 180, 359–365 (1998).

    30. Li, X. et al. Transcriptome-wide mapping reveals reversible and dynamic N1-methyladenosine methylome. Nat. Chem. Biol. 12, 311–316 (2016).

    31. Dominissini, D. et al. The dynamic N1-methyladenosine methylome in eukaryotic messenger RNA. Nature 530, 441–446 (2016).

    32. Talkish, J., May, G., Lin, Y., Woolford, J. L. Jr & McManus, C. J. Mod-seq: high-throughput sequencing for chemical probing of RNA structure. RNA 20, 713–720 (2014).

    33. Weeks, K. M. Advances in RNA structure analysis by chemical probing. Curr. Opin. Struct. Biol. 20, 295–304 (2010).

    34. Sakurai, M., Yano, T., Kawabata, H., Ueda, H. & Suzuki, T. Inosine cyanoethylation identifies A-to-I RNA editing sites in the human transcriptome. Nat. Chem. Biol. 6, 733–740 (2010).This is pioneering work on the use of selective chemicals for transcriptome-wide detection of modifications.

    35. Suzuki, T., Ueda, H., Okada, S. & Sakurai, M. Transcriptome-wide identification of adenosine-to-inosine editing using the ICE-seq method. Nat. Protoc. 10, 715–732 (2015).

    36. Bakin, A. & Ofengand, J. Four newly located pseudouridylate residues in Escherichia coli 23S ribosomal RNA are all at the peptidyltransferase center: analysis by the application of a new sequencing technique. Biochemistry 32, 9754–9762 (1993).

    37. Schwartz, S. et