Genome Biology: Research and Clinical Perspectives on the Next Generation of DNA Sequencing

Table of Contents

Executive Summary

I. “LAST-GEN” AND “THIS-GEN” SEQUENCING TECHNOLOGIES

Sanger sequencing

454 Life Sciences & the Church lab: The first “next-gen” approaches

Illumina

Life Technologies

Complete Genomics

II. NEXT-NEXT-GEN SEQUENCING TECHNOLOGIES

SMRT sequencing

Helicos

Nanopore sequencing

Electron microscopy

III. CLINICAL APPLICATIONS OF WHOLE-GENOME AND EXOME SEQUENCING

IV. RESEARCH APPLICATIONS OF NEXT-GEN SEQUENCING

Genome sequencing

Transcriptomics

Epigenomics

Paleogenomics

Metagenomics

Single-cell genomics

V. COMPUTATIONAL CHALLENGES

VI. SYNTHETIC BIOLOGY & GENOME EDITING

VII. CONCLUSIONS

VIII. REFERENCES

About This Report

This special report is for exclusive use by members of the American Chemical Society. It is not

intended for sale or distribution by any persons or organizations. Nor is it intended to endorse

any product, process, or course of action. This report is for information purposes only.

© 2013 American Chemical Society

About the Author

Jeffrey Perkel has been a scientific writer and editor since 2000. He holds a PhD in Cell and

Molecular Biology from the University of Pennsylvania, and did postdoctoral work at the

University of Pennsylvania and at Harvard Medical School.

Executive Summary

The Human Genome Project (HGP) took 15 years and some $3 billion to decode the complete

human blueprint of a single individual. Since 2003, however, when the project was deemed

complete, the genomics world has undergone a massive transformation. The capillary

electrophoresis-based instrumentation that drove the HGP has been replaced by so-called

“next generation” sequencing technologies that can sequence in days, and for mere thousands

of dollars, what once took years to achieve.

As a result, the life sciences are undergoing a fundamental transformation, both in the research

laboratory and in the clinic. In the lab, researchers can now use cutting-edge sequencing

technologies to pick apart the genomes of individual mammalian cells and unsifted pools

of bacteria. They can trace the biology of cancer and the epigenetic history of stem cells. In

the clinic, they can identify genetic abnormalities in developmentally delayed children using

only the blood of a patient and his or her unaffected parents, and they can identify actionable

mutations in cancerous tissues.

That’s not wishful thinking. Genomics in the clinic is a current reality [1], both at academic

institutions such as the Medical College of Wisconsin, the University of California, Los Angeles, and

Children’s Mercy Hospital in Kansas City, Mo., and at companies such as EdgeBio, Foundation

Medicine, and Ambry Genetics. According to a recent article in Chemical & Engineering News,

of 176 diagnostics labs surveyed by the College of American Pathologists, 19% currently offer

next-gen sequencing tests, and 55% “planned to use the tests within three years.” [1]

Statistics compiled by the U.S. National Human Genome Research Institute (NHGRI) show

the cost of sequencing a human genome as of April 2013 was $5,826, down from a million

dollars-plus just five years earlier. [2] Service providers can do it for even less, especially if users

are willing to accept a kind of pared-down dataset, called an exome, which contains only the

content-rich protein-coding regions. The Yale Center for Genome Analysis lists per-exome costs

as low as $557 for six pooled samples. [3] At those prices, researchers can afford to expand

their studies; instead of sequencing a handful of subjects, they now can sequence hundreds or

thousands, and sequence them to greater depth and resolution than ever.

Exomes are already making waves in the clinic, but so rapidly is the field moving overall –

accelerating far faster than Moore’s Law, according to NHGRI figures [2] – that what seemed

fantastical and cutting-edge just a few years ago can seem passé today. Witness the abrupt

cancellation in late August of the Genomics XPRIZE, which would have awarded $10 million

to “the first team that could rapidly and accurately sequence 100 whole human genomes to a

standard never before achieved at a cost of $10,000 or less per genome.” [4]
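The comparison with Moore’s Law can be made concrete with a rough calculation. The sketch below uses only the two NHGRI cost figures quoted above; the model of Moore’s Law as a halving of cost every two years is an assumption introduced here for illustration, not a figure from this report.

```python
# Figures quoted in the text (NHGRI): "a million dollars-plus" roughly five
# years before April 2013, and $5,826 in April 2013.
COST_EARLIER_USD = 1_000_000   # treated as a lower bound
COST_APRIL_2013_USD = 5_826
YEARS_ELAPSED = 5

# Assumption (not from the report): Moore's Law modeled as cost halving
# every two years.
MOORE_HALVING_PERIOD_YEARS = 2.0


def fold_reduction(start_cost: float, end_cost: float) -> float:
    """How many times cheaper the end cost is than the start cost."""
    return start_cost / end_cost


def moore_fold_reduction(years: float,
                         halving_period: float = MOORE_HALVING_PERIOD_YEARS) -> float:
    """Fold reduction expected if cost halves once per halving period."""
    return 2.0 ** (years / halving_period)


observed = fold_reduction(COST_EARLIER_USD, COST_APRIL_2013_USD)
expected = moore_fold_reduction(YEARS_ELAPSED)

print(f"Observed drop: at least {observed:.0f}-fold")
print(f"Moore's Law pace over {YEARS_ELAPSED} years: about {expected:.1f}-fold")
```

Under these assumptions the observed drop is more than 170-fold, against roughly 6-fold for a two-year halving period.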

The mission of the XPRIZE Foundation is “to bring about radical breakthroughs for the

benefits of humanity, thereby inspiring the formation of new industries and the revitalization

of markets,” [5] and when the competition was announced in 2006, its goal

seemed to fit that bill. It was “far beyond what anyone has done at any price,” according to Chad

Nusbaum of the Broad Institute of MIT and Harvard. [6]

Really far beyond: An article in the Los Angeles Times about the cancellation of the Genomics

XPRIZE notes that in 2006, “it cost more than $10 million and took 18 months to sequence one

complete genome. Today … companies can get the job done in a fraction of the time, and

for prices around $4,000–$5,000.” [7] The blog post by Peter Diamandis, Chairman and CEO of

the XPRIZE Foundation, announcing the cancellation, says it all – it was titled, “Outpaced by

Innovation: Canceling an XPRIZE.” [4]

Indeed, researchers today face a situation that seems almost Aesopian. Genome

sequencing has become so inexpensive so quickly that it is almost easier to sequence an entire

genome than to isolate and sequence specific genes. As a result, some researchers and service

providers opt simply to sequence everything and computationally filter the data after the fact.

Of course, as datasets swell, so do the computational specifications required to hold,

manipulate, and process them – not to mention the intellectual capital needed to double-

check the computers’ work and make sense of the data. By most accounts, sequencing has

ceased to be a bottleneck in the “‘omics” disciplines, replaced by data analysis.

On the other hand, the rapidly falling price of sequencing, aided by the growth of so-called

“personal” sequencing systems aimed at individual labs and smaller core facilities, means

genomic technology is now finding its way into the hands of more and more researchers. Those

researchers are finding ever more creative ways to use that technology, including studies of

gene expression, epigenetics, extinct organisms, and single-cell genomics – not to mention, of

course, lots and lots of de novo genome sequencing.

The current crop of sequencers comes mostly from three firms: Illumina, Life Technologies,

and Roche/454 Life Sciences, all of which rely on some form of addition-stop-read-repeat

strategy. Pacific Biosciences has a competing instrument that reads the sequence of individual

molecules as they are synthesized in real time. Other technologies are also in development, the

most promising of which is nanopore sequencing.

In concert with this boom in sequencing technology, researchers have also made strides

in genome engineering. In 2010, sequencing impresario J. Craig Venter demonstrated that

it was possible to write a genome from scratch and insert it into an empty bacterial shell.

Most researchers aren’t prepared to undertake that kind of effort, but the recent development of

custom nucleases called TALENs and a newly discovered system called CRISPR/Cas are providing

researchers with unparalleled power to customize genomes at will.

With so much R&D occurring on so many fronts, excitement is in abundance in the genomics

arena – and so, relatively speaking, is funding. In short, whether in the research lab or the clinic,

it’s a good time to be a genome biologist.

I. “LAST-GEN” AND “THIS-GEN” SEQUENCING TECHNOLOGIES

From the 1970s through 2005, DNA sequencing was synonymous with one name, Frederick

Sanger. Sanger developed one of two sequencing technologies published in 1977; Allan

Maxam and Walter Gilbert developed the other. Maxam and Gilbert’s method relied on

complex chemical modification and digestion, whereas Sanger’s was enzymatic. The latter

eventually became the molecular biology community’s method of choice.

Sanger sequencing

Sanger sequencing is just a modified in vitro DNA polymerization reaction. The DNA to

be sequenced (the template) is annealed to a short DNA segment (a primer) that specifies

the origin from which the sequence will be read. Addition of all four deoxyribonucleotide

triphosphates (dNTPs) and DNA polymerase kicks off the reaction. The reaction is then split

into four tubes, each of which contains one of four 2’,3’-dideoxynucleotide triphosphates.

Lacking the 3’-hydroxyl group to which the next nucleotide would normally link, these

“dideoxy” nucleotides act as chain terminators, as their incorporation into the nascent DNA

strand blocks further polymerization.

In the original implementation, researchers also spiked the reaction with a radioactive tracer.

To read the sequence, the four tubes that make up the final reaction were resolved on a

denaturing polyacrylamide gel and exposed to film. By reading from the bottom of the gel

towards the top, a researcher could trace the sequence based on fragment size.

This process was effective, but slow. It was also radioactive, which added workflow and waste-

handling complexity. In 1986, however, the process was simplified and automated.

Leroy Hood at the California Institute of Technology and colleagues replaced the radioactive

tracer with four spectrally distinct fluorescent dyes, which condensed four lanes of a traditional

sequencing gel into one. [8] They also replaced the cumbersome polyacrylamide gel with thin

capillary tubes, which could be multiplexed to 16, 48, or even 96 reactions per run. This basic

strategy, which dramatically simplified and accelerated DNA sequencing and enabled the

decoding of the fruit fly, nematode worm, mouse, and human genomes, is still in use today.

(Myriad Genetics’ BRACAnalysis® test for familial breast cancer risk, for instance, is based on

Sanger. [9]) Indeed, Sanger remains the gold standard for DNA sequencing in terms of both

quality and quantity, producing nearly 1,000 usable bases (1 kilobase, or kb) per capillary per

run, and a single 96-capillary Applied Biosystems 3730xl DNA Analyzer can produce up to 2.1

million bases per day. [10]

Yet Sanger sequencing has several shortcomings. The fragments to be sequenced must

generally be cloned, and those clone libraries must be maintained – not a small logistical

problem when the libraries contain millions of bacterial clones. Also, Sanger sequencers are

limited in terms of sample throughput, at least when it comes to genome-scale projects.

Sequencing a human genome to completion – say, with an average sequence coverage of 30× –

requires some 90 billion bases of sequence, or around 90 million 1-kb reactions.
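The arithmetic behind those figures can be sketched in a few lines. The genome size of about 3 billion bases is an assumption inferred from the 90-billion-base total at 30× coverage; the per-day throughput is the 3730xl figure quoted earlier, and the assumption that a single instrument runs continuously at that rate is purely illustrative.

```python
GENOME_SIZE_BASES = 3_000_000_000   # assumption: ~3 billion bases (90 billion / 30)
COVERAGE = 30                       # average sequence coverage from the text
READ_LENGTH_BASES = 1_000           # ~1 kb per Sanger reaction
SANGER_BASES_PER_DAY = 2_100_000    # one 96-capillary 3730xl, as quoted above


def bases_required(genome_size: int, coverage: int) -> int:
    """Total bases that must be read to reach the target average coverage."""
    return genome_size * coverage


def reads_required(total_bases: int, read_length: int) -> int:
    """Number of reads of a given length needed (rounded up)."""
    return -(-total_bases // read_length)


total = bases_required(GENOME_SIZE_BASES, COVERAGE)
reads = reads_required(total, READ_LENGTH_BASES)
instrument_days = total / SANGER_BASES_PER_DAY

print(f"Bases needed:    {total:,}")
print(f"1-kb reactions:  {reads:,}")
print(f"Instrument-days: {instrument_days:,.0f}")
```

On these assumptions a single instrument would need on the order of 43,000 days, which is why genome-scale Sanger projects depended on large fleets of sequencers.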

454 Life Sciences & the Church lab: The first “next-gen” approaches

As the Human Genome Project proved, it is possible to run that many sequencing reactions –

but it’s neither practical nor scalable. In 2005, two groups independently published sequencing

strategies that took a radically different, and highly parallelized, approach.

One group, at 454 Life Sciences (now owned by Roche Diagnostics), described the sequencing

of the 580-kb genome of the bacterium Mycoplasma genitalium in a single four-hour run. [11] The second

team, led by Harvard University geneticist George Church, sequenced E. coli over 60 hours. [12]

Unlike Sanger’s chain termination method, both of these papers – and indeed, most of the

popular sequencing chemistries used today – employed a massively parallel strategy that can

be summarized as “add-stop-read-repeat.” In effect, instead of having a library of clones that are

picked and sequenced, these strategies create a sequencing library on the fly, lock it randomly

in a positional planar array, and query each reaction over and over again to read the sequence

at each position. [13] The result is a collection of relatively short sequence reads that can then

be assembled computationally.

In the 454 Life Sciences method, a library of genomic DNA is fragmented and ligated to

adaptors, which serve as molecular handles and primer-binding sites. Using those adaptors, the

library is coupled to and amplified on the surface of 28-μm beads, such that each bead contains

many copies of a single fragment. In effect, each bead is the equivalent of a single clone in

a traditional clone library. But unlike in the traditional Sanger approach, the entire library is

then distributed into the microscopic wells of a so-called PicoTiterPlate for sequencing. To

read the sequence, the wells are filled with DNA polymerase, as well as the reagents required

to convert any released pyrophosphate (the product of nucleotide incorporation) into ATP

and thence into light by the action of a luciferase enzyme, a process called “pyrosequencing.”

Nucleotides flow over the chip one at a time – e.g., first dATP, then dCTP, dGTP, and finally

dTTP. If the polymerase can incorporate that single added base, it releases pyrophosphate and

initiates the pyrosequencing process, the light output of which is proportional to the amount

of pyrophosphate released and thus the number of nucleotides incorporated. Because no other

dNTPs are present, the reaction stops after each addition step. (Though, in fact, more than one

base may be added at a time, if the polymerase encounters a homopolymer run, for example

CCCCC.) An adjacent camera captures the light flashes, and, by analyzing the resulting images,

the system can work out the nucleotide sequence of each fragment. In total, the team in the

454 Life Sciences study collected 33.6 million bases of M. genitalium sequence from 306,178

“high-quality reads,” each averaging 110 bases in length.
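The flow-by-flow logic described above can be illustrated with a toy simulation. This is a minimal sketch, not 454’s base-calling software: it assumes an idealized, noise-free light signal exactly equal to the number of bases incorporated, and the function names are hypothetical.

```python
FLOW_ORDER = "ACGT"  # example order from the text: dATP, dCTP, dGTP, dTTP


def simulate_flowgram(synthesized: str, flow_order: str = FLOW_ORDER):
    """Return (nucleotide, signal) pairs, one per nucleotide flow.

    `synthesized` is the strand being built by the polymerase. The signal
    for each flow is the number of bases incorporated, which is greater
    than 1 when the polymerase crosses a homopolymer run.
    """
    if any(base not in flow_order for base in synthesized):
        raise ValueError("sequence contains a base outside the flow order")
    flows = []
    position = 0
    flow_index = 0
    while position < len(synthesized):
        nucleotide = flow_order[flow_index % len(flow_order)]
        incorporated = 0
        # Only the flowed nucleotide is present, so extension continues
        # only while the next required base matches it.
        while position < len(synthesized) and synthesized[position] == nucleotide:
            incorporated += 1
            position += 1
        flows.append((nucleotide, incorporated))
        flow_index += 1
    return flows


def decode_flowgram(flows) -> str:
    """Rebuild the sequence from idealized flow signals."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


flows = simulate_flowgram("ACCCCCGT")
print(flows)
print(decode_flowgram(flows))
```

In this idealized model the CCCCC run appears as a single flow with a five-fold signal. On a real instrument, distinguishing, say, a five-fold from a six-fold light intensity is harder than distinguishing zero from one.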

Church’s approach used DNA ligase activity for sequencing. In the same way that

454 Life Sciences did, the Church team built their library on-the-fly using emulsion PCR

on beads. They then arrayed that collection of beads on a microscope slide under an

epifluorescent microscope and sequenced it by annealing one primer (complementary to

added adaptor sequences) and adding a pool of degenerate nonamer oligonucleotides in

which a single position was fixed and coded to the attached fluorophore. Only the correct

nonamer can anneal and be ligated by DNA ligase, and its fluorescent signature indicates the

nucleotide identity of whichever base was being interrogated at the time. To read the next

base, the DNA is denatured, then another primer is annealed and ligated to a different set of

nonamers. In total, Church’s team was able to read 26 bases per fragment, collecting some

30.1 million bases in total from 1.16 million mappable reads.

Those two papers launched “next-generation” DNA sequencing – it’s really “this-generation” at

this point – and the rise of that technology has been nothing less than meteoric. The current

454/Roche top-of-the-line sequencer, the GS FLX+, offers near-Sanger sequencing lengths

of up to 1 kb, and 700 bases on average over 1 million reads, for 700 megabases (Mb) total in

23 hours. [14] 454’s smaller GS Junior system can produce about 100,000 400-base reads for

35 million “high-quality, filtered bases” per run. [15] With a list price of about $108,000 in the

U.S., compared to $425,000 for the FLX+, [16] the GS Junior is targeted at smaller sequencing

centers and even individual labs.

Impressive as 454’s throughput specs are, they pale in comparison to competing sequencing

technologies, and the company mostly positions its platform at specific applications that can

leverage its long reads but relatively modest data output – applications like de novo microbial

genome sequencing, targeted resequencing, metagenomics, and more. Researchers often

combine 454 technology with other shorter-read technologies to combine the power of both

strategies. Church commercialized his sequencing strategy and instrumentation first through

Agencourt, and ultimately through Dover Systems, part of Danaher Motion, as the Polonator

G.007.

(On Oct. 15, GenomeWeb reported that Roche “is shuttering its 454 Life Sciences sequencing

operations and laying off about 100 employees…. The 454 sequencers will be phased out in

mid-2016, and the 454 facility in Branford, Conn., will be closed ‘accordingly’,” according to a

statement from Roche. [17])

Illumina

The undisputed leader of next-gen sequencing, with some 1,700 publications to its credit, is

Illumina. The company’s sequencing-by-synthesis strategy, acquired from Solexa (and thus

sometimes called “Solexa sequencing”), has several elements in common with the Church

and 454 strategies, including being massively parallel, the planar array design, and the idea

that bases are added and read one at a time. But Illumina’s method doesn’t build its library on

beads. Rather, DNA fragments are amplified on the surface of a flow cell by a process called

bridge amplification, which creates billions of discrete islands of identical molecules, each of

which is read simultaneously.

The sequencing process itself relies on reversible terminators. Each base is coupled to a

different fluorescent dye and is chemically modified on the 3’-OH to prevent chain extension.

As a result, all four bases can be added simultaneously, though only one will be added per step.

That base is read by fluorescence imaging of the cell, after which the termination is reversed

and the fluorophore removed; then the cycle repeats. [13]
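The cycle just described can be sketched as a toy simulation of a single cluster. It is an idealized illustration only (no errors, no dye chemistry), and the function name is hypothetical. The template is supplied in the order in which the polymerase traverses it.

```python
# Watson-Crick pairing used to pick the nucleotide complementary to the template.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_reversible_termination(template: str, cycles: int) -> str:
    """Simulate repeated add, image, unblock cycles on one cluster."""
    calls = []
    for position in range(min(cycles, len(template))):
        # All four blocked, dye-labeled nucleotides are offered at once, but
        # only the one complementary to the template can be incorporated.
        incorporated = COMPLEMENT[template[position]]
        # The 3' block stops extension after exactly one base, so imaging
        # identifies a single base per cycle, even inside a homopolymer.
        calls.append(incorporated)
        # Removing the block and the dye (not modeled) permits the next cycle.
    return "".join(calls)


read = sequence_by_reversible_termination("GGGGGTAC", cycles=6)
print(read)
```

Because exactly one base is added per cycle, the read length equals the number of cycles run, and a homopolymer is read one base at a time rather than as a single larger signal.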

Initially, the Solexa strategy produced reads far shorter than Sanger, just 25–35 bases in length.

[13] At that length, de novo assembly of a complex vertebrate genome is challenging, so the

technique was used mostly for resequencing applications – that is, comparing reads to a known

reference standard – as well as “counting” applications like expression analysis.

Today, Illumina’s top-of-the-line sequencer, the HiSeq 2500, can produce some 600 billion bases

of sequence per run – three orders of magnitude more than 454 – which can take anywhere

from 7 hours to 11 days, depending on flow cell architecture and the read length desired.

[18] Read lengths range from 36 to 300 bases (the latter in a 2×150 base-pair (bp) “paired-

end” mode), and researchers can now use those data for de novo assembly of even complex

genomes such as the human and mouse. [19]

Like 454, Illumina has also developed a smaller, less expensive version of its sequencer. The

Illumina MiSeq (~$125,000) can produce 2×300 bp reads in 65 hours, for up to 15 gigabases

(Gb) per run, [20] and 2×400-base paired-end reads are in development. [21]

Recently Illumina announced even longer read lengths, and the ability to obtain haplotype

“phasing” data, thanks to a technology called Moleculo. [22] Normally, sequencing libraries are

built such that it isn’t possible to unambiguously assign a particular read to one chromosomal

copy (for example, the paternal chromosome 2) as opposed to its counterpart. This can

complicate genetic analyses, however, if an individual has two mutations in a single gene:

Does a single copy have two mutations, or does each copy have one? Moleculo is a sequencing

library preparation strategy that isolates large pieces of DNA, barcodes them separately, and

sequences them. Using that strategy researchers can assemble contiguous phased fragments

~8–10 kb in size. [23] Moleculo technology is not available in kit form. However, in July the

company announced a Phasing Analysis Service based on Moleculo, with turnaround times of

12 weeks or less. [24]
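The phasing question posed above (does one chromosomal copy carry both mutations, or does each copy carry one?) reduces to asking whether the two variants are seen together on the same long fragment. The sketch below is a hypothetical illustration of that logic, not Illumina’s analysis pipeline; the data structure (one set of observed variant labels per barcoded fragment spanning both sites) and all names are assumptions.

```python
def phase_two_variants(fragments, variant_a, variant_b):
    """Classify two variants as 'cis', 'trans', or 'ambiguous'.

    `fragments` is a list of sets. Each set holds the variant labels seen on
    one barcoded long fragment that spans both variant positions.
    """
    # Fragments carrying both variants support the same-copy (cis) arrangement.
    both = sum(1 for f in fragments if variant_a in f and variant_b in f)
    # Fragments carrying exactly one variant support opposite copies (trans).
    only_one = sum(1 for f in fragments if (variant_a in f) != (variant_b in f))

    if both and not only_one:
        return "cis"
    if only_one and not both:
        return "trans"
    return "ambiguous"  # no spanning fragments, or conflicting evidence


# One copy carries both mutations; fragments from the other copy carry neither.
print(phase_two_variants([{"m1", "m2"}, set(), {"m1", "m2"}], "m1", "m2"))
# Each copy carries one mutation.
print(phase_two_variants([{"m1"}, {"m2"}], "m1", "m2"))
```

Short reads that cover only one of the two positions cannot answer this question, which is why fragment length matters for phasing.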

Life Technologies

Life Technologies (formerly Applied Biosystems) offers two different platforms for next-gen

sequencing. The first is SOLiD™, a ligase-based instrument capable of some 320 gigabases

per run in a 50×50 paired-end mode. [25] (Though both are based on sequencing-by-ligation,

SOLiD sequencing is different from Church’s method. SOLiD uses a more complicated

dye-encoding strategy to read two bases per ligation reaction, followed by signal decoding to

convert the fluorescent colors into a nucleotide sequence. [13])
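The encode-then-decode idea can be sketched as follows. The report does not give the color table; the mapping used here is the commonly described SOLiD two-base ("color space") scheme, in which each of four colors represents a set of adjacent base pairs, and it should be read as an assumption. The 2-bit base codes and function names are illustrative choices.

```python
# Assumed 2-bit codes; with these, color = code(first base) XOR code(second base)
# reproduces the commonly described four-color, two-base table
# (e.g. AA, CC, GG, TT all map to color 0).
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colors(sequence: str) -> list:
    """Convert a base sequence to colors, one per overlapping pair of bases."""
    return [BASE_TO_CODE[a] ^ BASE_TO_CODE[b]
            for a, b in zip(sequence, sequence[1:])]


def decode_colors(first_base: str, colors: list) -> str:
    """Convert colors back to bases, given a known starting base."""
    bases = [first_base]
    for color in colors:
        # Each color, combined with the previous base, fixes the next base.
        bases.append(CODE_TO_BASE[BASE_TO_CODE[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
print(colors)
print(decode_colors("A", colors))
```

Decoding needs one known starting base, and every later base depends on all the colors before it, so a single miscalled color alters the rest of the decoded read. That is one reason color-space data are often compared with a reference before conversion to bases.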

The second platform comes from Ion Torrent™, founded by Jonathan Rothberg (who also

founded 454 Life Sciences). Life Tech acquired Ion Torrent in 2010, the same year the company

launched its Personal Genome Machine (PGM™). With its exceptionally low price-tag (~$50,000

for the instrument, plus a few hundred dollars per run for consumables), the PGM promised to

democratize next-gen sequencing, to put the technology in the hands of individual research

labs, core facilities, and hospitals that might never be able to afford top-of-the-line sequencing

hardware. [26] One reason for the low cost of the PGM (and its new cousin, the Ion Proton™)

is that the system has no optics, and is based instead on semiconductor technology. The

company describes it as “essentially the world’s smallest solid-state pH meter.” [27] It records the

proton release that occurs as bases are incorporated into a growing DNA strand one by one.

Ion Torrent has released several versions of its consumable since the product’s launch. The

current Ion 314™ Chip produces up to 550,000 400-base reads in 3.7 hours, for an estimated

output of 100 Mb. [28] The latest generation Ion 318™ Chip can produce up to 2 Gb worth of

reads in 7.3 hours. [28] More recently, the company released the Ion Proton system, capable of

up to 10 Gb-worth of 200-base reads per run (each lasting 2–4 hours). [29] According to Life

Technologies’ documentation, a forthcoming, second-generation Ion PII™ Chip, “will enable

sample-to-variant analysis of a human genome in a single day, at up to 20× coverage.” [29]

Complete Genomics

Complete Genomics offers a whole-genome sequencing service, as opposed to an instrument.

The company has developed a ligation-based sequencing strategy akin to the Polonator and

SOLiD methods, but with some significant differences. In particular, Complete Genomics’

sequencing library preparation method involves “DNA Nanoball™ Arrays.” [30] A nanoball is

a long repeating piece of DNA containing hundreds of copies of a genomic DNA fragment

flanked by adaptors. Once created, that fragment collapses into a spherical structure that can

be tightly positioned on a planar array. In addition, the company’s “combinatorial probe anchor

ligation” (cPAL) sequencing strategy enables it to obtain up to 70 bases of sequence per read

(compared to the Church strategy’s 26 bases). [31]

In March 2013, BGI-Shenzhen announced it had acquired Complete Genomics for nearly $118

million. [32] As reported by Bloomberg Businessweek before the deal closed, “The possibility

of BGI and Complete uniting has not been lost on Illumina, the world’s biggest seller of

sequencing machines, which counts BGI as a top customer.” [33] After offering a competing

bid that Complete rejected, Illumina challenged the merger on national security grounds but

ultimately was denied. [34]

II. NEXT-NEXT-GEN SEQUENCING TECHNOLOGIES

Despite its successes, the sequencing industry isn’t resting on its laurels. New sequencing

strategies – so-called “third-generation” or “next-next-generation” technologies – are in

development, and one is already commercially available. What all have in common is the ability

to sequence individual molecules of DNA, as opposed to amplified collections of identical

molecules.

SMRT sequencing

Pacific Biosciences began offering its PacBio® RS Single Molecule, Real-Time (SMRT®) DNA

Sequencing System commercially in 2011, and released its updated RS II in April 2013. Key

to the company’s strategy is the SMRT Cell, a planar substrate containing some 150,000 “zero

mode waveguides” (ZMWs) – basically tiny wells illuminated from the bottom. At the bottom

of each well, a DNA polymerase molecule is immobilized and bound to a template-primer

complex. [35] To run the reaction, each well is filled with a collection of fluorescent dNTPs, each

labeled a different color. As the polymerase grabs and holds individual nucleotides for insertion

in the growing strand, the system detects a burst of fluorescence, which falls off once the base

is inserted and the now-released dye diffuses out of the ZMW illumination volume.

The company claims the longest read lengths of any current sequencing technology – 3,000–

5,000 bases on average, with some exceeding 20,000 bases. [36] Such read lengths simplify

eukaryotic de novo genome sequencing projects by enabling users to span the repetitive

elements that litter larger genomes and complicate their assembly. The platform also provides built-in

haplotype phasing information, as, by definition, each well sequences a single molecule at a

time. Unlike other sequencing technologies, the RS II also enables users to detect epigenetic

modifications, such as DNA methylation events, without in vitro chemical steps (such as

sodium bisulfite-based sequencing), using the kinetics of nucleotide addition of modified vs.

unmodified bases. [37]

In total, Pacific Biosciences lists some 70 papers on its web site related to its technology,

including a handful of epidemiological analyses. Notably, SMRT sequencing was used to

decode the bacteria underlying a 2010 Haitian cholera epidemic and a German E. coli outbreak

in 2011. [38, 39]

Helicos

Helicos BioSciences also had a commercialized third-gen sequencing technology until it filed

for Chapter 11 bankruptcy protection in late 2012. [40]

The company’s Heliscope technology relied on fluorescently labeled reversible terminators,

just like Illumina. But the Heliscope added the bases one at a time, imaging after each addition

reaction. In this way, the company generated huge numbers of short reads, each about 23–27

nucleotides on average in its first publication. [41]

In 2009, Helicos demonstrated that it had sequenced the genome of its founder, Stephen

Quake, to 28×, for about $48,000 – the first human genome to be sequenced at the single

molecule level. [42] The read length averaged 32 bases. However, as genomics expert Daniel

MacArthur reported on his blog, the sequence was “in many ways a disappointment,” with

very short reads, a relatively high error rate, and an untested ability to identify short “indels”

(insertions and deletions). [43]

Nanopore sequencing

Several companies are working to commercialize sequencing strategies based on so-called

“nanopores.” This strategy sequences DNA by recording the electrical disturbance caused

as either an intact DNA molecule or individual bases pass through pores in a membrane or

polymer, across which an electrical current has been established. Some strategies thread the

DNA molecule through the pore intact. Others use an enzyme to cleave the bases one by

one and drop them into the pore. In either case, the idea is that each base, having a slightly

different shape, produces a different and characteristic disturbance in the normal flow of ions

through the pore.

Oxford Nanopore Technologies made waves at the 2012 Advances in Genome Biology and

Technology meeting as it unveiled two platforms, GridION™ and MinION™, based on nanopore

sequencing. (Xconomy.com writer Luke Timmerman called the announcement a “wowser

moment.” [44]) Both systems rely on a “proprietary processive enzyme” to thread individual DNA

molecules through custom protein pores. [45] According to Bio-IT World, the membrane itself is

a “synthetic proprietary polymer.” [46] The enzyme that “ratchets” the DNA through the pore has

not been disclosed, but it can move DNA at “upwards of 1,000 bases/second,” with a raw error

rate of 4%, although the company will be slowing that to 20–400 bases/second. [46]

GridION is a scalable system in which individual nodes, loaded with consumable cartridges,

perform up to 2,000 single-molecule reactions in parallel. (An 8,000-plus-pore consumable was

anticipated “in early 2013.” [45]) Multiple nodes can be operated in a cluster, which, according

to a company press release, can greatly accelerate the process. “A 20-node installation using an

8,000-nanopore configuration would be expected to deliver a complete human genome in 15

minutes.” [45] Furthermore, the company is developing what it calls a “disruptive” sequencing

strategy called “Run Until…” that will allow users to determine how much sequence they need.

The machines will collect and analyze data in real time until that goal is met. [45] MinION is a

USB thumb-drive-sized version of the system that is plugged into a computer via a USB port.

The company has announced pricing at “less than $900” per unit. [45]
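
The quoted 15-minute figure can be sanity-checked with simple arithmetic. The Python sketch below assumes a human genome of about 3.2 billion bases, every pore active for the entire run, and no overhead; none of these assumptions comes from the press release, which does not state the coverage behind its estimate.

```python
# Back-of-the-envelope check of the quoted "human genome in 15 minutes" figure.
# Assumptions (not from the press release): a 3.2-billion-base genome,
# all pores reading continuously, and no sample or analysis overhead.
ASSUMED_GENOME_BASES = 3.2e9
NODES = 20               # from the quoted configuration
PORES_PER_NODE = 8_000   # from the quoted configuration
RUN_SECONDS = 15 * 60    # 15 minutes


def required_rate_per_pore(coverage):
    """Bases per second each pore must read to reach the given coverage."""
    total_bases = ASSUMED_GENOME_BASES * coverage
    total_pores = NODES * PORES_PER_NODE
    return total_bases / (total_pores * RUN_SECONDS)


if __name__ == "__main__":
    for coverage in (1, 10, 30):
        rate = required_rate_per_pore(coverage)
        print(f"{coverage:>2}x coverage: {rate:,.0f} bases/second per pore")
```

Under these assumptions, a single pass over the genome needs roughly 22 bases per second per pore, which sits inside the 20–400 bases/second range the company cited, whereas 30× coverage would need about 670 bases per second per pore.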

According to GenomeWeb, both systems were “expected to deliver read lengths of up to 100 kb

with raw read error rates of about 1 percent,” and to launch commercially “during the second

half” of 2012 – news that pushed shares of other sequencing firms lower. [47]

As of fall 2013, Oxford has yet to deliver those products. However, in late October, the

company announced the November launch of an early-access “MinION Access Program,” a

“substantial but initially controlled programme designed to give life science researchers access

to nanopore sequencing technology at no risk and minimal cost.” [110] The program requires a

$1,000 refundable deposit for the MinION device, flow cells, and software, with extra flow cells

available at $999 apiece.

Genia Corp. is also pursuing nanopores using a strategy called NanoTag sequencing. Genia’s

biological nanopores are tethered to DNA polymerases. As DNA is synthesized outside the

pore, base-specific “tags” are released down into the pore, producing a discrete electrical signal.

The strategy was demonstrated in a September 2012 study led by researchers at Columbia

University and the National Institute of Standards and Technology, in which each dNTP was

tagged with a different-sized form of polyethylene glycol. [48] Genia uses biological pores in

membrane arrays built atop CMOS microchips, with a single pore per sensor. The company’s in-

development consumable currently includes 264 sensors, but that could increase to 1 million

in the final product, with an anticipated speed of about 10 bases per second per sensor. [49]

The company has announced plans to ship initial beta versions by the end of 2013 and the

commercial launch is planned for some time in 2014. [50]

Other companies are also pursuing nanopores, including Nabsys, which detects not single

bases but rather hybridized segments of DNA as they pass through a pore. In July, Hitachi High-

Technologies and Base4 Innovation announced plans to collaborate on nanopore sequencing

as well. According to GenomeWeb Daily News, Base4 “has developed a method of increasing the

signal from the single molecule passing through the pore by using laser light enhanced by gold

structures. The technology … allows the signal to be read from unlabeled DNA, minimizing

sample preparation.” [51]

In a sign of the growing potential for this technology, on September 9, 2013, the NHGRI

announced $17 million in grant funding from its Advanced DNA Sequencing Technology

program to eight research teams pursuing nanopore sequencing. [52]

Electron microscopy

ZS Genetics is developing a single-molecule sequencing strategy based on electron

microscopy. The strategy amplifies DNA to be sequenced using different heavy metal-labeled

nucleotides, which can then theoretically be distinguished in an electron microscope. ZS

Genetics published a proof-of-principle paper in October 2012 showing it could identify

thymine residues in a synthetic 3,272-base-pair piece of test DNA as well as the 7,248-base-

pair M13 viral genome using a mercury label, 5-MeHgS-dUTP. [53] According to In Sequence,

the company has since shown it can label “more than one base,” though not in the published

literature. [54] As reported by In Sequence, the company is now developing a prototype single-

molecule instrument with anticipated read lengths of 40 to 50 kb, and a commercial instrument

based on the technology could have a throughput of 500 Mb per hour. [54]

III. CLINICAL APPLICATIONS OF WHOLE-GENOME AND EXOME SEQUENCING

With the advent of next-generation sequencing technologies, researchers and clinicians are

finally beginning to capitalize on the dream of personalized medicine that was a partial driver

of the Human Genome Project.

As described in a recent Chemical & Engineering News cover story, clinical applications of next-

gen sequencing range from targeted gene panels to exome sequencing to whole-genome

analysis. [1]

Ambry Genetics, for instance, offers a menu of panels such as BRCAplus, which covers six breast

cancer susceptibility genes (including BRCA1 and BRCA2, which became available following the

landmark 2013 U.S. Supreme Court gene patent decision, Association for Molecular Pathology

v. Myriad Genetics [55]), and the XLID Next-Gen Panel™, which tests for mutations in 81 genes

associated with X-linked intellectual disability.

Foundation Medicine’s FoundationOne™ assay applies next-gen sequencing to cancer samples.

The company reports that “the test simultaneously sequences the entire coding sequence

of 236 cancer-related genes (3,769 exons) plus 47 introns from 19 genes often rearranged or

altered in cancer to an average depth of coverage of >250X.” [56] According to the New York

Times, FoundationOne costs about $5,800 and “is now used by more than 1,000 doctors … as

well as by 15 companies developing cancer drugs.” [57]

Ambry also offers two forms of exome sequencing, a 20,000-gene Clinical Diagnostic Exome,

which includes all human protein-coding genes, and a 4,000-gene subset that includes only

those genes whose relation to disease is understood.

Exome sequencing offers several advantages over both gene panels and whole-genome

sequencing. For one thing, by sequencing every protein-coding base in the genome, the

technology can be used to identify mutations in genes not yet implicated in a given disease –

a considerable advantage over pre-designed gene panels. At the same time, by confining its

analysis to only protein-coding genes, exome sequencing eliminates noncoding sequences

whose functional importance is unclear.

Several clinical studies have emphasized the power of exome sequencing to unravel medical

mysteries, and some have used the technology to guide medical care. That was the case with

Nicholas Volker, for instance.

As C&EN reported, Volker was a child who suffered from a “gastrointestinal disorder for which

physicians had not been able to pinpoint a cause. After multiple surgeries, they finally turned

to whole-exome sequencing. The analysis revealed a variant in the XIAP gene that turned out

to cause an immune condition. The child was then cured with a bone marrow transplant.”

[1, 58] More recently, Stephen Kingsmore and colleagues at Children’s Mercy Hospital in

Kansas City, MO, used exome sequencing of two children to implicate yet another gene in

inflammatory bowel disease, IL10RA. Treatment of one of the affected children in the study with

hematopoietic stem cell transplantation produced “marked clinical improvement of all IBD-

associated clinical symptoms.” [59]

Another application of exome sequencing is the identification of genetic lesions underlying

rare Mendelian diseases. The earliest example of this approach occurred in 2009, when

researchers at the University of Washington pinpointed the genetic fault in Miller syndrome.

[60] More recently, researchers have used this approach to implicate gene errors in brain

malformations, intellectual disability, autism-spectrum disorders, and more. Jay Shendure,

who led the University of Washington team in 2009 and still uses exome sequencing today,

estimates the technique has helped identify “at least 100 new disease genes.” [61]

As useful as exome sequencing is, its strength is also a weakness. Because exome sequencing

eliminates noncoding genomic regions, it is incapable of identifying potentially interesting

lesions in those regions. Only about a quarter of patient exomes actually reveal a likely

causative mutation, [1] either because it isn’t clear how an identified variant could be

associated with a given phenotype, or because the variant lies outside known coding regions.

And of course, a considerable fraction of the genome is transcribed but not translated, yielding

either short or long noncoding RNAs, for instance.

As University of Washington researcher John Stamatoyannopoulos and colleagues reported

in 2012, as part of the ENCODE project data release, “the majority (~93%) of disease- and

trait-associated variants emerging from these [genome-wide association] studies lie within

noncoding sequence, complicating their functional evaluation.” [62] The clinical significance of

those noncoding variants may not be obvious at present, but they could be eventually. Therein

lies one of the primary advantages of whole-genome sequencing: All the data is there. It can

always be filtered based on current knowledge and then reevaluated later in light of new

data – something that isn’t possible with gene panels or even exomes.

According to C&EN, the Medical College of Wisconsin – where Nicholas Volker’s exome

was sequenced – became in 2013 “the first lab to offer full-service clinical whole-genome

sequencing to patients worldwide.” [1] The service, with a 90-day turnaround time, costs

$17,000.

Illumina’s Clinical Services Laboratory offers its whole-genome sequencing service, called

TruSight, for around $10,000 per sample with a 60–90 day turnaround time and 30× coverage.

A two-week turnaround option is also available, for about $12,000 to $13,000. [63] However,

the current speed leader in genome sequencing must be Kansas City’s Children’s Mercy

Hospital, which offers an ultrafast service called STAT-Seq for about $13,500. [64] Developed

by the hospital’s Stephen Kingsmore, STAT-Seq takes advantage of Illumina’s HiSeq 2500,

which in “rapid run mode” can produce 120 Gb of 2×100 reads in just 27 hours. That speed,

combined with bioinformatic filtering algorithms that use the patient’s symptoms to narrow

down likely disease genes, means the process can identify potential causative alleles in just 50

hours – a window that makes the technology potentially useful for acute neonatal healthcare.

[64] The lab has processed at least seven STAT-Seq families to date [65], including that of

Millie McWilliams, who “is one of fewer than 10 people in the world to be diagnosed with the

mutation in the gene ASXL3.” [66]
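
To put the rapid-run yield quoted above in context, the following Python sketch converts it to raw genome coverage and an hourly rate. It assumes a human genome of about 3.2 billion bases, a figure that does not appear in the cited sources, and it ignores duplicate, low-quality, and unmappable reads.

```python
# Assumption (not from the report): a human genome of about 3.2 billion bases.
# Duplicates, low-quality bases, and unmappable reads are ignored, so usable
# coverage in practice would be lower than this raw estimate.
ASSUMED_GENOME_BASES = 3.2e9


def mean_coverage(yield_bases, genome_bases=ASSUMED_GENOME_BASES):
    """Average number of times each genome position is sequenced."""
    return yield_bases / genome_bases


def gigabases_per_hour(yield_bases, hours):
    """Raw sequencing rate in gigabases per hour."""
    return yield_bases / 1e9 / hours


if __name__ == "__main__":
    rapid_run_bases = 120e9  # 120 Gb, as quoted for rapid run mode
    rapid_run_hours = 27     # as quoted
    print(f"raw coverage: {mean_coverage(rapid_run_bases):.1f}x")
    rate = gigabases_per_hour(rapid_run_bases, rapid_run_hours)
    print(f"raw rate: {rate:.1f} Gb/hour")
```

On these assumptions a single rapid run yields about 37× raw coverage of one genome, in the same range as the 30× coverage quoted for Illumina's clinical service.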

In September 2013, the NIH announced that Kingsmore’s group was one of four being funded

under the new Genomic Sequencing and Newborn Screening Disorders program, which

will “study technical, clinical, and ethical aspects of genomics research in newborns, and its

potential to improve newborn healthcare.” [67] The program is slated to allocate up to $25

million over five years, including $5 million in 2013 to Kingsmore at Children’s Mercy Hospital,

plus groups at the University of California, San Francisco; Brigham and Women’s Hospital and

Boston Children’s Hospital; and the University of North Carolina. “The four pilot projects will

examine whether sequencing newborns’ genomes or exomes can provide useful medical

information beyond what is already delivered by current newborn screening,” GenomeWeb

reported. [67]

IV. RESEARCH APPLICATIONS OF NEXT-GEN SEQUENCING

Next-gen sequencing is also a valuable tool in the research laboratory, where the process is

used for genome sequencing, gene expression analysis, epigenetic studies, probing microbial

communities, and more.

Genome sequencing

One obvious application of next-gen sequencing is de novo sequencing – that is, sequencing

and assembly of a genome for which no reference genome already exists. The Genome

10K Project, for instance, “aims to assemble a genomic zoo – a collection of DNA sequences

representing the genomes of 10,000 vertebrate species, approximately one for every vertebrate

genus.” [68]

Just as importantly, though, researchers use next-gen strategies to assess genetic variation

within a population. When sequencing the exome of a diseased individual, for instance,

geneticists need to be able to determine whether a given difference between the patient’s DNA

and a reference is significant. To do that, researchers must map the extent of normal genetic

variation.

The 1000 Genomes Project presents a first stab at that problem. The project’s goal, according to

the NHGRI, which funded the venture, “is to provide a resource of almost all variants, including

SNPs and structural variants, and their haplotype contexts. This resource will allow genome-

wide association studies to focus on almost all variants that exist in regions found to be

associated with disease.” [69] In 2012 the project released genome datasets from 1,092 human

individuals representing 14 populations and compiled using both whole genome and exome

sequencing. [70]

Transcriptomics

Another popular research application is transcriptomics, an approach commonly called

“RNA-Seq.” Researchers convert their RNA sample – for instance, total cellular RNA, isolated mRNA,

or enriched short RNAs (e.g., microRNAs) – to cDNA, then use that to construct a sequencing

library. The resulting data reveal not just the identity of expressed transcripts but their

abundance, as well, making RNA-Seq an alternative to DNA microarrays for gene expression

analysis.
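
One common way to turn read counts into abundance estimates is transcripts per million (TPM), which corrects for transcript length and sequencing depth. The report does not name a normalization method, so the choice of TPM in this Python sketch is an assumption, and the transcript names, lengths, and counts are made up for illustration.

```python
def transcripts_per_million(counts, lengths_kb):
    """Convert raw read counts to TPM.

    counts:     {transcript: number of reads assigned to it}
    lengths_kb: {transcript: transcript length in kilobases}
    """
    # Reads per kilobase: at equal abundance, longer transcripts yield more reads.
    rpk = {name: counts[name] / lengths_kb[name] for name in counts}
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        raise ValueError("no reads were assigned to any transcript")
    # Rescale so that the values across all transcripts sum to one million.
    return {name: value / scale for name, value in rpk.items()}


if __name__ == "__main__":
    # Hypothetical transcripts and counts, for illustration only.
    counts = {"geneA": 100, "geneB": 200, "geneC": 100}
    lengths_kb = {"geneA": 1.0, "geneB": 4.0, "geneC": 2.0}
    for name, tpm in transcripts_per_million(counts, lengths_kb).items():
        print(f"{name}: {tpm:,.0f} TPM")
```

In this made-up example geneB has the most reads but is also the longest transcript, so after length correction geneA comes out as the most abundant.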

RNA-Seq offers some substantial advantages over microarrays. [71] For instance, it can quantify

any transcript, not just those identified in advance and placed on an array. (In that sense,

microarrays are said to be “biased.”) RNA-Seq can also identify novel transcript isoforms and

can be applied to any organism, regardless of whether or not a company has developed a

commercial microarray for it. One 2013 study used RNA-Seq data to explore the phenotypic

differences between social “castes” of paper wasps, including workers and queens. [72]

Epigenomics

Researchers have also devised strategies for probing the epigenetic landscape with next-gen

sequencing. ChIP-Seq uses the sequencer to identify genomic regions associated with specific

histone modifications or DNA-binding proteins. “ChIP” is chromatin immunoprecipitation, a

technique that uses antibodies to pull down DNA sequences associated with specific histone

modifications or transcription factors. Traditionally, ChIP samples are probed either using PCR

or microarrays, but ChIP-Seq allows a comprehensive, unbiased look across the genome, and

the technique was used extensively in both the Encyclopedia of DNA Elements (ENCODE) [73]

and NIH Roadmap Epigenomics [74] projects.

Researchers can also use next-gen sequencers, combined with sodium bisulfite treatment, to

probe DNA methylation, an application called MethylC-Seq. [75] (Sodium bisulfite converts

unmodified cytosine nucleotides to uracil while leaving methylated positions unchanged.)

More recently, researchers have developed methods to differentiate the traditional

eukaryotic form of DNA methylation, 5-methylcytosine, from its newly discovered cousin,

5-hydroxymethylcytosine (5-hmC). Bisulfite sequencing cannot distinguish the two forms, but

researchers at the Ludwig Institute for Cancer Research in San Diego, the University of Chicago,

and Emory University School of Medicine, developed one method that can. Called TAB-Seq

(TET-assisted bisulfite sequencing), the method first protects 5-hmC residues by treatment

with beta-glucosyltransferase, which adds a glucose residue. The remaining modified bases are

then converted by a TET enzyme to 5-carboxycytosine, which sodium bisulfite then converts

to uracil. [76] By sequencing the resulting DNA and comparing with DNA treated only with

sodium bisulfite, researchers can distinguish the two methylation events from one another.
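
The comparison logic described above can be written as a small lookup. This Python sketch is an idealized model that assumes every protection and conversion step is 100% efficient, which real experiments do not achieve. It also relies on the fact that uracil is read as thymine after amplification and sequencing, so a converted cytosine appears as "T" in the reads.

```python
# Idealized readout of a cytosine position under each protocol.
# Conventional bisulfite: only unmodified C is converted (read as T).
BISULFITE_READOUT = {"C": "T", "5mC": "C", "5hmC": "C"}
# TAB-Seq: 5hmC is protected by glucosylation; 5mC is oxidized by TET
# and then converted by bisulfite, so only 5hmC still reads as C.
TAB_SEQ_READOUT = {"C": "T", "5mC": "T", "5hmC": "C"}


def infer_state(bisulfite_base, tab_seq_base):
    """Infer the cytosine state from the base seen in each experiment."""
    calls = {
        ("T", "T"): "C",     # converted in both: unmodified cytosine
        ("C", "T"): "5mC",   # protected from bisulfite alone, lost in TAB-Seq
        ("C", "C"): "5hmC",  # protected in both experiments
    }
    return calls.get((bisulfite_base, tab_seq_base), "inconsistent")


if __name__ == "__main__":
    for state in ("C", "5mC", "5hmC"):
        pair = (BISULFITE_READOUT[state], TAB_SEQ_READOUT[state])
        print(f"true {state:>4}: reads {pair} -> inferred {infer_state(*pair)}")
```

The one combination the model cannot produce, "T" under bisulfite alone but "C" under TAB-Seq, is reported as inconsistent; in real data it would most likely point to incomplete conversion or a sequencing error.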

Still other researchers use next-gen technologies to probe chromatin structure. One technique,

called ChIA-PET (chromatin interaction analysis with paired-end-tag sequencing), locks

long-range interactions between transcription factors and gene control sequences with

formaldehyde, and then captures those structures by ChIP. Sequencing the resulting DNA

identifies distal sequences that are transiently joined by protein-protein interactions through

the estrogen receptor, for example. [77]

Paleogenomics

Paleogenomics refers to the genetic analysis of extinct species and ancient samples, which can

help fuel evolutionary studies and trace genetic relationships between extant species and their

ancient predecessors.

Among other applications, researchers have used next-gen sequencing to produce genomic

sequences for the Neandertal [78] and woolly mammoth [79], as well as the mitochondrial

genome of a 5,300-year-old Neolithic mummy. [80] In 2009, researchers at Penn State University

used the mammoth genome to map its relationship with modern mammals using the extinct

animal’s transposable elements. [81]

Metagenomics

Metagenomics (sometimes called “environmental genomics”) enables researchers to probe

microbial populations that cannot be grown and isolated in the lab. The technique sequences

the bulk DNA of an entire microbial population to understand its metabolic capacity – its

ability to utilize particular nutrients, for example. Alternatively, researchers can fingerprint the

bacterial genomes to identify and catalog the microbial populations that might be present in a

given sample, as well as their dynamics.

J. Craig Venter famously cataloged the metagenome of ocean bacteria as he sailed around

the world on the Sorcerer II [82], but researchers also use these techniques to probe normal

and diseased microbial populations of humans and other model organisms (the so-called

microbiome), and their interactions with their hosts. The NIH Common Fund’s five-year, $153

million Human Microbiome Project described in 2012 the normal microbiome from 242 healthy

volunteers from the mouth, nose, skin, GI tract, and vagina. [83] On September 9, 2013, the NIH

announced another $22 million in Phase 2 funding for the project. [84]

Single-cell genomics

Most next-gen sequencing strategies work on DNA pools from hundreds of thousands or even

millions of cells. Therefore they represent snapshots of what the bulk population is doing.

Sometimes, though, researchers want to look at individual cells to study cell-to-cell differences

in gene expression or genetic makeup.

In a testament to the potential power of this approach, the NIH in 2012 announced it was

earmarking $90 million in funding over five years “to accelerate the development and

application of single cell analysis across a variety of fields.” [85] The goal, the NIH said, “is to

understand what makes individual cells unique and to pave the way for medical treatments

that are based on disease mechanisms at the cellular level.” Single-cell genomics methods use

microfluidics, flow cytometry, or micromanipulation to isolate single cells, which are then lysed,

amplified, and sequenced. By enabling researchers to visualize cellular diversity that might be

lost in the noise of population studies, these techniques can shed light on such processes as

stem cell differentiation, development, and disease.

But, as reported in Nature in 2012, “challenges abound.” [86] “Amplifying the tiny amount of

DNA in a single cell until there is enough to sequence without introducing too many errors is

still difficult…. The bioinformatics required to stitch the data together and deal with artefacts

can be fiendishly complicated. And … even isolating a cell can be tough.” [86] Add to that

the fact that, when sequencing a single cell, there effectively is no control – that is, each

cell is an experiment of n = 1, with no opportunity for repetition – and it’s clear that single-

cell techniques are not for the faint of heart. Still, substantial progress has been made. In

December 2012 Harvard University researcher Sunney Xie and colleagues demonstrated a

new linear genome amplification method called MALBAC (multiple annealing and looping-

based amplification cycles) [87], which the authors claim produces more uniform amplification

than PCR-based methods. When applied to individual SW480 cancer cells, the sequence was

sufficiently complete that the team was able to call both single-nucleotide polymorphisms

and copy-number variations in the genome. The team applied the same approach in a

companion study to individual sperm cells. [88] Now others are applying similar approaches

to the transcriptome, too. One August 2013 study used single-cell transcriptome analyses to

probe gene-expression patterns during the early developmental stages of human and mouse

embryos. [89]

V. COMPUTATIONAL CHALLENGES

For all its power, next-generation DNA sequencing presents a substantial computational

burden. As one 2012 review put it, “Bioinformatics is the single largest bottleneck in the

routine implementation of next generation sequencing in clinical practice at the current

time. A general guideline is that every dollar spent on sequencing hardware must be

matched by a comparable investment in informatics.” [90] For one thing, there’s a basic

infrastructure issue: Few labs outside large genome centers have the computational muscle

necessary to store, transfer, and manipulate datasets many hundreds of gigabytes or even

terabytes in size. An Illumina HiSeq 2500 with two flow cells can produce up to six billion

paired-end reads, 600 Gb total, in just a single run, and the computational power needed

to store, copy, and work with them is considerable. (Some researchers opt for cloud-based

storage, such as on Amazon’s cloud, to circumvent at least part of this problem. [91])

Another issue is basic computational know-how. Sequencing instrument vendors often

provide their own software tools, and dedicated hardware/software solutions like Knome’s

$125,000 knoSYS™ 100 are available. So too are cloud-based alternatives, such as DNAnexus

and Illumina’s BaseSpace. But many researchers prefer free, open-source Unix (or Linux)

applications that eschew pretty graphical user interfaces for command-line flexibility. These

programs essentially parse, filter, and manipulate sequence files in one format and output

them in another, and some computational skill is required to optimize them and string them

together in what is commonly called a “pipeline.” [90] An entire universe of bioinformatics

tools exists (see, for example, http://seqanswers.com/wiki/Software/list) to handle different

aspects of these pipelines, such as read mapping (mapping individual reads to a reference

genome), alignment, assembly, and variant calling, and users may need to test different

combinations to find those that work best for them. But users may also need to code their

own software, especially when designing novel algorithms or sequencing strategies. [90, 91]
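
As a flavor of what one such pipeline step looks like, here is a minimal Python sketch that reads sequence records in one format (FASTQ), filters them by average base quality, and writes the survivors in another (FASTA). It assumes four-line FASTQ records with Phred+33 quality encoding; the function names and the quality cutoff of 20 are illustrative choices, not taken from any particular tool.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score, assuming the common Phred+33 ASCII encoding."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def fastq_to_fasta(fastq_handle, fasta_handle, min_mean_quality=20):
    """Write reads passing the quality cutoff as FASTA; return how many passed."""
    kept = 0
    for name, sequence, quality in read_fastq(fastq_handle):
        if quality and mean_phred(quality) >= min_mean_quality:
            fasta_handle.write(f">{name}\n{sequence}\n")
            kept += 1
    return kept


if __name__ == "__main__":
    example = io.StringIO(
        "@read1\nACGTACGT\n+\nIIIIIIII\n"
        "@read2\nTTGACCAA\n+\n########\n"
    )
    output = io.StringIO()
    print(fastq_to_fasta(example, output), "read(s) kept")
    print(output.getvalue(), end="")
```

Because the functions take generic file handles, the same step could read from one program's output and feed the next, which is the sense in which such tools are strung together into a pipeline.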

Beyond basic data manipulation issues, there’s also the problem of interpretation. Any given

individual’s genome sequence will contain millions of differences from a reference. When

trying to identify clinically or phenotypically significant variants, many can be eliminated

computationally, but somebody still often needs to sit down with the list that remains and

determine what is and is not important. That can be a time-consuming and expensive task

that, at a minimum, requires extensive biological and computational expertise and may or

not require additional bench work – such as Sanger sequencing validation and functional

studies in cell culture – to sort out. [65]
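
The computational elimination step can be sketched as a simple filter. In the Python example below, the Variant fields, the set of effect labels, the 1% frequency cutoff, and the gene names are all hypothetical choices made for illustration; a real pipeline would draw annotations from dedicated tools and population databases.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    effect: str                  # e.g. "missense", "synonymous", "intronic"
    population_frequency: float  # fraction of reference genomes carrying it


# Hypothetical set of effect classes worth a human look; a real pipeline
# would take these labels from an annotation tool, not a hard-coded set.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


def shortlist(variants, max_frequency=0.01):
    """Keep rare, protein-altering variants for manual review."""
    return [
        v for v in variants
        if v.effect in PROTEIN_ALTERING
        and v.population_frequency <= max_frequency
    ]


if __name__ == "__main__":
    candidates = [
        Variant("GENE_A", "missense", 0.0002),   # rare and protein-altering
        Variant("GENE_B", "synonymous", 0.0001), # rare but silent
        Variant("GENE_C", "frameshift", 0.12),   # damaging but common
    ]
    for variant in shortlist(candidates):
        print(variant.gene, variant.effect, variant.population_frequency)
```

A filter like this only narrows the list. Deciding which of the remaining variants actually explains a patient's condition is the part that still calls for expert review.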

Elaine Mardis, codirector of the Genome Institute at Washington University, summed up the

problem in a 2010 essay entitled, “The $1,000 genome, the $100,000 analysis?” Noting that

many of the then-known examples of clinical whole-genome sequencing involved large

teams of specialists to pore over the resulting data, Mardis wrote:

At the end of the day, although the idea of clinical whole-genome sequencing for diagnosis

is exciting and potentially life-changing for these patients, one does wonder how, in the

clinical translation required for this practice to become commonplace, such a ‘dream team’

of specialists would be assembled for each case. In other words, even if the cost and speed of

generating sequencing data continue their precipitous decreases, the cost of ‘team’ analysis

seems unlikely to immediately follow suit. [92]

But it’s also true that each new case adds to the database of information, meaning per-case

analysis costs should decrease. [65]

Still, given these and other variables, it’s not surprising that some researchers choose to

outsource their next-gen sequencing and informatics needs altogether to companies

such as Knome, whose knomeDISCOVERY™ service provides “end-to-end” sequencing and

interpretation. [93]

VI. SYNTHETIC BIOLOGY & GENOME EDITING

The ongoing ’omics revolution isn’t only about being able to read DNA sequences. Increasingly,

researchers are also developing and using methods that allow them to rewrite those sequences.

This process can take either of two fundamental forms. The first is synthetic biology, a discipline

that aims to apply engineering principles to biological problems, thereby enabling researchers to

design and implement biological circuits from prefabricated genetic components with the ease,

predictability, and reproducibility of their electrical counterparts. [94] Synthetic biologist James

Collins of Boston University likens the process to building with Lego blocks [95], and companies

like LS9 and Synthetic Genomics are pursuing commercial applications of this approach for

biofuel development, among others.

Synthetic biologists envision developing strains that can, for example, produce biomedically

or economically useful metabolites or fluoresce when particular toxins are present in the

environment, and indeed, it’s relatively easy to design such circuits – on paper. Getting the circuits

to work as intended, however, is another matter. Genetic elements, such as promoters, that work

well in one context will not necessarily function in another, and the mere act of linking two pieces

of DNA can introduce new and unanticipated regulatory elements and secondary structural

features that can confound results.

Researchers like Drew Endy at Stanford University and Christopher Voigt at the Massachusetts

Institute of Technology are assembling and characterizing genetic parts for just this purpose (e.g.,

biobricks.org and partsregistry.org). Voigt’s team recently characterized some 582 transcriptional

terminator sequences, identifying structural elements that control each element’s strength. [96]

Two other teams, one co-led by Endy, tackled transcriptional start elements. [97, 98]

Other synthetic biologists are developing reagent and software design tools to facilitate circuit

design, “print” the output as DNA, and more. [94] Still others, like J. Craig Venter, at the J. Craig

Venter Institute, and Jef Boeke of Johns Hopkins University, have demonstrated methods for

building genomes (or parts of genomes) from scratch in bacteria [99] and yeast [100], respectively

– approaches that could ultimately incorporate new genetic circuitry as part of their control

architectures.

A second approach is genome editing, in which researchers use genetic tools to

make smaller scale but defined modifications in eukaryotic genomes. For example, geneticists

studying an inherited condition might want to repair that genetic lesion in the patient’s own cells

for therapeutic purposes.

Initially, zinc finger nucleases (ZFNs), a technology owned by Sangamo Biosciences and licensed

for research applications to Sigma-Aldrich (and, as of early September, Horizon Discovery), were

the go-to option for this kind of work. ZFNs are engineered proteins that fuse zinc finger DNA-binding

domains to an endonuclease domain, inducing a double-stranded break in genomic DNA at a user-specified

locus – essentially a large restriction enzyme. That break is then repaired using either non-

homologous end-joining, which often induces frame-shift or missense mutations, or homology-

directed repair, a process that enables users to insert their own sequences. [101] ZFNs are

programmable – each zinc finger domain recognizes a specific three-nucleotide sequence. By

selecting the appropriate set of fingers, researchers can target nearly any sequence they desire. In

the lab, this approach allows researchers to make rapid gene knockout and knock-in mutations,

even in model organisms lacking the more robust genetic toolset of, say, mice. But there also are

obvious clinical applications, and Sangamo has several ZFN-based therapeutics in development,

the most advanced of which is SB-728, a ZFN that targets the CCR5 receptor for HIV, currently in

Phase 2 clinical trials. [102]

More recently, TALENs have emerged as a similar but more easily designed and lower-cost

alternative to ZFNs. [103] TALENs (transcription activator-like effector nucleases) are built from

arrays of protein modules, each of which targets a single nucleotide rather than three.
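
The difference in module-to-nucleotide ratio described above can be illustrated with a minimal Python sketch. It assumes only what the text states (one zinc finger per three-nucleotide subsite, one TALEN module per nucleotide) and ignores every other consideration that goes into real nuclease design. The function names and the 12-nucleotide target are hypothetical, chosen for illustration.

```python
def zfn_subsites(target: str) -> list[str]:
    """Split a target site into 3-nucleotide subsites, one per zinc finger.

    Assumption for this sketch: the target length is a multiple of three.
    """
    if len(target) % 3 != 0:
        raise ValueError("target length must be a multiple of 3 in this sketch")
    return [target[i:i + 3] for i in range(0, len(target), 3)]


def talen_subsites(target: str) -> list[str]:
    """Split a target site into single nucleotides, one per TALEN module."""
    return list(target)


if __name__ == "__main__":
    target = "GATGAGGATGAC"  # hypothetical 12-nucleotide site
    print(len(zfn_subsites(target)), "zinc fingers:", zfn_subsites(target))
    print(len(talen_subsites(target)), "TALEN modules")
```

For the same site, the TALEN array needs three times as many modules as the zinc finger array, but each module addresses exactly one base.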

In June 2012, researchers at the University of California, Berkeley, and Umeå University in Sweden,

described a new system called CRISPR/Cas. [104] Based on a bacterial defense mechanism,

CRISPR/Cas uses a generic nuclease (Cas) and short guide RNA to induce a double-stranded break

in genomic DNA at a sequence complementary to the guide. The process is simpler to design

than ZFNs or TALENs; all users need do is prepare a guide RNA to match their intended target. It

is also multiplexable. In May 2013, researchers made five simultaneous genetic modifications in

mouse embryonic stem cells by adding five different guide sequences. [105] Still unresolved is the

system’s specificity – that is, how many off-target changes does CRISPR/Cas induce – but already

workarounds are being developed. One approach, published by Church, uses a single-strand-

cutting “nickase” variant of the Cas nuclease and two guide molecules to boost specificity. [106]
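
The targeting logic in this paragraph (one guide per target, several guides for multiplexing, and near-matches as a source of off-target concern) can be sketched in Python. This is a simplified illustration, not a guide-design tool: it treats the guide as a DNA-letter string equal to the site it targets, compares sequences only, and ignores any other requirement of the real system. The function names, the toy genome, and the guide sequences are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(genome: str, guide: str, max_mismatches: int = 0):
    """List (position, strand, mismatches) for windows that match the guide.

    Both strands are scanned; a window counts as a hit when it differs
    from the guide at no more than max_mismatches positions.
    """
    hits = []
    n = len(guide)
    for strand, probe in (("+", guide), ("-", reverse_complement(guide))):
        for i in range(len(genome) - n + 1):
            window = genome[i:i + n]
            mismatches = sum(a != b for a, b in zip(window, probe))
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return hits


def multiplex(genome: str, guides: list[str], max_mismatches: int = 0):
    """Scan the genome once per guide, as when several guides are supplied together."""
    return {g: find_sites(genome, g, max_mismatches) for g in guides}


if __name__ == "__main__":
    genome = "CCGATTACACCCCTGTAGTCCC"  # hypothetical toy sequence
    print(find_sites(genome, "GATTACA"))                    # exact matches only
    print(find_sites(genome, "GATTACA", max_mismatches=1))  # includes near-matches
    print(multiplex(genome, ["GATTACA", "CCCCTG"]))
```

Allowing one mismatch turns up a second, imperfect site on the opposite strand, which is the kind of candidate off-target location the specificity question is about.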

Despite the uncertainty, researchers are finding ways to tweak the CRISPR/Cas system for more

than genome editing. For instance, some have modified the Cas protein to eliminate its nuclease

activity and then coupled it to transcriptional activator and repressor domains to tweak gene

expression in a sequence-specific manner. [104] (Similar approaches have been used with TALENs,

coupling regulatory protein modules to the TALEN’s DNA-binding domain. [103]) Others are


exploring the technology’s therapeutic potential. [104] Indeed, as Rodolphe Barrangou, of North

Carolina State University, said in a recent essay by science writer Elizabeth Pennisi, “The only

limitation today is people’s ability to think of creative ways to harness [CRISPR].” [104]

VII. CONCLUSIONS

In the decade since the official conclusion of the Human Genome Project, the genomics

landscape has evolved to be almost unrecognizable. Individual genomes used to be celebrated

achievements. Today, new genomes are sequenced so frequently as to be almost unremarkable.

Indeed, Illumina announced that its FastTrack Services business “achieved record shipments in

the first quarter” of 2013, “ship[ping] more than 4,000 [whole human] genomes,” including its

10,000th genome overall. [107]

A PubMed search for “next-generation sequencing” retrieves 4,900-plus papers, all but about 100

since 2009. The technology has matured to the point where it is secondary to its application. Want

to understand the evolution of a particular organism? Sequence it and its close relatives and let

the data sort it out. Need to identify the genetics underlying a rare disease? Skip the family trees

and just sequence mother, father, and child. It no longer sounds far-fetched to suggest children

may one day have their genomes sequenced as part of their routine early childhood testing, and

with sequencing costs continuing to fall, point-of-care sequence analysis may soon be the rule

rather than the exception.

Still, the field is also very much in its adolescence. Though “this-gen” technologies are well

established, “third-gen” sequencing technologies (with the exception of Pacific Biosciences) have

yet to break through. Researchers are still working out the intricacies of data handling, quality

assessment, analysis, and interpretation. Says Harvard geneticist George Church, “It’s hard to wrap

your mind around a million-fold change in technology that arrives in a six-year period.” [108]

There’s no doubt about that. But that kind of hyperaccelerated growth also means the research

community can anticipate extraordinary opportunities in basic research, diagnostics R&D,

and therapeutics – and the dollars to match. Witness the $50 million NHGRI has distributed

in Advanced Sequencing Technology Awards just since 2011. [109] The challenge – and the

opportunity – is finding clever and creative ways to harness the technology. If the successes of the

past decade are any indication, the research community should have no trouble rising to meet

that challenge.


VIII. REFERENCES

[1] Arnaud, C. H. “Next-Generation DNA Sequencing Finds Use As A Diagnostic,” Chemical &

Engineering News, 91[28]:10–13, July 15, 2013.

[2] Wetterstrand, K. A. “DNA Sequencing Costs: Data from the NHGRI Genome Sequencing

Program (GSP)” from the National Human Genome Research Institute (NIH), available at

http://www.genome.gov/sequencingcosts/; accessed Aug. 27, 2013.

[3] “Pooled Exome Analysis,” from the Yale Center for Genome Analysis, available at http://ycga.

yale.edu/sequencing/Illumina/pooled.aspx; accessed Sept. 6, 2013.

[4] Diamandis, P. “Outpaced by innovation: Canceling an XPRIZE,” from the Huffington Post,

Aug. 22, 2013, available at http://www.huffingtonpost.com/peter-diamandis/outpaced-by-

innovation-ca_b_3795710.html.

[5] XPRIZE web site, available at www.xprize.org; accessed Sept. 11, 2013.

[6] Perkel, J. M. “Who wants the X Prize?” The Scientist, 20[12]:65, December 2006.

[7] Brown, E. “Organizers cancel Archon Genomics X-Prize,” Los Angeles Times, Aug. 24, 2013.

[8] Perkel, J. M. “Charting the course,” The Scientist, 25[10]:70, October 2011.

[9] “BRACAnalysis Technical Specifications,” from Myriad Genetic Laboratories, April 2012;

available at http://www.myriad.com/lib/technical-specifications/BRACAnalysis-Technical-

Specifications.pdf.

[10] Product description and specifications for 3730xl DNA Analyzer, from Applied Biosystems,

available at http://products.invitrogen.com/ivgn/product/3730XL; accessed Sept. 5, 2013.

[11] Margulies, M., et al. “Genome sequencing in microfabricated high-density picolitre

reactors,” Nature, 437:376–80, Sept. 15, 2005.

[12] Shendure, J., et al. “Accurate multiplex polony sequencing of an evolved bacterial genome,”

Science, 309:1728–32, Sept. 9, 2005.

[13] Mardis, E. R. “Next-generation DNA sequencing methods,” Annu. Rev. Genomics Hum. Genet.,

9:387–402, 2008.

[14] Product description and specifications, GS FLX+ System, from 454 Life Sciences, available

at http://454.com/products/gs-flx-system/index.asp; accessed Sept. 5, 2013.

[15] Product description and specifications, GS Junior Instrument & Workflow, from 454 Life

Sciences, available at http://www.gsjunior.com/instrument-workflow.php; accessed Sept. 6,

2013.

[16] Roche Diagnostics, personal communication, Sept. 6, 2013.

[17] “Roche shutting down 454 sequencing business,” GenomeWeb Daily News, Oct. 15, 2013.

[http://www.genomeweb.com/sequencing/roche-shutting-down-454-sequencing-

business]


[18] Product description and system performance parameters, HiSeq 2500/1500, from

Illumina, available at http://www.illumina.com/systems/hiseq_2500_1500/performance_

specifications.ilmn; accessed Sept. 11, 2013.

[19] Gnerre, S., et al. “High-quality draft assemblies of mammalian genomes from massively

parallel sequence data,” Proc. Natl. Acad. Sci. USA, 108:1513–8, Jan. 25, 2011.

[20] Product description and specifications, MiSeq Benchtop Sequencer, from Illumina,

available at http://www.illumina.com/systems/miseq/performance_specifications.ilmn;

accessed Sept. 11, 2013.

[21] Lakdawalla, A. “Perspectives from the Illumina User Group Meeting,” posted Feb. 16, 2012

on the Illumina web site, available at http://www.illumina.com/company/recent_events/

agbt_2012/perspectives_illumina_user_meeting.ilmn.

[22] System parameters, Moleculo technology, from Illumina, available at http://www.illumina.

com/technology/moleculo-technology.ilmn; accessed Sept. 11, 2013.

[23] Davies, K. “Moleculo Man: Mickey Kertesz on Illumina’s sub-assembly acquisition,” Bio-IT

World, Jan. 18, 2013; available at http://www.bio-itworld.com/2013/1/18/moleculo-man-

mickey-kertesz-illumina-long-read-acquisition.html.

[24] “Illumina announces phasing analysis service for human whole-genome sequencing,”

news release, July 18, 2013; available at http://investor.illumina.com/phoenix.

zhtml?c=121127&p=irol-newsArticle&id=1838779.

[25] Specifications sheet, 5500 W Series Genetic Analyzers, V2.0, from Applied Biosystems, Life

Technologies; available at http://tools.lifetechnologies.com/content/sfs/brochures/5500-w-

series-spec-sheet.pdf.

[26] Perkel, J. M. “Making contact with sequencing’s fourth generation,” BioTechniques, 50[2]:93–

5, February 2011.

[27] Overview, “Ion Torrent™ Next-Generation Sequencing Technology,” from Life Technologies,

available at http://www.lifetechnologies.com/us/en/home/life-science/sequencing/

next-generation-sequencing/ion-torrent-next-generation-sequencing-technology.html;

accessed Sept. 5, 2013.

[28] Specification sheet, Ion PGM™ System, from Life Technologies, available at http://tools.

lifetechnologies.com/content/sfs/brochures/PGM-Specification-Sheet.pdf.

[29] System specifications, Ion Proton™ System, from Life Technologies, available at http://www.

lifetechnologies.com/us/en/home/life-science/sequencing/next-generation-sequencing/

ion-torrent-next-generation-sequencing-workflow/ion-torrent-next-generation-

sequencing-run-sequence/ion-proton-system-for-next-generation-sequencing/ion-

proton-system-specifications.html; accessed Sept. 5, 2013.

[30] Technology overview, from Complete Genomics, available at http://www.

completegenomics.com/technology/; accessed Sept. 11, 2013.

[31] Drmanac, R., et al. “Human genome sequencing using unchained base reads on self-

assembling DNA nanoarrays,” Science, 327:78–81, Jan. 1, 2010.


[32] “BGI-Shenzhen completes acquisition of Complete Genomics,” news release, Mar. 18,

2013; available at http://www.completegenomics.com/news-events/press-releases/BGI-

Shenzhen-Completes-Acquisition-of-Complete-Genomics-198854331.html.

[33] Flinn, R., and Vance, A. “Complete Genomics: Chinese bid sparks a security fight,”

Bloomberg Businessweek, Dec. 20, 2012; available at http://www.businessweek.com/

articles/2012-12-20/complete-genomics-chinese-bid-sparks-a-security-fight.

[34] McBride, R. “BGI clears hurdle in buyout of Complete Genomics,” Fierce Biotech, Jan. 1, 2013;

available at http://www.fiercebiotech.com/story/bgi-clears-hurdle-buyout-complete-

genomics/2013-01-01.

[35] Eid, J., et al. “Real-time DNA sequencing from single polymerase molecules,” Science,

323:133–8, Jan. 2, 2009.

[36] “SMRT Sequencing Advantage,” from Pacific Biosciences, available at http://www.

pacificbiosciences.com/products/smrt-technology/smrt-sequencing-advantage/; accessed

Sept. 11, 2013.

[37] “Base Modification Detection,” from Pacific Biosciences, available at http://www.

pacificbiosciences.com/applications/base_modification/; accessed Sept. 11, 2013.

[38] Chin, C.-S., et al. “The origin of the Haitian cholera outbreak strain,” New Engl. J. Med.,

364:33–42, Jan. 6, 2011.

[39] Rasko, D. A., et al. “Origins of the E. coli strain causing an outbreak of hemolytic-uremic

syndrome in Germany,” New Engl. J. Med., 365:709–17, Aug. 25, 2011.

[40] “Helicos BioSciences files for Chapter 11 bankruptcy protection,” GenomeWeb Daily News,

Nov. 16, 2012; available at http://www.genomeweb.com/sequencing/helicos-biosciences-

files-chapter-11-bankruptcy-protection.

[41] Harris, T. D., et al. “Single-molecule DNA sequencing of a viral genome,” Science, 320:106–9,

April 4, 2008.

[42] Pushkarev, D., Neff, N. F., and Quake, S. R. “Single-molecule sequencing of an individual

human genome,” Nat. Biotechnol., 27:847–50, Sept. 2009.

[43] MacArthur, D. “Helicos co-founder sequences own genome using single-molecule

technology,” Wired Science blog, Aug. 10, 2009; available at http://www.wired.com/

wiredscience/2009/08/helicos-co-founder-sequences-own-genome-using-single-

molecule-technology/.

[44] Timmerman, L. “Oh, and one more thing: A wowser moment in DNA sequencing,” Xconomy.

com, Feb. 21, 2012; available at http://www.xconomy.com/national/2012/02/21/oh-and-

one-more-thing-a-wowser-moment-in-dna-sequencing/?single_page=true.

[45] “Oxford Nanopore introduces DNA ‘strand sequencing’ on the high-throughput GridION

platform and presents MinION, a sequencer the size of a USB memory stick,” news release,

Feb. 17, 2012; available at http://www.nanoporetech.com/news/press-releases/view/39.


[46] Davies, K. “Oxford strikes first in DNA sequencing nanopore wars,” Bio-IT World, Feb. 17,

2012; available at http://www.bio-itworld.com/news/02/17/12/Oxford-strikes-first-in-DNA-

sequencing-nanopore-wars.html.

[47] “Shares of sequencing instrument makers fall as Oxford Nanopore unveils platforms,”

GenomeWeb Daily News, Feb. 17, 2012; available at http://www.genomeweb.com/

sequencing/shares-sequencing-instrument-makers-fall-oxford-nanopore-unveils-

platforms.

[48] Kumar, S., et al. “PEG-labeled nucleotides and nanopore detection for single molecule DNA

sequencing by synthesis,” Scientific Reports, 2:684, DOI:10.1038/srep00684, Sept. 21, 2012.

[49] Perkel, J. M. “Next Generation Sequencing 2013: Looking into Genomes,” Biocompare.com,

Jan. 29, 2013; available at http://www.biocompare.com/Editorial-Articles/126329-Next-

Gen-Sequencing/.

[50] “Genia Technologies collaborates with Professors Jingyue Ju at Columbia and George

Church at Harvard to develop a nanopore-based sequencing platform that will enable the

use of molecular diagnostics in everyday clinical care,” news release, Oct. 3, 2012; available

at http://www.geniachip.com/about/press-releases/10-3-2012/.

[51] “Base4, Hitachi developing single-molecule, nanopore-based sequencer,” GenomeWeb

Daily News, July 8, 2013; available at http://www.genomeweb.com/sequencing/base4-

hitachi-developing-single-molecule-nanopore-based-sequencer.

[52] “NHGRI’s $17M DNA sequencing program to focus on nanopores,” GenomeWeb Daily News,

Sept. 9, 2013; available at http://www.genomeweb.com//node/1277236.

[53] Bell, D. C., et al. “DNA base identification by electron microscopy,” Microscopy and

Microanalysis, 18:1049–53, Oct. 2012.

[54] Karow, J. “After raising $3.5M, ZS Genetics expands, starts work on prototype electron

microscope sequencer,” In Sequence, July 16, 2013; available at http://www.genomeweb.

com/sequencing/after-raising-35m-zs-genetics-expands-starts-work-prototype-electron-

microscope.

[55] Association for Molecular Pathology v. Myriad Genetics Inc., case 12-398, U.S. Supreme Court,

available at http://www.supremecourt.gov/opinions/12pdf/12-398_1b7d.pdf.

[56] Technical Information and Test Overview, FoundationOne, from Foundation Medicine,

available at http://www.foundationone.com/about-foundationone/ONE-I-002-20130529_

FoundationOne_Technical.pdf; accessed Sept. 5, 2013.

[57] Eisenberg, A. “Variations on a gene, and tools to find them,” New York Times, p. BU3, April

28, 2013; available at http://www.nytimes.com/2013/04/28/business/in-cancer-treatment-

new-dna-tools.html.

[58] Worthey, E. A., et al. “Making a definitive diagnosis: Successful clinical application of whole

exome sequencing in a child with intractable inflammatory bowel disease,” Genet Med,

13:255–62, March 2011.


[59] Dinwiddie, D. L., et al. “Molecular diagnosis of infantile onset inflammatory bowel disease

by exome sequencing,” Genomics, DOI:10.1016/j.ygeno.2013.08.008, Aug. 31, 2013.

[60] Ng, S. B., et al. “Exome sequencing identifies the cause of a mendelian disorder,” Nat. Genet.,

42:30–5, Jan. 2010.

[61] Perkel, J. M. “Exome sequencing: Toward an interpretable genome,” Science, 342:262–4, Oct.

11, 2013.

[62] Maurano, M. T., et al. “Systematic localization of common disease-associated variation in

regulatory DNA,” Science, 337:1190–5, Sept. 7, 2012.

[63] “How to Order,” process description for physicians, from Illumina, available at http://www.

illumina.com/clinical/illumina_clinical_laboratory/igs_for_doctors/how_to_order.ilmn;

accessed Sept. 6, 2013.

[64] Heger, M. “STAT-Seq proof-of-principle study shows way forward for neonatal WGS in

clinic,” Clinical Sequencing News, Oct. 3, 2012; available at http://www.genomeweb.com/

sequencing/stat-seq-proof-principle-study-shows-way-forward-neonatal-wgs-clinic.

[65] Perkel, J. M. “Finding the true $1000 genome,” BioTechniques, 54[2]:71–4, February 2013.

[66] McKean, M. L. “Children’s Mercy receives grant for early diagnosis in critically ill newborns,”

FOX 4 Kansas City, Sept. 4, 2013; available at http://fox4kc.com/2013/09/04/childrens-

mercy-receives-grant-for-early-diagnosis-in-critically-ill-newborns/.

[67] “NIH awards up to $25M over five years to teams testing genome sequencing in newborn

screening,” GenomeWeb Daily News, Sept. 4, 2013; available at http://www.genomeweb.

com/sequencing/nih-awards-25m-over-five-years-teams-testing-genome-sequencing-

newborn-screening.

[68] “Genome 10K Project,” overview, available at https://genome10k.soe.ucsc.edu/; accessed

Sept. 6, 2013.

[69] “1000 Genomes Project,” from the U.S. National Human Genome Research Institute,

available at http://www.genome.gov/27528684; accessed Sept. 6, 2013.

[70] 1000 Genomes Project Consortium, et al. “An integrated map of genetic variation from

1,092 human genomes,” Nature, 491:56–65, Nov. 1, 2012.

[71] Perkel, J. M. “Transcriptome analysis using RNA-Seq,” Biocompare.com, Sept. 4, 2012;

available at http://www.biocompare.com/Editorial-Articles/117889-RNA-Seq-Whole-

Transcriptome-Sequencing/.

[72] “Wasp transcriptome analysis hints at evolution of social behavior,” GenomeWeb Daily News,

Feb. 26, 2013; available at http://www.genomeweb.com/sequencing/wasp-transcriptome-

analysis-hints-evolution-social-behavior.

[73] “The ENCODE Project: ENCyclopedia OF DNA Elements,” from the U.S. National Human

Genome Research Institute, available at http://www.genome.gov/10005107; accessed

Sept. 6, 2013.


[74] “NIH Roadmap Epigenomics Mapping Consortium,” web page available at http://www.

roadmapepigenomics.org/; accessed Sept. 6, 2013.

[75] Perkel, J. M. “Epigenomics: The new technologies of chromatin analysis,” Science, 338:546–8,

Oct. 26, 2012; available at http://www.sciencemag.org/site/products/lst_20121026.xhtml.

[76] Yu, M., et al. “Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian

genome,” Cell, 149:1368–80, June 8, 2012.

[77] Fullwood, M. J., et al. “An oestrogen-receptor-α-bound human chromatin interactome,”

Nature, 462:58–64, Nov. 5, 2009.

[78] Green, R. E., et al. “A draft sequence of the Neandertal genome,” Science, 328:710–22, May 7,

2010.

[79] Miller, W., et al. “Sequencing the nuclear genome of the extinct woolly mammoth,” Nature,

456:387–90, Nov. 20, 2008.

[80] Ermini, L., et al. “Complete mitochondrial genome sequence of the Tyrolean iceman,” Curr.

Biol., 18:1687–93, Nov. 11, 2008.

[81] Zhao, F., Qi, J., and Schuster, S. C. “Tracking the past: Interspersed repeats in the extinct

Afrotherian mammal, Mammuthus primigenius,” Genome Res., 19:1384–92, Aug. 2009.

[82] Yooseph, S., et al. “The Sorcerer II Global Ocean Sampling expedition: Expanding the

universe of protein families,” PLoS Biol, 5:e16, March 2007.

[83] “NIH Human Microbiome Project defines normal bacterial makeup of the body,” news

release, June 13, 2012, from National Human Genome Research Institute; available at

http://www.genome.gov/27549144.

[84] “NIH awards $22M for Human Microbiome Project’s second phase,” GenomeWeb Daily

News, Sept. 9, 2013; available at http://www.genomeweb.com//node/1277421.

[85] “NIH Common Fund announces awards for single cell analysis,” news release, Oct. 15, 2012,

from National Institutes of Health; available at http://www.nih.gov/news/health/oct2012/

nibib-15.htm.

[86] Owens, B. “Genomics: The single life,” Nature, 491:27–29, Nov. 1, 2012.

[87] Zong, C., et al. “Genome-wide detection of single-nucleotide and copy-number variations

of a single human cell,” Science, 338:1622–6, Dec. 21, 2012.

[88] Lu, S., et al. “Probing meiotic recombination and aneuploidy of single sperm cells by whole-

genome sequencing,” Science, 338:1627–30, Dec. 21, 2012.

[89] Xue, Z., et al. “Genetic programs in human and mouse early embryos revealed by single-cell

RNA sequencing,” Nature, 500:593–7, Aug. 29, 2013.

[90] Gullapalli, R. R., et al. “Next generation sequencing in clinical medicine: Challenges and

lessons for pathology and biomedical informatics,” J. Pathol. Inform., 3:40, Oct. 31, 2012.

[91] Perkel, J. M. “Sequence analysis 101,” The Scientist, 25[3]:60–2, March 2011.


[92] Mardis, E. R. “The $1,000 genome, the $100,000 analysis?” Genome Med, 2:84, 2010.

[93] System description, knomeDISCOVERY, from Knome, available at http://www.knome.com/

software-services/knomediscovery/; accessed Sept. 7, 2013.

[94] Perkel, J. M. “Streamlined engineering for synthetic biology,” Nature Methods, 10[1]:39–42,

January 2013.

[95] Collins, J. “Synthetic Biology: Bits and pieces come to life,” Nature, 483:S8–S10, Feb. 29, 2012.

[96] Chen, Y.-J., et al. “Characterization of 582 natural and synthetic terminators and

quantification of their design constraints,” Nature Methods, 10:659–64, July 2013.

[97] Mutalik, V. K., et al. “Quantitative estimation of activity and quality for collections of

functional genetic elements,” Nature Methods, 10:347–53, April 2013.

[98] Brewster, R. C., Jones, D. L., and Phillips, R. “Tuning promoter strength through RNA

polymerase binding site design in Escherichia coli,” PLoS Comput. Biol., 8[12]:e1002811,

2012.

[99] Gibson, D. G., et al. “Creation of a bacterial cell controlled by a chemically synthesized

genome,” Science, 329:52–6, July 2, 2010.

[100] Dymond, J. S., et al. “Synthetic chromosome arms function in yeast and generate

phenotypic diversity by design,” Nature, 477:471–6, Sept. 14, 2011.

[101] Perkel, J. M. “The new genetic engineering toolbox,” BioTechniques, 54[4]:185–8, April 2013.

[102] Product description, SB-728, from Sangamo BioSciences, available at http://www.sangamo.

com/pipeline/sb-728.html; accessed Sept. 8, 2013.

[103] Pennisi, E. “The tale of the TALEs,” Science, 338:1408–11, Dec. 14, 2012.

[104] Pennisi, E. “The CRISPR craze,” Science, 341:833–6, Aug. 23, 2013.

[105] Wang, H., et al. “One-step generation of mice carrying mutations in multiple genes by

CRISPR/Cas-mediated genome engineering,” Cell, 153:910–8, May 9, 2013.

[106] Mali, P., et al. “CAS9 transcriptional activators for target specificity screening and paired

nickases for cooperative genome engineering,” Nat. Biotechnol., 31:833–8, Sept. 2013 (or

doi:10.1038/nbt.2675, ePub ahead of print, Aug. 1, 2013).

[107] “Illumina management discusses Q1 2013 results – Earnings call transcript,” Seeking

Alpha, April 22, 2013, available at http://seekingalpha.com/article/1360091-illumina-

management-discusses-q1-2013-results-earnings-call-transcript; accessed Oct. 15, 2013.

[108] George Church, personal communication, June 24, 2013.

[109] Overview, Genome Technology Program, from National Human Genome Research

Institute, available at http://www.genome.gov/10000368; accessed Sept. 9, 2013.

[110] “MinION™ Access Programme,” undated web page, from Oxford Nanopore Technologies;

available at https://www.nanoporetech.com/technology/the-minion-device-a-

miniaturised-sensing-system/minion-access-programme.
