
Post on 03-Aug-2020


Page 1: A DNA-Based Archival Storage Systembornholt/papers/dnastorage-asplos16.sli… · Chunking data Break binary data into chunks stored in separate strands 10100011 10010001 11100111

A DNA-Based Archival Storage System

James Bornholt* Randolph Lopez* Douglas M. Carmean† Luis Ceze* Georg Seelig* Karin Strauss†

* University of Washington † Microsoft Research


Facebook cold storage facility: 1 exabyte (10^9 GB) in 66,000 square feet


DNA molecules as storage

Extremely dense. Theory: 1 exabyte in 1 in^3

Extremely durable. Half-life > 500 years


A DNA-based archival storage system

Write → Store → Read

• Redundancy and density
• Efficient retrieval
• Wet lab experiments


DNA manipulation


DNA molecules

Four nucleotides:

A Adenine
C Cytosine
G Guanine
T Thymine

DNA strand (oligonucleotide) is a linear sequence of these nucleotides

G A C A C C T

Two strands can bind to each other if they are complementary:

C T G T G G A
G A C A C C T

C, G are complementary

A, T are complementary

Partial errors allowed: strands can still bind despite a few mismatched nucleotides
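The pairing rule can be sketched in a few lines of Python. This is an illustration only: the function names `complement` and `mismatches` are hypothetical, and the sketch pairs strands position by position as the slide draws them, ignoring strand direction (real strands pair antiparallel).

```python
# Pairing stated on the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the position-by-position complement of a strand."""
    return "".join(PAIR[nt] for nt in strand)


def mismatches(top: str, bottom: str) -> int:
    """Count positions where two equal-length strands fail to pair."""
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    return sum(PAIR[a] != b for a, b in zip(top, bottom))


print(complement("GACACCT"))             # CTGTGGA, the strand drawn on the slide
print(mismatches("GACACCT", "CTGTGGA"))  # 0: fully complementary
print(mismatches("GACACCT", "CTGTGGT"))  # 1: a partial error in the last position
```

A binding model with partial errors could then accept a pair whenever `mismatches` stays under some threshold; the slide does not say what that threshold is.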


DNA manipulation

Synthesis: manufacturing DNA strands

GACACCT → G A C A C C T

• Chemical synthesis process appends one nucleotide at a time
• Maximum practical length ~200 nts
• Typically produces thousands of copies of the strand

Sequencing: reading DNA strands

G A C A C C T → GACACCT

• Produces many reads of a strand
• Much higher throughput than synthesis


An archival storage system


System overview

Archival storage system structured as a key-value store

put(key, value): cat.jpg → 01001… → ACGAT… → Synthesis → Pool

get(key): Pool → Sequencing → cat.jpg

• Organizing written data
• Efficient retrieval
• Coping with errors


Writing data to DNA

The easy way: convert base 2 to base 4

10100011 10010001 11100111 11000101 10010100 10111101

2 2 0 3 2 1 0 1 3 2 1 3 3 0 1 1 2 1 1 0 2 3 3 1

G G A T G C A C T G C T T A C C G C C A G T T C

But this approach isn’t feasible for more than a few bytes

P[Attach] = 99% per nucleotide, so the chance that a strand is still correct falls with every nucleotide appended:

G G A T G C A: 99% 98% 97% 96.1% 95.1% 94.1%

100 nts: 36.6%

200 nts: 13.4%
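The percentages on this slide follow from compounding the 99% attachment probability once per appended nucleotide. A minimal sketch, assuming each attachment succeeds independently (the function name `full_length_yield` is hypothetical):

```python
def full_length_yield(p_attach: float, length: int) -> float:
    """Probability that all `length` attachment steps succeed,
    assuming each step succeeds independently with probability p_attach."""
    return p_attach ** length


# Yield after each of the first few attachments, then at 100 and 200 nts.
for n in (1, 2, 3, 4, 5, 6, 100, 200):
    print(f"{n:>3} nts: {full_length_yield(0.99, n):.1%}")
```

At 100 nts the yield is about 36.6% and at 200 nts about 13.4%, so most copies of a long strand are truncated or wrong.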


Chunking data

Break binary data into chunks stored in separate strands

10100011 10010001 11100111 11000101 10010100 10111101

2 2 0 3 2 1 0 1 3 2 1 3 3 0 1 1 2 1 1 0 2 3 3 1

Each strand carries a key primer, one chunk, an address, and a second key primer:

C A T C C   G G A T G C A C   A A A A   A T G T T
C A T C C   T G C T T A C C   A A A C   A T G T T
C A T C C   G C C A G T T C   A A A G   A T G T T

Addresses within the value: A A A A, A A A C, A A A G

Key identifiers (“primers”): C A T C C and A T G T T
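The conversion and chunking steps can be sketched as follows. This is a toy illustration, not the encoding the system finally uses: the digit-to-nucleotide mapping (0 → A, 1 → C, 2 → G, 3 → T) and the field order (primer, chunk, address, primer) are read off the slide's example, and the function names are hypothetical.

```python
NUCLEOTIDES = "ACGT"  # base-4 digit 0 -> A, 1 -> C, 2 -> G, 3 -> T


def bits_to_nts(bits: str) -> str:
    """Map each pair of bits (one base-4 digit) to a nucleotide."""
    if len(bits) % 2:
        raise ValueError("need an even number of bits")
    return "".join(NUCLEOTIDES[int(bits[i:i + 2], 2)]
                   for i in range(0, len(bits), 2))


def address(index: int, width: int) -> str:
    """Encode a chunk index as a fixed-width base-4 address."""
    digits = []
    for _ in range(width):
        index, digit = divmod(index, 4)
        digits.append(NUCLEOTIDES[digit])
    if index:
        raise ValueError("index does not fit in the address width")
    return "".join(reversed(digits))


def make_strands(bits, chunk_nts=8, addr_nts=4, left="CATCC", right="ATGTT"):
    """Split the payload into chunks and wrap each one with primers and an address."""
    payload = bits_to_nts(bits)
    chunks = [payload[i:i + chunk_nts] for i in range(0, len(payload), chunk_nts)]
    return [left + chunk + address(i, addr_nts) + right
            for i, chunk in enumerate(chunks)]


bits = "".join("10100011 10010001 11100111 11000101 10010100 10111101".split())
print(bits_to_nts(bits))  # GGATGCACTGCTTACCGCCAGTTC
for strand in make_strands(bits):
    print(strand)
```

The third strand printed is CATCCGCCAGTTCAAAGATGTT: the primer CATCC, the chunk GCCAGTTC, the address AAAG, and the primer ATGTT.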


Efficient reads

A T T TG CCTACGAAACTTGACCG
A T T TG CCTACAAAACACGTAGG
A T T TG CCTACCAAACCATTCGT

Pool containing stored strands for all keys & values!

get(key) → cat.jpg

Addresses within the value

Key identifiers (“primers”)


Random access

A T T TG CCTACGAAACTTGACCG
A T T TG CCTACAAAACACGTAGG
A T T TG CCTACCAAACCATTCGT

Strand fields: Address, Primers

Strands with 3 different primers

PCR: selectively amplify strands based on their primer

Sample: almost all strands have desired primer

Reads are destructive, so replenish when necessary
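A toy model shows why a sample taken after PCR is dominated by the requested key. It is only a sketch under loud assumptions: primers are matched as plain string prefixes and suffixes, every cycle exactly doubles matching strands, and the second and third keys (with their primers) are invented for illustration. Real PCR works through complementary binding and is less clean.

```python
from collections import Counter


def has_primers(strand: str, left: str, right: str) -> bool:
    """Toy primer match: the strand starts and ends with the given primers."""
    return strand.startswith(left) and strand.endswith(right)


def pcr_amplify(pool: Counter, left: str, right: str, cycles: int) -> Counter:
    """Double the copy count of matching strands once per cycle."""
    amplified = Counter()
    for strand, copies in pool.items():
        factor = 2 ** cycles if has_primers(strand, left, right) else 1
        amplified[strand] = copies * factor
    return amplified


def fraction_with_primers(pool: Counter, left: str, right: str) -> float:
    """Fraction of all strand copies in the pool that carry the given primers."""
    wanted = sum(c for s, c in pool.items() if has_primers(s, left, right))
    return wanted / sum(pool.values())


pool = Counter({
    "CATCC" + "GGATGCAC" + "AAAA" + "ATGTT": 1,
    "CATCC" + "TGCTTACC" + "AAAC" + "ATGTT": 1,
    "GTCAA" + "ACGTACGT" + "AAAA" + "TTGCA": 1,  # hypothetical second key
    "TGACC" + "TTGACCGA" + "AAAA" + "CGTAG": 1,  # hypothetical third key
})

before = fraction_with_primers(pool, "CATCC", "ATGTT")
after = fraction_with_primers(pcr_amplify(pool, "CATCC", "ATGTT", 10),
                              "CATCC", "ATGTT")
print(f"before PCR: {before:.1%}, after 10 cycles: {after:.1%}")
```

In this model half the pool carries the desired primers before amplification and more than 99% does afterwards, which matches the slide's "almost all strands have desired primer".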


Error correction

Both synthesis and sequencing are error prone:

Original:       G G A T G C A
Insertions:     G G A T A G C A
Deletions:      G G A T G A
Substitutions:  G G A T C C A

Error rates ~1% per nucleotide!


Logical redundancy

[Figure: strands labeled Address, Primer, Data]

XOR redundancy provides simple error correction

Reserved address space to indicate redundancy data
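The idea behind XOR redundancy can be sketched on raw bytes. This is a minimal illustration, assuming one redundancy strand stores the XOR of two data payloads; the slide does not spell out how strands are paired or how the reserved addresses are laid out, and `xor_bytes` is a hypothetical helper.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length payloads."""
    if len(a) != len(b):
        raise ValueError("payloads must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Two data payloads, taken from the bytes in the earlier encoding example.
payload_a = bytes([0b10100011, 0b10010001])
payload_b = bytes([0b11100111, 0b11000101])

# The redundancy payload is stored in its own strand.
redundancy = xor_bytes(payload_a, payload_b)

# If the strand holding payload_a is lost, the other two strands rebuild it.
recovered_a = xor_bytes(redundancy, payload_b)
print(recovered_a == payload_a)  # True
```

Any one of the three payloads can be rebuilt from the other two, so a single missing strand in a group does not lose data.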


Wet lab results


The process

[Figure: the process, with sequence labels catcatgg and catcatgc]

Throughput: MBs/week


Photo: Tara Brown / UW


Decoding

Encoded and synthesized 3 files (151 kB)

Selected and PCRed one file for random access (42 kB)

Sequenced and decoded the resulting amplified pool

Recovered every bit despite errors in synthesis and sequencing


The importance of redundancy

[Figure: strands labeled Address, Primer, Data]

[Figure: Frequency (0 to 75) versus Number of copies (0 to 7500)]

Some strands are missing entirely

If we ignore redundancy data, we cannot recover the file.


A DNA-based archival storage system

Write → Store → Read

• Redundancy and density
• Efficient retrieval
• Wet lab experiments

Also in the paper:

• Reliability-density trade-off
• Simulation of decay over time
• Error analysis
• Model of truncated strands


MBs/week GBs/second


DNA productivity is growing

[Figure: Productivity (log scale, 10^2 to 10^10) versus Year (1970 to 2010) for Transistors on Chip, Reading DNA, and Writing DNA]

Source: Robert Carlson


DNA technology is miniaturizing


We’ve just barely scratched the surface

[Figure: Accuracy (0% to 100%) versus Reads used (0.01% to 10%)]


Our community has seen these challenges before

Simulation

Scheduling

Spatial addressing

Programming with errors

Error correction

Latency-hiding optimizations

Cache locality

Circuit design


Photo: Tara Brown / UW