david goodsell. gtl workshop b: experimental technology development and integration tue at 2 pm...

Post on 15-Jan-2016

Page 1: David Goodsell. GtL Workshop B: Experimental Technology Development and Integration Tue at 2 PM Co-Chairs – George Church, Harvard Medical School Ham
Page 2

GtL Workshop B: Experimental Technology Development and Integration Tue at 2 PM

Co-Chairs – George Church, Harvard Medical School; Ham Smith, Institute for Biological Energy Alternatives

As we attempt to understand, protect, and/or engineer environmental microbial communities, we need to ask what sorts of data would most benefit our models and how to obtain them cost-effectively. For this session, let us answer: what small (or large) technological steps are we taking toward these specific challenges?

The framework for the discussions will be the following questions:
What are the most useful technologies for our tasks/goals now and for the future?
What are the major technological gaps that will need to be addressed to reach the GTL goals?
To what extent will the technologies be developed by others?
How can technologies best be used to complement each other and strengthen the resulting research/insights?
How do we promote the kind of synergistic interactions among the practitioners?

We would like to invite you to bring one viewgraph to share with the participants on your views about technologies needed to meet these challenges.

Page 3

GtL Workshop B: Experimental Technology Development and Integration Tue at 2 PM

Specific challenges:
(1) Microscopic methods capable of tracing the chain of a small genome?
(2) Quantitation of “all” peptide states (either in single cells or populations)?
(3) Sequencing at Mbp per $?
(4) Automated designed genome engineering?

Discussion leaders:
(1) Joachim Frank (Wadsworth Center, NY Dept of Health) on cryo-electron microscopy
(1) Hoi-Ying Holman (Berkeley Lab) on FTIR imaging
(1) Steve Colson (PNNL) on optical imaging
(2) Bob Hettich or Greg Hurst (ORNL) and Dick Smith (PNNL) on mass spectrometry
(3) George Church (HMS) on polony sequencing
(4) Ham Smith (IBEA) on genome synthesis

We would like to invite you to bring one viewgraph to share with the participants on your views about technologies needed to meet these challenges.

Page 4

Biosystems Integrating Measures & Models

[Diagram labels: DNA, RNA, proteins, metabolites, interactions, replication rate, environment; microbes, cancer & stem cells, Darwinian optima, in vitro replication, small multicellular organisms; RNAi, insertions, SNPs]

Page 5

Improving Models & Measures

Why model?

“Killer Applications”: Share, Search, Merge, Check, Design

Page 6

Why improve measurements?

Human genomes: (6 billion)^2 = 10^19 bp
Immune & cancer genome changes: >10^10 bp per time point
RNA ends & splicing: in situ 10^12 bits/mm^3
Biodiversity: environmental & lab evolution
Compact storage: 10^5 now to 10^17 bits/mm^3 eventually

& How? ($1K per genome, 10^8-10^13 bits/$)

The issue is not speed, but integration.
Cost per 99.99% bp: including reagents, personnel, equipment/5 yr, overhead/sq. m
• Sub-mm scale: 1 µm = femtoliter (10^-15)
• Instruments $2-50K per CPU

Page 7

Projected costs determine when biosystems data overdetermination is feasible.

In 1984, pre-HGP (φX, pBR322, etc.): 0.1 bp/$, which would have been $30B per human genome.

In 2002 (de novo full vs. resequencing), ABI/Perlegen/Lynx: $300M vs. $3M, 10^3 bp/$ (4 log improvement)

Other data I/O (e.g. video): 10^13 bits/$
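The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes a haploid human genome of about 3 x 10^9 bp (a size the slide does not state) and shows that 0.1 bp/$ gives roughly $30B per genome, 10^3 bp/$ gives roughly $3M, and the ratio of the two rates is 4 orders of magnitude. The function names are illustrative only.

```python
import math

# Assumption: haploid human genome size, not stated on the slide.
HAPLOID_GENOME_BP = 3e9


def cost_per_genome(bp_per_dollar, genome_bp=HAPLOID_GENOME_BP):
    """Dollars needed to sequence genome_bp at a given throughput per dollar."""
    return genome_bp / bp_per_dollar


def log_improvement(old_bp_per_dollar, new_bp_per_dollar):
    """Orders of magnitude gained going from the old rate to the new rate."""
    return math.log10(new_bp_per_dollar / old_bp_per_dollar)


cost_1984 = cost_per_genome(0.1)   # pre-HGP rate quoted on the slide
cost_2002 = cost_per_genome(1e3)   # 2002 resequencing rate quoted on the slide
logs_gained = log_improvement(0.1, 1e3)

print(f"1984: ${cost_1984:,.0f} per genome")
print(f"2002: ${cost_2002:,.0f} per genome")
print(f"Improvement: {logs_gained:.1f} logs")
```

Under this assumption the $3M resequencing figure matches 10^3 bp/$; the $300M de novo figure implies a rate about 100 times lower, which this sketch does not model.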

Page 8

Why single molecules?

Integration from cells/genomes/RNAs to data

Geometric constraints: who’s “in cis” on a molecule, complex, or cell,
e.g. DNA haplotypes & RNA splice-forms

Page 9

Polymerase colonies (polonies) along a DNA or RNA molecule

Page 10

Polymerase colony (polony) PCR in a gel

[Diagram: strands labeled A, A’, and B; panel labels: "Single Molecule From Library", "1st Round of PCR", "Primer is Extended by Polymerase"]

Primer A has 5’ immobilizing Acrydite

Mitra & Church Nucleic Acids Res. 27: e34

Page 11

Sequence polonies by sequential, fluorescent single-base extensions

• Hybridize Universal Primer
• Add Red (Cy3) dTTP. Wash.
• Add Green (FITC) dCTP
• Wash; Scan

[Figure: primer site B/B’ with 3’ and 5’ ends marked, shown for two polonies; sequence labels: AGT., TC, GCG.., C]

Page 12

$1K per diploid human sequence

Input: Buccal cells, blood, or forensic samples. Output: Prioritized list of deviant bps (e.g. non-conservative).

Raw data rate: 16 pixels/bp, 1 Mpixel per 6 sec/CPU = 24 CPU days.
Amortization: 5 yr for camera/CPU/transport @ $50K total = $200 per 10^11 bp
Overhead: $200/sq ft/yr * 40 sq ft (400 cu ft) = $40
Reagents: At 20 µm per (5 µm) polony and 40 bp reads means 10000 cm^2 area, 800 ml of fluor dNTP, $100/mg = $40
5 ml PCR reactions = $200
Disposables: 500 slides = $50
Electricity: 2 kW * 24 hr * 24 days * $0.13/kWh = $150
Labor for repair: 10% of instrument cost = $10
Labor for operation: Slide PCR, slide dips, scans, etc. = $20
R&D: Initially NIH grants (roughly 10%).
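As a quick check on the budget above, the dollar line items can be summed and the electricity figure recomputed from its stated factors. This is only a tally of the numbers on the slide: the slide itself gives no total, the R&D share has no dollar value and is left out, and the dictionary keys are labels chosen here for illustration.

```python
# Dollar line items as quoted on the slide (R&D has no dollar figure, so it is omitted).
line_items = {
    "amortization": 200,
    "overhead": 40,
    "fluor dNTP": 40,
    "PCR reactions": 200,
    "disposables": 50,
    "electricity": 150,
    "repair labor": 10,
    "operation labor": 20,
}

total = sum(line_items.values())

# Electricity: 2 kW running 24 h/day for 24 days at $0.13 per kWh.
electricity = 2 * 24 * 24 * 0.13

print(f"Sum of listed items: ${total}")
print(f"Electricity recomputed: ${electricity:.2f}")
```

The listed items sum to less than the $1K target in the slide title, and the recomputed electricity cost rounds to the quoted $150.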

Page 13

Inexpensive, off-the-shelf equipment

MJR in situ Cycler: $10K

Automated slide fluidics: $4K

Microarray Scanner: $26K+

Page 14

Human Haplotype: CFTR gene, 45 kbp

Rob Mitra, Vincent Butty, Jay Shendure, Ben Williams

Page 15

Quantitative removal of Fluorophores

Rob Mitra

Page 16

Template ST30: 3' TCACGAGT

Base added: (C) A G T (C) (A) G (T) C (A) (G) T C A

3' TCACGAGT
5' AGTGCTCA

Sequencing multiple polonies

Rob Mitra
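The ST30 example above can be reproduced with a small simulation of cyclic single-base extension. The sketch below assumes the nucleotides are offered in the repeating order C, A, G, T (inferred from the "Base added" row), that a base is incorporated only when it pairs with the next template position, and that a run of identical bases would extend in a single step (an assumption; ST30 has no such runs). The function names are illustrative, not taken from the slides.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extension_steps(template_3to5, order="CAGT"):
    """Simulate cyclic single-base additions against a template read 3'->5'.

    Returns a list of (base_offered, number_incorporated) pairs.
    """
    # Bases the growing strand needs, in the order they are incorporated (5'->3').
    needed = [COMPLEMENT[base] for base in template_3to5]
    if set(needed) - set(order):
        raise ValueError("addition order must contain every base that is needed")

    steps = []
    position = 0
    cycle = 0
    while position < len(needed):
        offered = order[cycle % len(order)]
        incorporated = 0
        # Extend while the offered base matches the next required base.
        while position < len(needed) and needed[position] == offered:
            position += 1
            incorporated += 1
        steps.append((offered, incorporated))
        cycle += 1
    return steps


def render(steps):
    """Show failed additions in parentheses, as on the slide."""
    return " ".join(base * n if n else f"({base})" for base, n in steps)


steps = extension_steps("TCACGAGT")
read = "".join(base * n for base, n in steps)
print(render(steps))
print(read)
```

With the ST30 template this gives the same pattern of productive and unproductive additions as the slide, and the productive additions spell the complementary strand AGTGCTCA.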

Page 17

Multiple Image Alignment

Metric based on optimal coincidence of high intensity noise pixels over a matrix of local offsets (0.4 pixel precision)

Shendure
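The idea of scoring candidate offsets by how many high-intensity pixels coincide can be illustrated with a brute-force search. This is a simplified sketch, not the method used for the slide: it searches whole-pixel offsets only (the 0.4 pixel precision quoted above would need sub-pixel interpolation), it takes already-thresholded bright-pixel coordinates as input, and all names are hypothetical.

```python
def best_offset(ref_bright, img_bright, max_shift=3):
    """Find the (dy, dx) shift of img that maximizes coincidence with ref.

    ref_bright, img_bright: sets of (row, col) coordinates of high-intensity pixels.
    Returns ((dy, dx), score), where score is the number of coinciding pixels.
    """
    best = (0, 0)
    best_score = -1
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Count shifted image pixels that land on a bright reference pixel.
            score = sum((r + dy, c + dx) in ref_bright for r, c in img_bright)
            if score > best_score:
                best_score = score
                best = (dy, dx)
    return best, best_score


# Toy data: the second image is the reference displaced by (+2, -1).
reference = {(5, 5), (7, 2), (1, 9), (3, 3)}
displaced = {(r + 2, c - 1) for r, c in reference}

offset, score = best_offset(reference, displaced)
print(offset, score)
```

Here the search recovers the shift (-2, 1) that maps the displaced image back onto the reference, with all four bright pixels coinciding.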

Page 18

Polony exclusion principle & single pixel sequences

Mitra & Shendure

Page 19

Polony Flavors

1. Replica Plating of DNA images [Mitra et al. NAR 1999]

2. Long Range Haplotyping [Mitra et al. PNAS 2003]

3. Allelic mRNA Quantitation (HEP) [Mitra et al. 2003]

4. Alternative Splicing Combinatorics [Zhu et al. 2003]

5. Precise SNP-mutant & mRNA ratios [Merrill et al. 2003]

6. Fluor in situ Sequencing (FISSEQ 1) [Mitra et al. 2003]

7. Multiplex Genotyping (ApoE, Hyman, Shendure & Williams)

8. In situ / single-cell extensions of the above (Zhu & Williams)

Page 20

Synthetic Mini-genomes
• 90 kbp genome? All 3D structures known.
• Comprehensive functional data too.
• 100X faster replication (10 sec doubling) & selection to evolve widgets & systems?
• Utility of mirror-image & other unnatural polymers.
• Chassis & power supply

Page 21

A 90 kbp mini-genome

SP (3D) Stoichiometry Mge# Bp Min access# Gene L.end R.end orientation len2 Sequence
Total 144 107 89,498 74,310 2853
16S 1 y 1418 1418 3968 rrsB 4164238 4165779 > 124 aaattgaagagtttgatcatggctcagattgaacgctggcggcaggcctaacacatgcaagtcgaacggtaacaggaagaagcttgcttctttgctgacgagtggcggacgggtgagtaatgtctgggaaactgcctgatggagggggataactactggaaacggtagctaataccgcataacgtcgcaagaccaaagagggggaccttcgggcctcttgccatcggatgtgcccagatgggattagctagtagg
23S 1 y 2903 2903 3970 rrlB 4166220 4169123 > 1 ggttaagcgactaagcgtacacggtggatgccctggcagtcagaggcgatgaaggacgtgctaatctgcgataagcgtcggtaaggtgatatgaaccgttataaccggcgatttccgaatggggaaacccagtgtgtttcgacacactatcattaactgaatccataggttaatgaggcgaaccgggggaactgaaacatctaagtaccccgaggaaaagaaatcaaccgagattcccccagtagcggcgagcga
5S 1 120 120 3971 rrfB 4169216 4169335 > 0 tgcctggcggcagtagcgcggtggtcccacctgaccccatgccgaactcagaagtgaaacgccgtagcgccgatggtagtgtggggtctccccatgcgagagtagggaactgccaggcat
10sb (RNaseP) 375 375 3123 rnpB 3268233 3267857 < 2 gaagctgaccagacagtcgccgcttcgtcgtcgtcctcttcgggggagacgggcggaggggaggaaagtccgggctccatagggcagggtgccaggtaacgcctgggggggaaacccacgaccagtgcaacagagagcaaaccgccgatggcccgcgcaagcgggatcaggtaagggtgaaagggtgcggtaagagcgcaccgcgcggctggtaacagtccgtggcacggtaaactccacccggagcaaggccaa
tRNAs 20-46 y 3136 1364 3939 eg. gltT 4165951 4166026 > gtccccttcgtctagaggcccaggacaccgccctttcacggcggtaacaggggttcgaatcccctaggggacgcca
Cca (no) ? 1236 3056 cca 3199532 3200770 > 3 gtgaagatttatctggtcggtggtgctgttcgggatgcattgttagggctaccggtcaaagacagagattgggtggtggtcggcagtacgccacaggagatgctcgacgcgggctaccagcaggtaggccgcgattttcctgtgtttctgcatccgcaaacgcatgaagagtatgcgctggcacgtaccgaacggaaatccggttccggttacaccggttttacttgctatgccgcaccggatgtcacgctggaa
TrmA (22?) ? 1098 3965 trmA 4159749 4160849 < 3 atgacccccgaacaccttccaacagaacagtatgaagcgcagttagccgaaaaagtggtacgtttgcaaagtatgatggcaccgttttctgacctggttccggaagtgtttcgctcgccggtcagtcattaccggatgcgcgcggagttccgcatctggcacgatggcgatgacctgtatcacatcattttcgatcaacaaaccaaaagccgcatccgcgtggatagcttccccgccgccagtgaacttatcaac
BstNBI (no) 1815 AF329098 1 1815 > 0 atggctaaaaaagttaattggtatgtttcttgttcacctagaagtccagaaaaaattcagcctgagttaaaagtactagcaaattttgagggaagttattggaaaggggtaaaagggtataaagcacaagaggcatttgctaaagaacttgctgctttaccacaattcttaggtactacttataaaaaagaagctgcattttctactcgagacagagtggcaccaatgaaaacttatggtttcgtatttgtagat
Tri1 ? AP001918 traI 92673 97943 > atgatgagtattgcgcaggtcagatcggccggaagtgccgggaactattataccgacaaggataattactatgtgctgggcagcatgggagaacgctgggccggcaggggggctgaacagctggggctgcagggcagtgtcgataaggatgtttttacccgtcttctggagggcaggctgccggacggagcggatctaagccgcatgcaggatggcagtaacaggcatcgtcccggctacgatctgaccttctcc
Flp no 1272 NC_001398 5573 523 > 0 atgccacaatttggtatattatgtaaaacaccacctaaggtgcttgttcgtcagtttgtggaaaggtttgaaagaccttcaggtgagaaaatagcattatgtgctgctgaactaacctatttatgttggatgattacacataacggaacagcaatcaagagagccacattcatgagctataatactatcataagcaattcgctgagtttcgatattgtcaataaatcactccagtttaaatacaagacgcaaaaa
GFP no 717 AF302837 27 743 > 0 atgagtaaaggagaagaacttttcactggagttgtcccaattcttgttgaattagatggcgatgttaatgggcaaaaattctctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactgggaagctacctgttccatggccaacacttgtcactactttcgcgtatggtcttcaatgctttgcgagatacccagatcatatgaaacagcatgactttttcaag
Rnpa (36%) 357 357 3704 rnpA 3882122 3882481 > 3 gtggttaagctcgcatttcccagggagttacgcttgttaactcccagtcaattcacattcgtcttccagcagccacaacgggctggcacgccgcaaattaccattctcggccgcctgaattcgctggggcatccccgtatcggtcttacagtcgccaagaaaaacgttcgacgcgcccatgaacgcaatcggattaaacgtctgacgcgtgaaagcttccgtctgcgccaacatgaactcccggctatggatttc
BstPol multiprot 2631 2631 U93028 95 2728 > 3 atgagattgaagaaaaaactcgtcttaattgatggcaacagtgtggcataccgcgccttttttgccttgccacttttgcataacgacaaaggcattcatacgaatgcggtttacgggtttacgatgatgttgaacaaaattttggcggaagaacaaccgacccatttacttgtagcgtttgacgccggaaaaacgacgttccggcatgaaacgtttcaagagtataaaggcggacggcaacaaacgcccccggaa
Rpol_Bpt7 multiprot 2649 2649 NC_001604 3171 5822 > 2 atgaacacgattaacatcgctaagaacgacttctctgacatcgaactggctgctatcccgttcaacactctggctgaccattacggtgagcgtttagctcgcgaacagttggcccttgagcatgagtcttacgagatgggtgaagcacgcttccgcaagatgtttgagcgtcaacttaaagctggtgaggttgcggataacgctgccgccaagcctctcatcactaccctactccctaagatgattgcacgcatc
EFTu 451 1179 1179 3339 tufA 3467782 3468966 < 6 gtgtctaaagaaaaatttgaacgtacaaaaccgcacgttaacgttggtactatcggccacgttgaccacggtaaaactactctgaccgctgcaatcaccaccgtactggctaaaacctacggcggtgctgctcgtgcattcgaccagatcgataacgcgccggaagaaaaagctcgtggtatcaccatcaacacttctcacgttgaatacgacaccccgacccgtcactacgcacacgtagactgcccggggcac
EFG (59%) 89 2109 2109 3340 fusA 3469037 3471151 < 6 atggctcgtacaacacccatcgcacgctaccgtaacatcggtatcagtgcgcacatcgacgccggtaaaaccactactaccgaacgtattctgttctacaccggtgtaaaccataaaatcggtgaagttcatgacggcgctgcaaccatggactggatggagcaggagcaggaacgtggtattaccatcacttccgctgcgactactgcattctggtctggtatggctaagcagtatgagccgcatcgcatcaac
EFTs 433 846 846 170 tsf 190857 191708 > 6 atggctgaaattaccgcatccctggtaaaagagctgcgtgagcgtactggcgcaggcatgatggattgcaaaaaagcactgactgaagctaacggcgacatcgagctggcaatcgaaaacatgcgtaagtccggtgctattaaagcagcgaaaaaagcaggcaacgttgctgctgacggcgtgatcaaaaccaaaatcgacggcaactacggcatcattctggaagttaactgccagactgacttcgttgcaaaa
EFP (no) 26 561 561 4147 efp 4373277 4373843 > 6 atggcaacgtactatagcaacgattttcgtgctggtcttaaaatcatgttagacggcgaaccttacgcggttgaagcgagtgaattcgtaaaaccgggtaaaggccaggcatttgctcgcgttaaactgcgtcgtctgctgaccggtactcgcgtagaaaaaaccttcaaatctactgattccgctgaaggcgctgatgttgtcgatatgaacctgacttacctgtacaacgacggtgagttctggcacttcatg
IF1 173 213 213 884 infA 925448 925666 < 6 atggccaaagaagacaatattgaaatgcaaggtaccgttcttgaaacgttgcctaataccatgttccgcgtagagttagaaaacggtcacgtggttactgcacacatctccggtaaaatgcgcaaaaactacatccgcatcctgacgggcgacaaagtgactgttgaactgaccccgtacgacctgagcaaaggccgcattgtcttccgtagtcgctga
IF2 (25%) 142 2682 2682 3168 infB 3310983 3313655 < -9 atgacagatgtaacgattaaaacgctggccgcagagcgacagacctccgtggaacgcctggtacagcaatttgctgatgcaggtatccggaagtctgctgacgactctgtgtctgcacaagagaaacagactttgattgaccacctgaatcagaaaaattcaggcccggacaaattgacgctgcaacgtaaaacacgcagcacccttaacattcctggtaccggtggaaaaagcaaatcggtacaaatcgaagtc
IF3 (~50%) 196 540 540 1718 infC 1798120 1798662 < 3 attaaaggcggaaaacgagttcaaacggcgcgccctaaccgtatcaatggcgaaattcgcgcccaggaagttcgcttaacaggtctggaaggcgagcagcttggtattgtgagtctgagagaagctctggagaaagcagaagaagccggagtagacttagtcgagatcagccctaacgccgagccgccggtttgtcgtataatggattacggcaaattcctctatgaaaagagcaagtcttctaaggaacagaag
RF1 (no) 258 1080 1211 prfA 1264235 1265317 > 3 atgaagccttctatcgttgccaaactggaagccctgcatgaacgccatgaagaagttcaggcgttgctgggtgacgcgcaaactatcgccgaccaggaacgttttcgcgcattatcacgcgaatatgcgcagttaagtgatgtttcgcgctgttttaccgactggcaacaggttcaggaagatatcgaaaccgcacagatgatgctcgatgatcctgaaatgcgtgagatggcgcaggatgaactgcgcgaagct
RRF 435 555 555 172 frr 192872 193429 > 3 gtgattagcgatatcagaaaagatgctgaagtacgcatggacaaatgcgtagaagcgttcaaaacccaaatcagcaaaatacgcacgggtcgtgcttctcccagcctgctggatggcattgtcgtggaatattacggcacgccgacgccgctgcgtcagctggcaagcgtaacggtagaagattcccgtacactgaaaatcaacgtgtttgatcgttcaatgtctccggccgttgaaaaagcgattatggcgtcc
RL1 (~50%) 1 82 699 699 3984 rplA 4176457 4177161 > 6 atggctaaactgaccaagcgcatgcgtgttatccgcgagaaagttgatgcaaccaaacagtacgacatcaacgaagctatcgcactgctgaaagagctggcgactgctaaattcgtagaaagcgtggacgtagctgttaacctcggcatcgacgctcgtaaatctgaccagaacgtacgtggtgcaactgtactgccgcacggtactggccgttccgttcgcgtagccgtatttacccaaggtgcaaacgctgaa
RL2 1 154 816 816 3317 rplB 3448180 3449001 < 6 atggcagttgttaaatgtaaaccgacatctccgggtcgtcgccacgtagttaaagtggttaaccctgagctgcacaagggcaaaccttttgctccgttgctggaaaaaaacagcaaatccggtggtcgtaacaacaatggccgtatcaccactcgtcatatcggtggtggccacaagcaggcttaccgtattgttgacttcaaacgcaacaaagacggtatcccggcagttgttgaacgtcttgagtacgatccg

Page 22

The in vitro assembly (& 3D structure) of prokaryotic ribosomes is known (e.g. Nomura et al.; Noller et al.).

Page 23

All 30S-Ribosomal-protein DNAs & mRNAs synthesized in vitro

[Gel images: lanes M and 1-21; panels: DNA Template, RNA Transcript]

Tian & Church

Page 24

His-tagged ribosomal proteins synthesized in vitro

RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 as original constructs.

RS1 required deletion of a feedback motif in the mRNA.
RS-3, 7, 8, 11, 14, 18, 19, 20 are still weakly expressed.

Note that S1, S4, S7, S8, S20, L1, L4, L10 are known to repress their own translation (and are likely titrated by rRNA).

Tian & Church

Page 25

Page 26

Representations of the Chromosome

[Diagram labels: set of N coordinates (x, y, z); matrix of distances; SVD (singular value decomposition); Euclidean metric; pdb file (viewed with RasMol); Matlab visualization]
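One of the representations named on this slide, the matrix of distances under a Euclidean metric, is simple to build from a set of N (x, y, z) coordinates. The sketch below covers only that step, with made-up coordinates; the SVD, pdb export, and Matlab steps are not shown, and the function name is illustrative.

```python
import math


def distance_matrix(coords):
    """Return the N x N matrix of Euclidean distances between (x, y, z) points."""
    matrix = []
    for a in coords:
        row = []
        for b in coords:
            # Euclidean metric: square root of the summed squared differences.
            row.append(math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b))))
        matrix.append(row)
    return matrix


# Made-up coordinates, chosen so the distances are whole numbers.
points = [(0, 0, 0), (3, 4, 0), (0, 0, 12)]
distances = distance_matrix(points)

for row in distances:
    print(row)
```

The result is symmetric with zeros on the diagonal, so it depends only on the relative positions of the points and not on how the structure is rotated or translated.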

Page 27

Bidirectional replication
Paired fork

Page 28

Origin

Blue: Left replicated segment (yelgr=high gene#)
Red: Right (i.e. middle) segment
Aqua: unduplicated segment of the circular genome

Avoidance of entanglement throughout cell cycle