Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring
![Page 1: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/1.jpg)
Thanks to: DARPA & DOE-GtL
Agencourt, Ambergen, Atactic, BeyondGenomics, Caliper, Genomatica, Genovoxx, Helicos, MJR, NEN, Nimblegen, Xeotron/Invitrogen
For more info see: arep.med.harvard.edu
1-Feb-2005 9:15-10 MITRE
Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring
![Page 2: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/2.jpg)
Synthetic (homologous recombination) testing of DNA motifs
RNA ratio (motif-minus to wild type) for each flanking gene:
1.3, 2.4 (1.3 in argR)
1.1, 1.3
0.7, 2.5
0.2, 1.4
1.4, 3.5
Bulyk, McGuire, Masuda, Church. Genome Res. 14:201–208
![Page 3: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/3.jpg)
Synthetic Genomes & Proteomes. Why?
• Test or engineer cis-DNA/RNA elements
• Access to any protein (complex), including post-translational modifications
• Affinity agents for the above
• Protein design, vaccines, solubility screens
• Utility of molecular-biology in vitro "kits" for DNA, RNA and protein (e.g. PCR, T7, Roche)
Toward these goals, design a chassis:
• 115 kbp genome, 150 genes
• Nearly all 3D structures known
• Comprehensive functional data
![Page 4: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/4.jpg)
PURE (Protein synthesis Using Recombinant Elements) translation utility
Removing tRNA synthetases, translational release factors, RNases & proteases
Selection of scFvs [antibodies] specific for HBV DNA polymerase using ribosome display. Lee et al. 2004 J Immunol Methods 284:147
Programming peptidomimetic syntheses by translating genetic codes designed de novo. Forster et al. 2003 PNAS 100:6353
High-level cell-free expression & specific labeling of integral membrane proteins. Klammt et al. 2004 Eur J Biochem 271:568
Cell-free translation reconstituted with purified components. Shimizu et al. 2001 Nat Biotechnol 19:751-5
![Page 5: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/5.jpg)
In vitro genetic codes
Figure: mRNA template and a reassigned genetic-code table (second base) with codons assigned to mS, yU and eU; time course of 3H-E incorporation (dpm, 0-3500) over 30-80 min for the peptide fM yU mS eU E.
Forster, et al. (2003) PNAS 100:6353-7
80% average yield per unnatural coupling.
eU = 2-amino-4-pentenoic acid; yU = 2-amino-4-pentynoic acid; mS = O-methylserine; gS = O-GlcNAc–serine; bK = biotinyl-lysine
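The 80% figure compounds, because every unnatural residue is a separate chance to lose the chain. A minimal sketch, assuming each coupling succeeds independently at the slide's 80% average (real yields will vary by residue and codon), estimates the full-length yield; for the peptide fM yU mS eU E, with three unnatural residues, it gives about 51%.

```python
def full_length_yield(per_coupling: float, n_unnatural: int) -> float:
    """Fraction of chains completing n unnatural couplings, assuming independent couplings."""
    return per_coupling ** n_unnatural

# Average yield per unnatural coupling, taken from the slide.
AVG_COUPLING_YIELD = 0.80

for n in range(1, 6):
    print(f"{n} unnatural residues: {full_length_yield(AVG_COUPLING_YIELD, n):.1%} full length")
```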
![Page 6: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/6.jpg)

| Escherichia coli | Mycoplasma | 3D structure |
|---|---|---|
| Bacillus phage φ29 DNA polymerase | + | + |
| Coliphage P1 Cre recombinase | - | + |
| Coliphage Lox/Cre recombinase site | - | + |
| Coliphage T7 RNA polymerase | + | + |
| Coliphage T7 RNA polymerase initiation site | + | + |
| Coliphage T7 RNA polymerase termination site | + | + |
| RNase P RNA | + | - |
| RNase P protein | + | + |
| RNase P site/RNA primer for DNA polymerase | + | + |
| Small subunit 16S ribosomal RNA | + | + |
| All 21 small subunit ribosomal proteins (1-21) | + except 1, 21 | + |
| Large subunit 5S ribosomal RNA | + | + |
| Large subunit 23S ribosomal RNA | + | + |
| Large subunit 23S rRNA G2445>m2G methylase: unknown | ? | - |
| Large subunit 23S rRNA U2449>dihydroU synthetase: unknown | ? | - |
| Large subunit 23S rRNA U2457>pseudoU synthetase | ? | - |
| Large subunit 23S rRNA C2498>Cm methylase: unknown | ? | - |
| Large subunit 23S rRNA A2503>m2A methylase: unknown | ? | - |
| Large subunit 23S rRNA U2504>pseudoU synthetase | ? | - |
| All 33 large subunit ribosomal proteins (1-7, 9-11, 13-25, 27-36) | + except 25, 30 | + |
| Translational initiation factor 1 | + | + |
| Translational initiation factor 2 | + | + |
| Translational initiation factor 3 | + | + |
| Translational elongation factor Tu | + | + |
| Translational elongation factor Ts | + | + |
| Translational elongation factor G | + | + |
| Translational release factor 1 | + | + |
| Translational release factor 2 | - | + |
| Translational release factor Gln methylase | + | + |
| Translational release factor 3 | - | + |
| Ribosome recycling factor | + | + |
| 33/45 transfer RNAs (see Fig. 2) | 29/33 | + |
| tRNA(I) C34>lysidine synthetase | ? | + |
| tRNA(R) A34>I deaminase | ? | + |
| tRNA(ASV) U34>cmo5U (=V) synthetase: unknown | - | - |
| tRNA(R) U34>2sU Cys desulfurase | - | + |
| tRNA(R) nm5U34 methylase | ? | + |
| tRNA(R) U34>cmnm5U GTPase | ? | + |
| tRNA(R) U34>cmnm5U synthetase | ? | + |
| tRNA(R) cmnm5U34>nm5U, mnm5U synthetase | ? | - |
| tRNA(R) G37 N1-methylase | + | + |
| tRNA(RNIKM) A37>t6A N6-threonylcarbamoyl-A synthetase: unknown | + | - |
| tRNA(CLFSWY) A37>i6A synthetase | - | + |
| tRNA(CLFSWY) i6A37>s2i6A (ms2i6A) synthetase | - | + |
| All 22 aminoacyl-tRNA synthetase subunits (20 enzymes) | + except G subunit, Q | + except G subunit |
| Met-tRNA formyltransferase | + | + |
| Chaperone DnaK | + | + |
| Chaperonin GroEL | + | + |
| Chaperonin GroES | + | + |

Total genes = 150 (Forster & Church)
Oligos for 150 & 776 synthetic genes (for the E. coli minigenome & M. mobile whole genome, respectively)
![Page 7: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/7.jpg)
Up to 760K oligos/chip; 18 Mbp for $700 raw (6-18K genes)
• <1K Oxamer: electrolytic acid/base
• 8K Atactic/Xeotron/Invitrogen: photo-generated acid (Sheng, Zhou, Gulari, Gao, U. Houston)
• 24K Agilent: ink-jet, standard reagents
• 48K Febit
• 100K Metrigen
• 380K Nimblegen: photolabile 5' protection (Nuwaysir, Smith, Albert)
Tian, Gong, Church
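The headline numbers translate into a per-base cost. A minimal sketch; the 1 kb gene length and the 1x to 3x oligo redundancy per gene are assumptions, chosen only because they reproduce the slide's 6-18K gene range, which the slide itself does not derive.

```python
RAW_COST_USD = 700        # slide: 18 Mbp for $700 raw
CHIP_BP = 18_000_000      # slide: 18 Mbp per chip

def cost_per_kb(cost_usd: float, chip_bp: float) -> float:
    """Raw synthesis cost per 1,000 bases."""
    return cost_usd / (chip_bp / 1_000)

def genes_per_chip(chip_bp: float, gene_bp: int, redundancy: int) -> float:
    """Genes encodable if each gene needs `redundancy` x its length in oligo sequence (hypothetical)."""
    return chip_bp / (gene_bp * redundancy)

print(f"${cost_per_kb(RAW_COST_USD, CHIP_BP):.3f} per kb")
for redundancy in (1, 3):  # hypothetical 1x-3x oligo coverage; 1 kb genes are also an assumption
    print(f"{redundancy}x: {genes_per_chip(CHIP_BP, 1_000, redundancy):,.0f} genes")
```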
![Page 8: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/8.jpg)
Improve DNA Synthesis Cost
Synthesis on chips in pools is 5000X less expensive per oligonucleotide, but amounts are low (1e6 molecules rather than the usual 1e12), and bimolecular kinetics slow with the square of the concentration decrease!
Solution: amplify the oligos, then release them.
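The "square" refers to the volumetric rate of a bimolecular reaction, k[A][B]. A minimal sketch, assuming both partners are diluted by the same factor in the same volume and start at equal concentration: the rate falls by the square of the dilution (1e12-fold for a 1e6-fold drop), while the second-order half-time 1/(k·C0) grows linearly (1e6-fold). Either way, unamplified chip oligos would anneal far too slowly, which is why they are amplified first.

```python
def relative_initial_rate(fold_dilution: float) -> float:
    """Volumetric rate k[A][B] relative to undiluted, with both partners diluted equally."""
    return 1.0 / fold_dilution ** 2

def relative_half_time(fold_dilution: float) -> float:
    """Second-order half-time 1/(k*C0) relative to undiluted (equal starting concentrations)."""
    return fold_dilution

# Per-oligo molecule counts from the slide: conventional synthesis vs. chip pool.
dilution = 1e12 / 1e6
print("relative rate:", relative_initial_rate(dilution))
print("relative half-time:", relative_half_time(dilution))
```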
10 + 50 + 10 => ss-70-mer (chip)
20-mer PCR primers with restriction sites at the 50-mer junctions
=> ds-90-mer
=> ds-50-mer
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
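The length arithmetic (70 → 90 → 50) works out if each 20-mer primer anneals through 10 nt to a chip flank and adds a 10-nt 5' tail. A minimal sketch under that assumption; all sequences are hypothetical placeholders, and `release()` idealizes the restriction step as an exact cut at both 50-mer junctions (the enzymes are not specified on the slide).

```python
# Hypothetical 10-nt sequences; the slide gives only the lengths.
CHIP_FLANK_5 = "GACTGACTGA"
CHIP_FLANK_3 = "TCAGTCAGTC"
PRIMER_TAIL_5 = "ACACACACAC"   # forward primer = tail + CHIP_FLANK_5 (20-mer)
PRIMER_TAIL_3 = "TGTGTGTGTG"   # reverse primer = revcomp(CHIP_FLANK_3 + tail) (20-mer)
PAYLOAD_LEN = 50

def chip_oligo(payload: str) -> str:
    """10 + 50 + 10 => single-stranded 70-mer synthesized on the chip."""
    assert len(payload) == PAYLOAD_LEN
    return CHIP_FLANK_5 + payload + CHIP_FLANK_3

def pcr_amplicon(oligo: str) -> str:
    """PCR with the tailed 20-mer primers => double-stranded 90-mer (top strand shown)."""
    return PRIMER_TAIL_5 + oligo + PRIMER_TAIL_3

def release(amplicon: str) -> str:
    """Idealized cleavage exactly at both 50-mer junctions => double-stranded 50-mer."""
    start = len(PRIMER_TAIL_5) + len(CHIP_FLANK_5)
    return amplicon[start:start + PAYLOAD_LEN]

payload = "ACGT" * 12 + "AC"
amplicon = pcr_amplicon(chip_oligo(payload))
print(len(chip_oligo(payload)), len(amplicon), len(release(amplicon)))
```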
![Page 9: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/9.jpg)
Improve DNA Synthesis Accuracy via mismatch selection
Tian & Church. Other mismatch methods: MutS (& MutH, MutL)
![Page 10: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/10.jpg)
Genome assembly
Moving forward:
1. Tandem, inverted and dispersed repeats (hierarchical assembly, size selection and/or scaffolding)
2. Reduce mutations (goal <1e-6 errors per bp) to reduce the number of intermediates
3. 15 kb to 5 Mb by homologous recombination (Nick Reppas)
4. Phage integrase site-specific recombination, also for counters
Stemmer et al. 1995. Gene 164:49-53; Mullis 1986 CSHSQB.
Assembly lengths (bp): 50, 75, 125, 225, 425, 825 … ~100·2^(n-1)
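The length series is what repeated pairwise joining gives when every join loses a fixed overlap. A minimal sketch, assuming a 25-bp overlap (the value that reproduces 50, 75, 125, 225, 425, 825 exactly; the slide does not state it): after k rounds the length is 25 + 25·2^k, so the constant becomes negligible and 100·2^(n-1) (with n = k - 1) is a good approximation. The sketch also counts rounds needed to reach 15 kb, the lower end of the homologous-recombination range in item 3.

```python
OVERLAP = 25  # hypothetical overlap (bp) that reproduces the slide's series from 50-mers

def assembly_lengths(start: int = 50, rounds: int = 5, overlap: int = OVERLAP) -> list[int]:
    """Length after each round of joining two equal pieces that share `overlap` bp."""
    lengths = [start]
    for _ in range(rounds):
        lengths.append(2 * lengths[-1] - overlap)
    return lengths

def rounds_to_reach(target_bp: int, start: int = 50, overlap: int = OVERLAP) -> int:
    """Number of doubling rounds until the product is at least target_bp long."""
    length, rounds = start, 0
    while length < target_bp:
        length = 2 * length - overlap
        rounds += 1
    return rounds

print(assembly_lengths())
print(rounds_to_reach(15_000))
```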
![Page 11: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/11.jpg)
All 30S ribosomal-protein DNAs (codon re-optimized)
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
Figure labels: 1.7 kb; 0.3 kb; S19 0.3 kb; Nimblegen 95K chip; Atactic <4K chip
![Page 12: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/12.jpg)
Improving synthesis accuracy

| Method | Bp/error |
|---|---|
| Chip assembly only | 160 |
| Hybridization-selection | 1,400 |
| MutS gel-shift | 10,000 |
| MutHLS cleavage | 30,000 (10X better than PCR) |

Tian & Church 2004; Carr & Jacobson 2004; Smith & Modrich 1997
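Bp/error converts directly into the fraction of perfect clones. A minimal sketch, assuming errors are independent so that their count per molecule is Poisson, and a hypothetical 1 kb gene: at 160 bp/error only about 0.2% of clones are error-free, while at 30,000 bp/error about 97% are. The screening helper is likewise an illustration, not a protocol from the slide.

```python
import math

BP_PER_ERROR = {            # bp per error for each method, from the table
    "Chip assembly only": 160,
    "Hybridization-selection": 1_400,
    "MutS gel-shift": 10_000,
    "MutHLS cleavage": 30_000,
}

def fraction_error_free(length_bp: int, bp_per_error: float) -> float:
    """Poisson estimate of P(0 errors) = exp(-expected errors per molecule)."""
    return math.exp(-length_bp / bp_per_error)

def clones_to_screen(p_correct: float, confidence: float = 0.95) -> int:
    """Clones to pick so that at least one is error-free with the given probability."""
    if p_correct <= 0:
        raise ValueError("no error-free clones expected")
    if p_correct >= 1:
        return 1
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_correct))

GENE_BP = 1_000  # hypothetical gene length
for method, rate in BP_PER_ERROR.items():
    p = fraction_error_free(GENE_BP, rate)
    print(f"{method:24s} {p:7.2%} error-free, screen ~{clones_to_screen(p)} clones")
```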
![Page 13: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/13.jpg)
Extreme mRNA makeover for protein expression in vitro
RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17 and 21: detectable initially.
RS-1, 3, 7, 8, 11, 14, 18, 19 and 20: initially weak or undetectable.
Solution: Iteratively resynthesize all mRNAs with less mRNA structure.
Tian & Church
Figure: Western blot based on His-tags; lanes 20w, 20m, 17w, 17m, 16w, 16m (RS-20, 17, 16; w: wild-type, m: modified mRNA); 10 kDa marker
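The slide does not say how mRNA structure was scored or reduced. One simple proxy is to lower GC content near the start codon by synonymous codon substitution, which leaves the encoded protein unchanged. The sketch below implements only that heuristic and is not the authors' method; a real design would score predicted folding free energy rather than GC count, and the 10-codon window is an arbitrary choice.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

def gc(seq: str) -> int:
    return sum(base in "GC" for base in seq)

def translate(cds: str) -> str:
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - len(cds) % 3, 3))

def lower_5prime_gc(cds: str, n_codons: int = 10) -> str:
    """Swap codons 2..n_codons for the lowest-GC synonym; the start codon is kept."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for i in range(1, min(n_codons, len(codons))):
        best = min(SYNONYMS[CODON_TABLE[codons[i]]], key=gc)
        if gc(best) < gc(codons[i]):
            codons[i] = best
    return "".join(codons)

original = "ATGGCCGGCCGCTAA"
recoded = lower_5prime_gc(original)
print(original, gc(original), translate(original))
print(recoded, gc(recoded), translate(recoded))
```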
![Page 14: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/14.jpg)
Safety Proposals
Church, G.M. A synthetic biohazard non-proliferation proposal. http://arep.med.harvard.edu/SBP/Church_Biohazard04c.doc (2004)
1. Monitor oligo synthesis by expanding the Controlled Substances, Select Agents, &/or Recombinant DNA regulations
2. Computational tools for the above
3. System modeling checks for synthetic biology projects
4. A multi-auxotrophic host with a novel genetic code for its genome, preventing functional transfer of its DNA to other cells.
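Item 2 calls for computational tools. A minimal sketch of the simplest such check, assuming a hypothetical watch list of sequences of concern: flag any order that shares an exact k-mer (k = 20 here, an arbitrary choice) with a listed sequence on either strand. Practical screening would use alignment-based search against curated databases, which also catches near matches that exact k-mers miss.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

def kmers(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def screen_order(order: str, watchlist: dict[str, str], k: int = 20) -> list[str]:
    """Names of watch-list entries sharing at least one exact k-mer with the order (either strand)."""
    order_kmers = kmers(order, k) | kmers(revcomp(order), k)
    return [name for name, seq in watchlist.items() if order_kmers & kmers(seq, k)]

# Hypothetical watch-list entry; a real list would come from a curated database.
WATCHLIST = {"hypothetical_agent_gene": "ATGACCGTTAGCATCGGATCCTTAGGCATCG"}

order = "GGGG" + WATCHLIST["hypothetical_agent_gene"][5:30]
print(screen_order(order, WATCHLIST))           # forward-strand fragment
print(screen_order(revcomp(order), WATCHLIST))  # same fragment ordered as reverse complement
```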
![Page 15: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/15.jpg)
Why sequence?
• Synthetic biology & laboratory selections
• Pathogen "weather map", biowarfare sensors
• Cancer: mutation sets for individual clones, loss of heterozygosity
• RNA splicing & chromatin modification patterns
• Antibodies or "aptamers" for any protein
• B- & T-cell receptor diversity: temporal profiling, clinical
• Preventative medicine & genotype–phenotype associations
• Cell lineage during development
• Phylogenetic footprinting, biodiversity
Shendure et al. 2004 Nature Rev Gen 5, 335.
![Page 16: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/16.jpg)
Personal genomics & cancer therapy
Mutations G719S, L858R, Del746ELREA in red.
EGFR mutations in lung cancer: correlation with clinical response to gefitinib [Iressa] therapy. Paez, … Meyerson (Apr 2004) Science 304:1497
Lynch … Haber, N Engl J Med (Apr 2004) 350:2129.
Pao … Mardis, Wilson, Varmus H, PNAS (Aug 2004) 101:13306-11.
Dulbecco R. (1986) A turning point in cancer research: sequencing the human genome. Science 231:1055-6.
![Page 17: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/17.jpg)
Why 'single molecule' sequencing?
(1) Single cells: preimplantation genetic diagnosis (PGD), uncultivable organisms
(2) Co-occurrence on a molecule, complex, or cell: RNA splice forms & DNA haplotypes
(3) Cost: $1K-100K "personal genomes" http://grants.nih.gov/grants/guide/rfa-files/RFA-HG-04-003.html
(4) Precision: counting 10^9 RNA tags (to reduce variance)
(~5e5 RNAs per human cell)
Tags counted at fixed cost: EST 5e3; SAGE 5e4; MPSS 5e6; Polony-FISSeq (polymerase colony) 5e9 (goal)
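Why count so many tags? Under simple Poisson sampling, a transcript's tag count has a coefficient of variation (CV) of 1/√λ, where λ is the expected count. A minimal sketch, assuming a hypothetical transcript at one copy per cell among the slide's ~5e5 RNAs per cell: EST-scale sampling (5e3 tags) expects 0.01 tags, so the transcript is usually missed, while the 5e9-tag goal expects 10,000 tags and a CV of 1%.

```python
import math

RNAS_PER_CELL = 5e5  # slide: ~5e5 RNAs per human cell
TAGS_AT_FIXED_COST = {"EST": 5e3, "SAGE": 5e4, "MPSS": 5e6, "Polony-FISSeq (goal)": 5e9}

def expected_tags(depth: float, copies_per_cell: float = 1.0) -> float:
    """Expected tags for a transcript at `copies_per_cell` (hypothetical) when sampling `depth` tags."""
    return depth * copies_per_cell / RNAS_PER_CELL

def poisson_cv(mean_count: float) -> float:
    """Coefficient of variation of a Poisson count: 1/sqrt(mean)."""
    return math.inf if mean_count == 0 else 1 / math.sqrt(mean_count)

for method, depth in TAGS_AT_FIXED_COST.items():
    lam = expected_tags(depth)
    print(f"{method:22s} expected {lam:10.2f} tags, CV {poisson_cv(lam):.3f}")
```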
![Page 18: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/18.jpg)
CD44 Exon Combinatorics (Zhu & Shendure)
• Alternatively spliced cell adhesion molecule
• Specific variable exons are up- or down-regulated in various cancers (>2000 papers)
• v6 & v7 enable direct binding to chondroitin sulfate, heparin…
Zhu, J, et al. Science 301:836-8.
![Page 19: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/19.jpg)
Zhu J, Shendure J, Mitra RD, Church GM. Science 301:836-8. Single molecule profiling of alternative pre-mRNA splicing.

| Exon pattern | Eph4 | Eph4bDD | Total | Fold ratio (Eph4bDD vs. Eph4) | P-value |
|---|---|---|---|---|---|
| `------------7-8-9-10` | 609 | 764 | 1373 | 1.17 | 1E-4 |
| `--------------8-9-10` | 320 | 390 | 710 | 1.13 | 3E-2 |
| `----------6-7-8-9-10` | 431 | 251 | 682 | -1.85 | 4E-18 |
| `------4-5-6-7-8-9-10` | 218 | 216 | 434 | -1.08 | 2E-1 |
| `----------------9-10` | 68 | 143 | 211 | 1.96 | 7E-7 |
| `--------5-6-7-8-9-10` | 86 | 39 | 125 | -2.37 | 2E-6 |
| `----3-4-5-6-7-8-9-10` | 40 | 56 | 96 | 1.30 | 9E-2 |
| `------4-5---7-8-9-10` | 16 | 74 | 90 | 4.30 | 2E-9 |
| `--2-3-4-5-6-7-8-9-10` | 44 | 28 | 72 | -1.69 | 1E-2 |
| `1-2-3-4-5-6-7-8-9-10` | 22 | 5 | 27 | -4.73 | 3E-4 |
| `--------5---7-8-9-10` | 5 | 19 | 24 | 3.53 | 3E-3 |
| `----3-4-5---7-8-9-10` | 1 | 15 | 16 | 13.95 | 4E-4 |
| `--2-3-4-5---7-8-9-10` | 1 | 10 | 11 | 9.30 | 5E-3 |

Eph4 = murine mammary epithelial cell line
Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)
CD44 RNA isoforms
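The sign convention of the ratio column can be checked against the counts. A sketch, under the inference (not stated on the slide) that ratios are Eph4bDD/Eph4 after normalizing each cell line by its total count, with ratios below 1 reported as -1/ratio: using the totals of the 13 listed patterns (1,861 and 2,010) reproduces every reported ratio to within about 0.07. The small residual differences suggest the published values used library totals that include patterns not listed.

```python
# (exon pattern, Eph4 count, Eph4bDD count) for the 13 listed CD44 patterns
COUNTS = [
    ("7-8-9-10", 609, 764), ("8-9-10", 320, 390), ("6-7-8-9-10", 431, 251),
    ("4-5-6-7-8-9-10", 218, 216), ("9-10", 68, 143), ("5-6-7-8-9-10", 86, 39),
    ("3-4-5-6-7-8-9-10", 40, 56), ("4-5---7-8-9-10", 16, 74),
    ("2-3-4-5-6-7-8-9-10", 44, 28), ("1-2-3-4-5-6-7-8-9-10", 22, 5),
    ("5---7-8-9-10", 5, 19), ("3-4-5---7-8-9-10", 1, 15), ("2-3-4-5---7-8-9-10", 1, 10),
]

def signed_fold(eph4: int, bdd: int, total_eph4: int, total_bdd: int) -> float:
    """Library-normalized Eph4bDD/Eph4 ratio; ratios below 1 are reported as -1/ratio."""
    ratio = (bdd / total_bdd) / (eph4 / total_eph4)
    return ratio if ratio >= 1 else -1 / ratio

total_eph4 = sum(a for _, a, _ in COUNTS)
total_bdd = sum(b for _, _, b in COUNTS)
for pattern, a, b in COUNTS:
    print(f"{pattern:>22s} {signed_fold(a, b, total_eph4, total_bdd):6.2f}")
```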
![Page 20: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/20.jpg)
Chromosome-wide haplotyping
Figure: Human Chr. 7 (150 Mb); SNPs IL6-3572 (A) and CD36-4366 (A/T) across a 60-Mb span; haplotypes A..A and A..T; counts shown: 73, 3, 1
![Page 21: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/21.jpg)
Convergence on non-electrophoretic tag-sequencing methods?
Tag length: EST >400 bp; SAGE 14-26 bp; MPSS 20 bp; 454 100 bp; Polony-Seq 26 bp (2 ends); Ronaghi
• Single molecule vs. amplified single molecule
• Array vs. bead packing vs. random
• Rapid scans vs. long scans (chemically limited, 454)
• Number of immobilized primers: 0: Chetverin '97 "Molecular Colonies"; 1: Mitra '99 > Agencourt "Bead Polonies"; 2: Kawashima '88, Adams '97 > Lynx/Solexa "Clusters"
http://arep.med.harvard.edu/Polonator/Plone.htm
![Page 22: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/22.jpg)
Bead Polony Sequencing Pipeline
• In vitro libraries via paired-tag manipulation
• Bead polonies via emulsion PCR [Dre03]
• Monolayered immobilization in acrylamide
• Enrichment of amplified beads
• SOFTWARE: Images → Tag Sequences; Tag Sequences → Genome
• FISSEQ or "wobble" sequencing
• Epifluorescence scope with integrated flow cell
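The "Tag Sequences → Genome" step amounts to placing short tags on a reference. A minimal sketch, assuming exact matches, forward strand only, and tags exactly k bases long; the pair helper keeps only placements whose spacing falls in a hypothetical insert-size window, which is one way paired tags can resolve placements that a single tag leaves ambiguous. Real pipelines must also handle mismatches and reverse-strand hits.

```python
from collections import defaultdict

def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)

def map_tag(tag: str, index: dict[str, list[int]]) -> list[int]:
    """Exact-match positions of a k-base tag (forward strand only)."""
    return index.get(tag, [])

def map_pair(left: str, right: str, index: dict[str, list[int]],
             min_gap: int, max_gap: int) -> list[tuple[int, int]]:
    """Placements of a tag pair whose spacing fits a hypothetical insert-size window."""
    return [(lp, rp)
            for lp in map_tag(left, index)
            for rp in map_tag(right, index)
            if min_gap <= rp - lp <= max_gap]

reference = "GATTACACATGCGTACCGGTTAAGCTTGCA"  # toy reference
idx = build_index(reference, k=5)
print(map_pair("TTACA", "TAAGC", idx, min_gap=10, max_gap=30))
```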
![Page 23: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/23.jpg)
Polony Fluorescent In Situ Sequencing Libraries
Greg Porreca, Abraham Rosenbaum
1 to 100 kb genomic fragments
Figure labels: M; L, R; M; PCR bead; sequencing primers; selector bead
2x20 bp tags after MmeI (BceAI, AcuI)
Dressman et al. PNAS 2003 (emulsion PCR)
![Page 24: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/24.jpg)
Cleavable dNTP-Fluorophore (& terminators)
Mitra,RD, Shendure,J, Olejnik,J, Olejnik,EK, and Church,GM (2003) Fluorescent in situ Sequencing on Polymerase Colonies. Analyt. Biochem. 320:55-65
Reduce or photo-cleave
![Page 25: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/25.jpg)
Polony-FISSeq: up to 2 billion beads/slide; Cy5 primer (666 nm); Cy3 dNTP (570 nm)
Jay Shendure: Self-Organizing Monolayer
![Page 26: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/26.jpg)
• # of bases sequenced (total) 23,703,953
• # bases sequenced (unique) 73
• Avg fold coverage 324,711 X
• Pixels used per bead (analysis) ~3.6
• Read Length per primer 14-15 bp
• Insertions 0.5%
• Deletions 0.7%
• Substitutions (raw) 4e-5
• Throughput: 360,000 bp/min
Polony FISSeq Stats
Current capillary sequencing: 1400 bp/min (600X speed/cost ratio, ~$5K/1X)
(These figures may omit PCR, homopolymer and context errors.) Shendure
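The stats are internally consistent, and the raw throughput ratio is worth separating from the quoted speed/cost ratio. A quick check: 23,703,953 bases over 73 unique bases gives the ~324,711X coverage, and 360,000 vs. 1,400 bp/min is about 257X in raw speed, so the 600X figure presumably also folds in the cost difference.

```python
TOTAL_BASES = 23_703_953       # bases sequenced (total), from the slide
UNIQUE_BASES = 73              # bases sequenced (unique)
POLONY_BP_PER_MIN = 360_000    # Polony FISSeq throughput
CAPILLARY_BP_PER_MIN = 1_400   # current capillary sequencing

fold_coverage = TOTAL_BASES / UNIQUE_BASES
speed_ratio = POLONY_BP_PER_MIN / CAPILLARY_BP_PER_MIN

print(f"Average fold coverage: {fold_coverage:,.1f}X")
print(f"Raw throughput ratio: {speed_ratio:.0f}X")
```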
![Page 27: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/27.jpg)
High-accuracy special case: homopolymers (e.g. AAA, CC, etc.)
• Use "compressed" tags: ACG = ACCG = ACCCG
• Quantitate incorporation
• Reversible terminators
• FRET between adjacent 3' bases
• Wobble sequencing
All five of these work.
• Maintenance of amplification fidelity using linear amplification from initial genomic fragment
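The "compressed" tag idea can be stated in code: collapse every homopolymer run to a single base, so ACG, ACCG and ACCCG become the same tag and a miscounted run length can no longer cause a mismatch. A minimal sketch using Python's `itertools.groupby`; `run_lengths()` shows the information that compression discards, which the other listed approaches (e.g. quantitating incorporation) would have to recover.

```python
from itertools import groupby

def compress(seq: str) -> str:
    """Collapse each homopolymer run to a single base (ACCCG -> ACG)."""
    return "".join(base for base, _ in groupby(seq))

def run_lengths(seq: str) -> list[tuple[str, int]]:
    """The run-length information that compression throws away."""
    return [(base, sum(1 for _ in run)) for base, run in groupby(seq)]

tags = ["ACG", "ACCG", "ACCCG"]
print({compress(t) for t in tags})
print(run_lengths("ACCCG"))
```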
![Page 28: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/28.jpg)
Degenerate (aka “wobble”) sequencing
“single tipped” vs “double tipped”
length of anchoring sequence
natural vs. universal nucleotides (e.g. deoxyinosine)
single fluor vs. four-color fluor mixtures of dNTPs for extensions
Sequenase vs Klenow vs BST
Exonuclease stripping vs heat stripping
Primer set (anchor + degenerate + “tip”):
CTAGCGAGCTAG NNNNNNNN A
CTAGCGAGCTAG NNNNNNNN G
CTAGCGAGCTAG NNNNNNNN C
CTAGCGAGCTAG NNNNNNNN T
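The primer set reads one template position per primer pool: the 12-nt anchor fixes the register, the 8 degenerate N's pair with any bases, and the idea is that only the primer whose 3' "tip" matches the next base gets extended. A minimal sketch of that logic, assuming the anchor sequence appears directly at the start of the target and ignoring strand complementarity and extension chemistry; changing `n_degenerate` moves the queried position, which is how successive positions would be read.

```python
ANCHOR = "CTAGCGAGCTAG"  # anchor sequence from the slide

def primer_set(n_degenerate: int = 8) -> dict[str, str]:
    """One primer per tip base: anchor + n degenerate N positions + tip."""
    return {tip: ANCHOR + "N" * n_degenerate + tip for tip in "AGCT"}

def matches(primer: str, target: str) -> bool:
    """N pairs with any base; primer and target compared in the same orientation (simplification)."""
    return len(primer) <= len(target) and all(p == "N" or p == t for p, t in zip(primer, target))

def wobble_call(target: str, n_degenerate: int = 8):
    """Base at position len(ANCHOR) + n_degenerate of target, read out by which tip matches."""
    hits = [tip for tip, primer in primer_set(n_degenerate).items() if matches(primer, target)]
    return hits[0] if len(hits) == 1 else None

template = ANCHOR + "ACGTACGT" + "G" + "TTT"   # hypothetical target
print(wobble_call(template))                   # base after 8 degenerate positions
print(wobble_call(template, n_degenerate=3))   # base after 3 degenerate positions
```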
![Page 29: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/29.jpg)
Wobble vs. simple base extension

| | Wobble | Simple base extension |
|---|---|---|
| Bases per cycle | 1/4 | 2.5/4 |
| Read length (bases) | >8 | 14-200 |
| Non-homopolymer errors | 3e-3 | 4e-5 |
| Homopolymer errors | 3e-3 | 1e-1 |

40' per cycle, 60 hr per 20 cycles
![Page 30: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/30.jpg)
Sequencing single molecules
Ecosystem studies need single-cell amplification because of multiple chromosomes (& RNAs) per cell. Many cells are hard to grow. Microbes exchange genome subsets.
(Even 80% genome coverage is better than 100 kb BACs.)
Many input molecules are required to sequence one molecule, vs. one molecule being sufficient when sequenced via many copies of it.
![Page 31: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/31.jpg)
Single cell sequencing
φ29 real-time amplification
No template control
Affymetrix quantitation of independent amplifications
![Page 32: Low-Cost/High-Accuracy Microbial Genome Synthesis and Monitoring](https://reader035.vdocuments.site/reader035/viewer/2022062315/56815012550346895dbdf7a9/html5/thumbnails/32.jpg)