Source: profs.sci.univr.it/~manca/mnc/dna-one.pdf (2014-11-04)
1
DNA Structure, Notation, Operations
Vincenzo Manca
Dipartimento di Informatica
Università di Verona
2
13 Years of Molecular Computing
1994 Adleman's Experiment *
1995 Lipton's Model *
1996 Int. Conf. on Math. Linguistics (Marcus)
1997 Mangalia (Paun, Head)
1998 MFCS Brno (Molecular Computing)
1999 (Paun's WMC)
2000 DNA6 Leiden *
2001 DNA7 Tampa (FL): 3-SAT
2002 DNA8 Sapporo: DNA Duplication
2004 DNA10 Milano: XPCR Extraction
2005 DNA11 Ontario: XPCR Recombination
2007 DNA13 Memphis: MP Systems
2008 DNA14 Prague: Genetic Drift
3
DNA Computing Motto
Problem: Data and Requirements
Algorithm: Solutions
Encode data by DNA strands
Encode algorithms by biotech procedures
Decode final strands as solutions
Faculty page (papers and tutorials): http://profs.scienze.univr.it/~manca/
4
A General Schema of Combinatorial Problems
A set of requirements on "assignments", that is, 0/1 sequences of some length n
The space of possible solutions has $2^n$ elements, but only some of them satisfy the requirements
Encode assignments by DNA strands
Encode requirements as biotech protocols that select the strands encoding the true solutions
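The same schema can be mimicked in silico by brute force: enumerate all $2^n$ assignments and keep those satisfying every requirement. This is a minimal Python sketch of the generate-and-filter idea under the assumption that requirements are plain predicates; the requirements used below are hypothetical. In the DNA setting the enumeration is carried out in parallel by the strand pool and the filtering by biotech protocols.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

Assignment = Tuple[int, ...]
Requirement = Callable[[Assignment], bool]


def solution_space(n: int) -> Iterable[Assignment]:
    """All 2**n assignments, i.e. 0/1 sequences of length n."""
    return product((0, 1), repeat=n)


def solve(n: int, requirements: List[Requirement]) -> List[Assignment]:
    """Keep the assignments that satisfy every requirement (the filtering step)."""
    return [a for a in solution_space(n) if all(req(a) for req in requirements)]


# Hypothetical requirements on n = 3 variables: (x0 or x1) and (not x2).
reqs: List[Requirement] = [
    lambda a: a[0] == 1 or a[1] == 1,
    lambda a: a[2] == 0,
]
print(solve(3, reqs))  # [(0, 1, 0), (1, 0, 0), (1, 1, 0)]
```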
5
Space generation: in linear time
Solution extraction: in linear time
!!!
6
New Trends in DNAC
o DNA Self-Assembly (Seeman, Winfree, …)
o DNA Automata (Shapiro)
o DNA Algorithms ==> new biotech protocols
7
8
Biotech Protocols
Algorithms
DNA Computing
Computing DNA
A change of perspective
9
In searching for ways to implement algorithms on DNA, general algorithmic principles are discovered in fundamental biomolecular processes.
10
Nucleotides
[Figure: nucleotide structure, with sugar carbons 1' to 5', phosphate group ($\mathrm{PO_4^{3-}}$; triphosphate $\mathrm{P_3O_{10}^{5-}}$), bases {A, T, C, G}, 5' and 3' ends, nucleoside; phosphodiester, glycosidic, and hydrogen bonds]
Mass of a nucleotide: ~330 Da, with $1\ \mathrm{Da} \approx 1.66 \times 10^{-24}$ g
1 g of H contains about $6.02 \times 10^{23}$ atoms
Distance 1'–1' across a base pair: ~1 nm
A few grams of DNA could, in principle, store as much information as all the electronic data stored in the world
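A back-of-the-envelope check of the storage claim, assuming ~330 Da per nucleotide (as above), 1 Da ≈ 1.66054 × 10^-24 g, and an idealized 2 bits per nucleotide (four symbols) with no redundancy or error-correction overhead:

```python
DALTON_IN_GRAMS = 1.66054e-24   # 1 Da expressed in grams
NUCLEOTIDE_MASS_DA = 330        # approximate mass of one nucleotide (from the slide)
BITS_PER_NUCLEOTIDE = 2         # four symbols {A, T, C, G} -> log2(4) = 2 bits


def nucleotides_per_gram() -> float:
    """Number of nucleotides in one gram of single-stranded DNA."""
    return 1.0 / (NUCLEOTIDE_MASS_DA * DALTON_IN_GRAMS)


def bytes_per_gram() -> float:
    """Theoretical capacity at 2 bits per nucleotide, ignoring any overhead."""
    return nucleotides_per_gram() * BITS_PER_NUCLEOTIDE / 8


print(f"{nucleotides_per_gram():.2e} nucleotides per gram")  # 1.82e+21
print(f"{bytes_per_gram():.2e} bytes per gram")               # 4.56e+20
```

Under these idealized assumptions one gram holds roughly 4.6 × 10^20 bytes (about 456 exabytes); practical encodings store less because of redundancy and addressing.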
11
12
Bilinearity, Complementarity, Antiparallelism
The marvelous form
5’
3’
13
V. Manca, "On the logic of bilinear forms," Fundamenta Informaticae, 2006.
14
STRAND HYBRIDIZATION
15
[Figure: strands α, β, and γ]
16
17
18
DNA Ligase
[Figure: strands α and δ, paired with α' and δ', joined by ligation]
Ligase joins a 5' phosphate to a 3' hydroxyl
19
Strings
Strings over an alphabet are sequences of symbols of the alphabet: abbabbba
On strings an associative concatenation operation is defined, with the empty string λ as identity:
$(\alpha\beta)\gamma = \alpha(\beta\gamma) \qquad \alpha = \alpha\lambda = \lambda\alpha$
A language L is a set of strings
20
DNA Sequences are Mobile Double Strings
$B = \{A, T, C, G\}$
$B^*$ = strings over B
$\alpha[i,j]$ (the substring of α from position i to position j)
$|\alpha|$ (the length of α)
s is an α-strand, or $s : \alpha$, or $type(s) = \alpha$
$\alpha : n$ or $mult(\alpha) = n$ (α occurs with multiplicity n)
21
Complementation c (involutive)
Reverse rev (involutive), Mirror mir (involutive)
$mir(\alpha) = rev(\alpha^c)$
Reverse and complementation commute
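The three operations can be sketched on plain Python strings, assuming Watson-Crick complementation A↔T, C↔G; mir is then the usual reverse complement. The example strings ATTGGC and GCCAAT are mirrors of each other, and ATTGGCGCCAAT is its own mirror.

```python
COMPLEMENT = str.maketrans("ATCG", "TAGC")


def comp(s: str) -> str:
    """Complementation c: A <-> T, C <-> G, applied symbol by symbol."""
    return s.translate(COMPLEMENT)


def rev(s: str) -> str:
    """Reverse: read the string from right to left."""
    return s[::-1]


def mir(s: str) -> str:
    """Mirror: mir(alpha) = rev(alpha^c), the reverse complement."""
    return rev(comp(s))


alpha = "ATTGGC"
print(mir(alpha))                            # GCCAAT
print(rev(comp(alpha)) == comp(rev(alpha)))  # True: rev and c commute
print(mir(mir(alpha)) == alpha)              # True: mir is involutive
```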
22
DNA Sequences are Floating Double Strings
$B = \{A, T, C, G\}$
$B/B^*$ = double and single strings over B
Hybridization: $\|$, $]\,[$, $]\,\gamma\,[$
Pairing: $\alpha / \beta$
23
Hybridization: $\alpha \,\|\, mir(\alpha)$
$\alpha\,]\,\gamma\,[\,\beta \iff \alpha \supset \gamma,\ \beta \supset mir(\gamma)$
$\alpha\,]\,[\,\beta \iff \alpha\,]\,\gamma\,[\,\beta$ for some $\gamma$
Pairing: $\alpha\,]\,[\,\beta \implies \alpha / rev(\beta)$
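The relation $\alpha\,]\,\gamma\,[\,\beta$ can be checked directly if ⊃ is read as "contains as a substring". A minimal sketch under that reading; since the empty γ would make $\alpha\,]\,[\,\beta$ trivially true, it assumes a hypothetical minimum site length `min_len`, and returning the longest site is an illustrative choice.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ATCG", "TAGC")


def mir(s: str) -> str:
    """Reverse complement: mir(alpha) = rev(alpha^c)."""
    return s.translate(COMPLEMENT)[::-1]


def hybridizes_on(alpha: str, gamma: str, beta: str) -> bool:
    """alpha ] gamma [ beta: alpha contains gamma and beta contains mir(gamma)."""
    return gamma in alpha and mir(gamma) in beta


def hybridization_site(alpha: str, beta: str, min_len: int = 4) -> Optional[str]:
    """Longest gamma with len(gamma) >= min_len (>= 1) and alpha ] gamma [ beta, else None."""
    for length in range(len(alpha), min_len - 1, -1):
        for i in range(len(alpha) - length + 1):
            gamma = alpha[i:i + length]
            if mir(gamma) in beta:
                return gamma
    return None


print(hybridization_site("AAATTGGCAA", "TTGCCAACC"))  # TTGGCAA
```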
24
ATTGGCGCCAAT
ATTGGC
GCCAAT
Axiom
$\alpha / \beta = rev(\beta) / rev(\alpha)$
$\langle\alpha\rangle = \langle mir(\alpha)\rangle$
25
Fraction Notation
$\alpha / \lambda = \alpha = \overrightarrow{\alpha}$
$\lambda / \alpha = rev(\alpha) = \overleftarrow{\alpha}$
$\alpha / mir(\alpha) = \langle\alpha\rangle$
$\langle\alpha\rangle = \langle mir(\alpha)\rangle$
26
$B = \{A, T, C, G\}$
$B/B^*$ = (double) strings over B
ext
overlap
overlapping concatenation
paired concatenation
27
Polymerase Extension
ext
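A string-level sketch of polymerase extension, assuming both strands are written 5'→3' and the primer anneals at the first occurrence of its mirror in the template (melting temperatures, mismatches, and multiple sites are ignored). If the template is $x \cdot mir(p) \cdot y$, the primer's 3' end faces x, so polymerase copies x and produces $p \cdot mir(x) = mir(x \cdot mir(p))$. The function name `ext` follows the slide's label; its signature is a hypothetical choice.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ATCG", "TAGC")


def mir(s: str) -> str:
    """Reverse complement."""
    return s.translate(COMPLEMENT)[::-1]


def ext(primer: str, template: str) -> Optional[str]:
    """Extend `primer` along `template` (both written 5'->3').

    template = x . mir(primer) . y  ->  result = primer . mir(x)
    Only the first binding site is considered (a simplifying assumption).
    """
    site = template.find(mir(primer))
    if site < 0:
        return None  # the primer does not anneal to this template
    return primer + mir(template[:site])


print(ext("CCA", "GGATTGGCC"))  # CCAATCC
```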
28
Overlap Relation
Overlapping Concatenation
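A minimal sketch of overlap and overlapping concatenation, assuming the usual string reading: α and β overlap on γ when $\alpha = \alpha'\gamma$ and $\beta = \gamma\beta'$, and their overlapping concatenation is $\alpha'\gamma\beta'$. Choosing the longest γ and the `min_len` parameter are illustrative assumptions, not taken from the slides.

```python
from typing import Optional


def overlap(alpha: str, beta: str, min_len: int = 1) -> Optional[str]:
    """Longest gamma (len >= min_len >= 1) that is a suffix of alpha and a prefix of beta."""
    for k in range(min(len(alpha), len(beta)), min_len - 1, -1):
        if alpha[-k:] == beta[:k]:
            return beta[:k]
    return None


def overlapping_concat(alpha: str, beta: str, min_len: int = 1) -> Optional[str]:
    """alpha' gamma beta', where alpha = alpha' gamma and beta = gamma beta'."""
    gamma = overlap(alpha, beta, min_len)
    if gamma is None:
        return None
    return alpha + beta[len(gamma):]


print(overlapping_concat("ATTGG", "GGCCA"))  # ATTGGCCA
```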
29
Ligase Catenation
paired concatenation
A pool P of DNA molecules is a multiset of strands, viewed either as
i) a set of strands typed by strings, or
ii) a set of strings with multiplicities
$P = \{s_1 : \alpha_1,\ s_2 : \alpha_2, \dots\}$
$P = \{\alpha_1 : n_1,\ \alpha_2 : n_2, \dots\}$
$mult_P(\alpha_1) = n_1,\ mult_P(\alpha_2) = n_2$
$s \in P$
$\alpha \in P$
31
Types of DNA Pools are Languages of $B/B^*$
$Type(T) = \{\eta \in B/B^* \mid s : \eta,\ s \in T\}$
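Both views of a pool, and its type, can be sketched with Python's `collections.Counter` as a multiset. This is an illustrative choice that only models single-stranded types; the strand identifiers s1, s2, … are generated for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple

# View (ii): a pool as a set of strings with multiplicities.
P: Counter = Counter({"ATTGGC": 3, "GCCAAT": 2})


def mult(pool: Counter, alpha: str) -> int:
    """mult_P(alpha): number of alpha-strands in the pool (0 if absent)."""
    return pool[alpha]


def strands(pool: Counter) -> List[Tuple[str, str]]:
    """View (i): individual strands s1, s2, ... each typed by a string."""
    return [(f"s{i}", alpha) for i, alpha in enumerate(pool.elements(), start=1)]


def type_of(pool: Counter) -> Set[str]:
    """Type(T): the language formed by the types of the strands in the pool."""
    return {alpha for alpha, n in pool.items() if n > 0}


print(mult(P, "ATTGGC"))    # 3
print(strands(P)[:2])       # [('s1', 'ATTGGC'), ('s2', 'ATTGGC')]
print(sorted(type_of(P)))   # ['ATTGGC', 'GCCAAT']
```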
32
Test Tube Operations in DNAC
Denature (Melting)
Renature (Hybridization, Annealing)
Mix
Split
Fish (by Affinity)
Remove
Separate by length (Gel Electrophoresis)
Ligate (Ligase)
Extend (Polymerase)
Synthesize (Oligos)
Infix
[Figure: oligonucleotide notation, e.g. pTpApCpGOH, pGOH, COH]
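Several of these operations can be sketched on multisets of strand types. This minimal version treats strands as plain strings, assumes ideal yields, and models Split as a deterministic halving of each type's copies (in the lab one takes aliquots); denaturation, renaturation, ligation, and extension are not modeled.

```python
from collections import Counter
from typing import Tuple

Tube = Counter  # multiset: strand type (string) -> multiplicity


def mix(*tubes: Tube) -> Tube:
    """Pour several tubes together."""
    out: Tube = Counter()
    for t in tubes:
        out.update(t)
    return out


def split(t: Tube) -> Tuple[Tube, Tube]:
    """Idealized split: the copies of each type are divided between two tubes."""
    a = Counter({s: n - n // 2 for s, n in t.items() if n - n // 2 > 0})
    b = Counter({s: n // 2 for s, n in t.items() if n // 2 > 0})
    return a, b


def fish(t: Tube, gamma: str) -> Tube:
    """Extract by affinity: keep the gamma-superstrands (types containing gamma)."""
    return Counter({s: n for s, n in t.items() if gamma in s})


def remove(t: Tube, gamma: str) -> Tube:
    """Complement of fish: discard the gamma-superstrands."""
    return Counter({s: n for s, n in t.items() if gamma not in s})


def separate(t: Tube, length: int) -> Tube:
    """Gel-electrophoresis-style selection of the strands of a given length."""
    return Counter({s: n for s, n in t.items() if len(s) == length})
```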
33
[Figure: gel electrophoresis apparatus (buffer, gel, electrodes, samples); shorter fragments migrate faster, longer ones slower]
GEL ELECTROPHORESIS: Separation of DNA fragments
34
More Complex Operations
Amplification (PCR)
Sequencing (Sanger; dideoxynucleotides ddA, ddT, ddC, ddG)
Restriction (Restriction Enzymes)
Cloning (Plasmid Transfection)
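Restriction can be sketched at the string level by cutting at every recognition site. The defaults below use EcoRI as an example enzyme (not named on the slide): it recognizes GAATTC and cuts between G and AATTC. Only the top strand is modeled; on the double strand the staggered cut leaves 5' AATT overhangs (sticky ends).

```python
from typing import List


def digest(seq: str, site: str = "GAATTC", cut: int = 1) -> List[str]:
    """Cut the top strand `seq` after position `cut` inside every occurrence of `site`."""
    fragments: List[str] = []
    start = 0
    i = seq.find(site)
    while i != -1:
        fragments.append(seq[start:i + cut])
        start = i + cut
        i = seq.find(site, i + 1)
    fragments.append(seq[start:])
    return fragments


print(digest("AAGAATTCTTGAATTCCC"))  # ['AAG', 'AATTCTTG', 'AATTCCC']
```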
35
PCR: Polymerase Chain Reaction
36
[Figure: primers h(α) and h(β) on strands α and β; long products grow linearly, short products exponentially]
PCR with 3' sticky end
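The linear versus exponential behaviour can be reproduced with an idealized strand count, assuming one double-stranded template, primers in excess, and every strand copied exactly once per cycle. Copies of the originals are "long" (primer to template end), copies of long strands are "short" (primer to primer), and copies of short strands are short.

```python
from typing import Tuple


def pcr_strand_counts(cycles: int) -> Tuple[int, int, int]:
    """Single-strand counts (original, long, short) after `cycles` ideal PCR cycles."""
    original, long_, short = 2, 0, 0
    for _ in range(cycles):
        # originals yield long products; long products yield short ones;
        # short products yield short ones (right-hand side uses the old values)
        long_, short = long_ + original, 2 * short + long_
    return original, long_, short


for n in (1, 2, 3, 10):
    print(n, pcr_strand_counts(n))
```

Long products grow by 2 per cycle, while short products roughly double, and the total is always $2^{n+1}$ strands after n cycles.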
37
PCR Lemma
Let P be a pool of type {α/β} including primers γ, δ. Then PCR(P, γ, δ) provides an exponential amplification iff one of four cases holds (defined by means of overlapping concatenation), and, by the third step at the latest, the (blunt) seed of an exponential amplification is generated (its form depends on the specific case that holds).
V. Manca, G. Franco, "Computing by polymerase chain reaction," Mathematical Biosciences 211, 282–298, 2008.
38
[Figure: an operation maps a test tube T of type L to a test tube T' of type L']
39
Test Tube Operations, Mathematically
Type(T) = L means that the types of the strands of T constitute the language L.
Given some test tubes of certain types as arguments, an operation provides as results test tubes of other types.
40
41
DNA Test Tube Machine
Register machines where:
- Registers are test tubes (multisets of strands instead of numbers)
- Operations are DNA test tube operations (instead of arithmetic operations)
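The register-machine analogy can be sketched with registers holding multisets of strand types and a program as a list of instructions. The instruction format (target register, operation, input registers, extra parameters) and the three operations modeled here are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Tube = Counter
# An instruction: (target register, operation, input registers, extra parameters).
Instruction = Tuple[str, Callable[..., Tube], Tuple[str, ...], Tuple]


def fish(t: Tube, gamma: str) -> Tube:
    """Keep the strands containing gamma."""
    return Counter({s: n for s, n in t.items() if gamma in s})


def remove(t: Tube, gamma: str) -> Tube:
    """Discard the strands containing gamma."""
    return Counter({s: n for s, n in t.items() if gamma not in s})


def mix(*tubes: Tube) -> Tube:
    """Merge several tubes."""
    out: Tube = Counter()
    for t in tubes:
        out.update(t)
    return out


def run(program: List[Instruction], registers: Dict[str, Tube]) -> Dict[str, Tube]:
    """Execute the instructions in order; each writes its result into a register."""
    for target, op, inputs, params in program:
        registers[target] = op(*(registers[r] for r in inputs), *params)
    return registers


regs: Dict[str, Tube] = {"T": Counter({"AATT": 2, "CCGG": 1, "AAGG": 1})}
program: List[Instruction] = [
    ("T1", fish, ("T",), ("AA",)),    # T1 := strands of T containing AA
    ("T", remove, ("T",), ("AA",)),   # T  := T - T1
    ("T2", mix, ("T", "T1"), ()),     # T2 := Merge(T, T1)
]
print(run(program, regs)["T1"])
```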
42
Adleman's Problem
Given a graph (of seven nodes),
find (if there are any) the paths between two given nodes (0, 6)
passing exactly once through every node (Hamiltonian paths)
43
Adleman-Lipton's Extract Model in Combinatorial Problems
The generation of all possible solutions in linear time
The extraction of the true solutions in linear time
Extraction is performed in a number of sub-steps, each of which selects all the strands that include a sub-strand of a given type
44
Adleman’s Graph
45
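Adleman's Encoding
Node i = $\alpha_i \beta_i$
Arc ij = $mir(\beta_i \alpha_j)$
$|\alpha_i| = |\beta_i| = 10, \quad i, j = 0, \dots, 6$
[Figure: node strands $\alpha_i\beta_i$, $\alpha_j\beta_j$ and the arc strand hybridizing across the junction $\beta_i\alpha_j$]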
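A sketch of this encoding with randomly chosen 10-mers standing in for $\alpha_i$ and $\beta_i$ (the random sequences and the seed are hypothetical; no check against unwanted cross-hybridization is made). It verifies the key property: the mirror of arc ij is exactly $\beta_i\alpha_j$, the junction of node i followed by node j, so the arc strand can splint the two node strands for ligation.

```python
import random

BASES = "ATCG"
COMPLEMENT = str.maketrans("ATCG", "TAGC")


def mir(s: str) -> str:
    """Reverse complement."""
    return s.translate(COMPLEMENT)[::-1]


def random_oligo(length: int, rng: random.Random) -> str:
    return "".join(rng.choice(BASES) for _ in range(length))


def encode_graph(n_nodes: int = 7, half: int = 10, seed: int = 0):
    """Node i = alpha_i beta_i; arc ij = mir(beta_i alpha_j); |alpha_i| = |beta_i| = half."""
    rng = random.Random(seed)
    alpha = [random_oligo(half, rng) for _ in range(n_nodes)]
    beta = [random_oligo(half, rng) for _ in range(n_nodes)]
    node = [alpha[i] + beta[i] for i in range(n_nodes)]

    def arc(i: int, j: int) -> str:
        return mir(beta[i] + alpha[j])

    return alpha, beta, node, arc


alpha, beta, node, arc = encode_graph()
# mir(arc ij) occurs in node i followed by node j, at the junction beta_i alpha_j.
print(all(mir(arc(i, j)) in node[i] + node[j] for i in range(7) for j in range(7)))  # True
```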
46
Adleman's Algorithm
Generation of Hamiltonian paths from v0 to v6
Generate paths of G (hybridization/ligation)
Perform PCR with primers $\alpha_0$, $mir(\beta_6)$
Separate paths of length 140 (7 × 20)
for j := 0 to 6 do select strands where $\alpha_j \beta_j$ occurs
output remaining strands
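An in-silico analogue of the algorithm's steps, run on a hypothetical 4-node graph (Adleman's 7-node graph is not reproduced here). Walks are generated with exactly n vertices, which folds the length separation into generation; in vitro, ligation produces walks of all lengths and the gel keeps those of 7 × 20 bases.

```python
from typing import List, Set, Tuple

Path = Tuple[int, ...]


def adleman(edges: Set[Tuple[int, int]], n: int, start: int, end: int) -> List[Path]:
    # Generate paths of G: every walk with n vertices (in vitro: hybridization/ligation).
    walks: List[Path] = [(v,) for v in range(n)]
    for _ in range(n - 1):
        walks = [w + (v,) for w in walks for v in range(n) if (w[-1], v) in edges]
    # PCR with primers for the start and end nodes: keep walks from start to end.
    walks = [w for w in walks if w[0] == start and w[-1] == end]
    # For each node j, select the walks in which node j occurs.
    for j in range(n):
        walks = [w for w in walks if j in w]
    return walks


# Hypothetical 4-node graph.
edges = {(0, 1), (1, 2), (2, 3), (0, 2), (2, 1), (1, 3)}
print(adleman(edges, 4, 0, 3))  # [(0, 1, 2, 3), (0, 2, 1, 3)]
```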
47
Mix and Split Method
Generation of the solution space of N variables
Merge X1 and ¬X1 in a tube T
Split T into A and B
For J := 2 To N
  Extend strands of A with XJ
  Extend strands of B with ¬XJ
  Merge A and B into T
  Split T into A and B
Merge A and B
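A sketch of the Mix and Split method on strand types, under the idealizing assumption that each Split leaves every type present in both halves (that is, enough copies of each strand). Strands are modeled as tuples of literal names; the names X1, ¬X1, … are illustrative.

```python
from typing import Set, Tuple

Strand = Tuple[str, ...]


def mix_and_split(n: int) -> Set[Strand]:
    """Strand types produced by the Mix and Split method for n variables (idealized)."""
    t: Set[Strand] = {("X1",), ("¬X1",)}   # merge X1 and ¬X1 in a tube T
    a, b = set(t), set(t)                   # split: each half still holds every type
    for j in range(2, n + 1):
        a = {s + (f"X{j}",) for s in a}     # extend strands of A with Xj
        b = {s + (f"¬X{j}",) for s in b}    # extend strands of B with ¬Xj
        t = a | b                           # merge A and B into T
        a, b = set(t), set(t)               # split T into A and B
    return a | b                            # merge A and B


print(len(mix_and_split(3)))  # 8
```

Each variable costs a constant number of lab steps, so the $2^N$ assignments are obtained in a number of steps linear in N.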
48
Lipton's Algorithm 3-SAT(N, M), where L(k, J) is the k-th literal of clause J
o Generate the N-variable solution space in T
o For J = 1 To M
    T1 := Extract[T, L(1,J)]
    T := T - T1
    T2 := Extract[T, L(2,J)]
    T := T - T2
    T3 := Extract[T, L(3,J)]
    T := Merge(T1, T2)
    T := Merge(T, T3)
o Detect T
o if T ≠ ∅ then take a clone and sequence it (solution)
o else "Unsolvable problem"
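An in-silico sketch of the algorithm, assuming each strand is a tuple holding one literal per variable and that Extract[T, L] keeps the strands containing literal L. Clauses with fewer than three distinct literals are padded by repeating a literal (a hypothetical convention for the examples).

```python
from itertools import product
from typing import List, Sequence, Tuple

Strand = Tuple[str, ...]
Clause = Tuple[str, str, str]


def space(n: int) -> List[Strand]:
    """All 2**n assignments, one literal per variable, e.g. ('X1', '¬X2')."""
    return [tuple(f"X{i}" if bit else f"¬X{i}" for i, bit in enumerate(bits, start=1))
            for bits in product((1, 0), repeat=n)]


def extract(tube: List[Strand], literal: str) -> List[Strand]:
    """Extract[T, literal]: the strands containing the given literal."""
    return [s for s in tube if literal in s]


def minus(tube: List[Strand], literal: str) -> List[Strand]:
    """T - Extract[T, literal]."""
    return [s for s in tube if literal not in s]


def lipton_3sat(n: int, clauses: Sequence[Clause]) -> List[Strand]:
    t = space(n)
    for l1, l2, l3 in clauses:   # clause J = L(1,J) or L(2,J) or L(3,J)
        t1 = extract(t, l1)
        t = minus(t, l1)
        t2 = extract(t, l2)
        t = minus(t, l2)
        t3 = extract(t, l3)
        t = t1 + t2 + t3         # Merge(Merge(T1, T2), T3)
    return t


sat = lipton_3sat(2, [("X1", "X2", "X2"), ("¬X1", "¬X2", "¬X2")])
print(sat if sat else "Unsolvable problem")
```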
DNA Extraction
Strands of type γ are called γ-strands (or instances of γ).
A β-strand such that β includes γ as a substring is called a γ-superstrand (β is a γ-superstring).
Problem: extract all the γ-superstrands of a pool P.
50
[Figure: strands X1, Y1, X2, Y2, X3, Y3, …, Xn, Yn]
From 2n strands to $2^n$ strands, starting from 4 strands (n multiples of X, Y), in linear time.