Sets and numbers (source: kaims.eti.pg.gda.pl/~giaro/biocomp/l2.pdf)


Sets and numbers
For any sets A and B:
|A| – cardinality (size) of A, the number of its elements,
A∪B – union of A and B, i.e. A∪B = {x: x∈A ∨ x∈B},
A∩B – intersection of A and B, i.e. A∩B = {x: x∈A ∧ x∈B},
A\B, A−B – relative complement, i.e. A\B = {x: x∈A ∧ x∉B},
A×B – Cartesian product, i.e. A×B = {(x,y): x∈A ∧ y∈B},
A^k = A×A×…×A (k times),
f: A→B – function f from its domain A to its codomain B; f ∈ B^A (B^A is the set of all functions from A to B).

A⊆B – true if A is a subset of B,
∅ – empty set, of size 0.

Sets of integer numbers: ℕ = ℤ₊, ℤ, ℤ≥0 – natural numbers (positive integers), all integers, non-negative integers.

Real numbers and subsets: ℝ, ℝ₊, ℝ≥0

Formal languages
Σ – alphabet, a finite set of symbols (letters).
Word over an alphabet – a finite sequence of letters.
ε – empty word (of length 0).

Biological sequences may be treated as words over alphabets:
4-letter {G, C, A, T} for DNA and {G, C, A, U} for RNA,
20-letter (amino acids) for proteins,
64-letter for a gene treated as a sequence of codons.
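A minimal sketch of the codon view, assuming the gene is a DNA word whose reading frame starts at its first letter and whose length is a multiple of 3 (both simplifying assumptions). The names `CODONS` and `to_codons` are hypothetical. It also confirms the size of the codon alphabet: 4^3 = 64 words of length 3 over {A, C, G, T}.

```python
from itertools import product

DNA = "ACGT"  # 4-letter nucleotide alphabet

# Codon alphabet: all words of length 3 over the DNA alphabet (4**3 = 64 letters).
CODONS = {"".join(t) for t in product(DNA, repeat=3)}

def to_codons(gene: str) -> list[str]:
    """Rewrite a DNA word as a word over the 64-letter codon alphabet.

    Assumes (hypothetically) that the reading frame starts at the first
    letter and that len(gene) is a multiple of 3.
    """
    if len(gene) % 3 != 0:
        raise ValueError("gene length must be a multiple of 3")
    if any(ch not in DNA for ch in gene):
        raise ValueError("gene must be a word over {A, C, G, T}")
    # Consecutive, non-overlapping blocks of 3 letters.
    return [gene[k:k + 3] for k in range(0, len(gene), 3)]

print(len(CODONS))             # 64
print(to_codons("ATGGCCTAA"))  # ['ATG', 'GCC', 'TAA']
```

A 9-letter DNA word thus becomes a 3-letter word over the codon alphabet.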

Σ* – set of all words over Σ; Σ+ = Σ* \ {ε}. For example, for Σ = {A, C, G, T}: Σ* = {ε, A, C, G, T, AA, AC, …, TT, …, ACT, …, TCA, …, CCAA, …}.
|w| – length of a word w∈Σ*,
w[i] – i-th symbol of a word w,
w[i..j] – subword, the block of consecutive letters of w from the i-th to the j-th (w[i..j] = ε for i>j),
wv – concatenation of words w and v.
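The word operations above map directly onto Python strings. This is a sketch under two assumptions: words are `str` values and ε is the empty string `""`. The helpers `letter` and `subword` are hypothetical names that keep the 1-based indexing of the notation, whereas Python slicing is 0-based.

```python
def letter(w: str, i: int) -> str:
    """w[i]: the i-th symbol of w, counting from 1 as in the notation."""
    if not 1 <= i <= len(w):
        raise IndexError("i must satisfy 1 <= i <= |w|")
    return w[i - 1]

def subword(w: str, i: int, j: int) -> str:
    """w[i..j]: consecutive letters from the i-th to the j-th, inclusive.

    Returns the empty word ε (represented as "") when i > j.
    """
    if i > j:
        return ""
    if i < 1 or j > len(w):
        raise IndexError("need 1 <= i and j <= |w|")
    return w[i - 1:j]  # 0-based, end-exclusive slice

w, v = "GATTACA", "CC"
print(len(w))                  # |w| = 7
print(letter(w, 2))            # A
print(subword(w, 3, 5))        # TTA
print(subword(w, 5, 3) == "")  # True
print(w + v)                   # concatenation wv: GATTACACC
```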

Metric space
For a set X, a metric over X is a non-negative function d: X×X→ℝ≥0 that shows how far apart (for example, how dissimilar) any two elements of X are. Axioms:

∀x,y∈X d(x,y) = 0 ⇔ x = y

∀x,y∈X d(x,y) = d(y,x) (symmetry)

∀x,y,z∈X d(x,y) ≤ d(x,z) + d(z,y) (triangle inequality)

Example. Pythagorean metric on a plane, for points P₁ = (x₁,y₁) and P₂ = (x₂,y₂): d(P₁,P₂) = [(x₁−x₂)^2 + (y₁−y₂)^2]^(1/2)

Example. Taxi metric on a plane: d(P₁,P₂) = |x₁−x₂| + |y₁−y₂|

Example. Discrete metric on any set: d(P₁,P₂) = 1 ⇔ P₁ ≠ P₂ (and d(P₁,P₂) = 0 otherwise)
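The three example metrics can be written out and the axioms spot-checked on a finite sample of points. This is a sketch only. The function names are hypothetical, and passing the check on a sample does not prove that d is a metric, although failing it does disprove it. As a counterexample, the squared Pythagorean distance breaks the triangle inequality.

```python
import math
from itertools import product

Point = tuple[float, float]

def pythagorean(p: Point, q: Point) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])

def taxi(p: Point, q: Point) -> float:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def discrete(p, q) -> float:
    return 0.0 if p == q else 1.0

def satisfies_axioms(d, points, eps=1e-9) -> bool:
    """Check the three metric axioms on a finite sample (a test, not a proof)."""
    for x, y in product(points, repeat=2):
        if (d(x, y) <= eps) != (x == y):       # d(x,y) = 0  <=>  x = y
            return False
        if abs(d(x, y) - d(y, x)) > eps:       # symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + eps:  # triangle inequality
            return False
    return True

P1, P2 = (0.0, 0.0), (3.0, 4.0)
print(pythagorean(P1, P2), taxi(P1, P2), discrete(P1, P2))  # 5.0 7.0 1.0
sample = [(0.0, 0.0), (3.0, 4.0), (-1.0, 2.0), (2.5, -1.0)]
print(all(satisfies_axioms(d, sample) for d in (pythagorean, taxi, discrete)))  # True
```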

Landau notation
For comparing growth rates of positive functions ℕ→ℝ₊ or ℕ→ℕ:
f = O(g) – f grows not faster than g, i.e. ∃c>0 ∀x∈ℕ f(x) < c·g(x)
Examples: x+1 = O(x^2), 1000x^1000 = O(2^x), 5x^2 = O(x^2)
f = o(g) – f grows slower than g, i.e. lim_{x→∞} f(x)/g(x) = 0
f = Ω(g) – f grows not slower than g, i.e. ∃c>0 ∀x∈ℕ g(x) < c·f(x) ⇔ g = O(f)
f = ω(g) – f grows faster than g, i.e. lim_{x→∞} g(x)/f(x) = 0 ⇔ g = o(f)
f = Θ(g) ⇔ f = O(g) ∧ g = O(f) – the same growth rates
Example: 1000x^1000 + 10x = O(x^1000)

The definitions can easily be generalized to functions of several arguments.
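Here is a worked check of two of the examples above, with explicit constants. The choice c = 1011 is ours; any larger c works too. Take x ∈ ℕ = {1, 2, …}. The first step uses x ≤ x^1000 for x ≥ 1:

```latex
\begin{aligned}
1000x^{1000} + 10x &\le 1000x^{1000} + 10x^{1000} = 1010\,x^{1000} < 1011\,x^{1000}
  && \Rightarrow\; 1000x^{1000} + 10x = O(x^{1000}),\\
x^{1000} &< 1 \cdot \left(1000x^{1000} + 10x\right)
  && \Rightarrow\; \text{so in fact } 1000x^{1000} + 10x = \Theta(x^{1000}),\\
\lim_{x\to\infty} \frac{x+1}{x^{2}} &= \lim_{x\to\infty}\left(\frac{1}{x} + \frac{1}{x^{2}}\right) = 0
  && \Rightarrow\; x + 1 = o(x^{2}).
\end{aligned}
```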

Computational complexity
Properties of a computer algorithm that evaluate its quality.

Computational (time) complexity – a function that estimates (as an upper bound) the worst-case number of operations performed during execution, in terms of the input data size.

Space complexity – estimates (as an upper bound) the worst-case memory usage during execution, in terms of the input data size.

Polynomial (time) algorithm – an algorithm whose time complexity can be bounded by some polynomial of the data size. In computing theory, polynomial algorithms are considered efficient.
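To illustrate "worst-case number of operations", the sketch below counts the letter comparisons made by naive pattern matching on words. The example is our choice, since the slides do not name an algorithm, and `naive_search` is a hypothetical name. On the input A…A (length n) with pattern A…AC (length m), every shift compares all m letters, giving (n−m+1)·m comparisons. The time complexity is therefore O(n·m), which is polynomial in the data size.

```python
def naive_search(text: str, pattern: str) -> tuple[list[int], int]:
    """Find all occurrences of pattern in text by trying every shift.

    Returns (0-based start positions, number of letter comparisons made).
    """
    n, m = len(text), len(pattern)
    comparisons = 0
    positions = []
    for s in range(n - m + 1):       # every possible shift
        j = 0
        while j < m:
            comparisons += 1         # one elementary operation
            if text[s + j] != pattern[j]:
                break
            j += 1
        if j == m:                   # all m letters matched
            positions.append(s)
    return positions, comparisons

# Worst-case-style input: every shift compares all m letters.
n, m = 1000, 10
pos, ops = naive_search("A" * n, "A" * (m - 1) + "C")
print(pos, ops, (n - m + 1) * m)  # [] 9910 9910
```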


NP-hard problems are commonly believed not to have polynomial-time algorithms solving them. For these problems we can only use fast but inexact procedures or, for small instances, time-consuming procedures. NP-hardness is treated as computational intractability.

Problem. How can we prove that problem A is NP-hard?
Sketch: find any NP-hard problem B and give an efficient (polynomial) procedure that reduces (translates) instances of B into instances of A. Then A is not less general than B: a polynomial algorithm for A, combined with the reduction, would solve B in polynomial time. Therefore, if B is hard, so is A.
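Here is a minimal sketch of this scheme under stated assumptions. B is taken to be the Hamiltonian path problem between two given vertices s and t, a classic NP-hard problem chosen here for illustration. A is the longest (simple) path problem between two given vertices, which is stated to be NP-hard in the graph theory part below. The translation only sets every edge length to 1: the graph has a Hamiltonian s–t path exactly when the longest simple s–t path has length |V|−1. The exhaustive solver for A runs in exponential time and stands in for a hypothetical efficient one. All function names are hypothetical.

```python
def longest_simple_path(adj, weight, s, t):
    """Exhaustive search over simple s-t paths (exponential; small instances only).

    adj: dict vertex -> set of neighbours (undirected graph)
    weight: dict frozenset({u, v}) -> edge length
    Returns the maximum total length, or None if t is unreachable from s.
    """
    best = None

    def dfs(u, visited, length):
        nonlocal best
        if u == t:                   # a complete s-t path
            if best is None or length > best:
                best = length
            return
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                dfs(v, visited, length + weight[frozenset((u, v))])
                visited.remove(v)

    dfs(s, {s}, 0)
    return best

def hamiltonian_path_via_longest_path(adj, s, t):
    """Decide B (Hamiltonian s-t path) by translating it into A (longest path)."""
    # Polynomial-time translation: every edge gets length 1.
    weight = {frozenset((u, v)): 1 for u in adj for v in adj[u]}
    # A simple path with |V| - 1 edges visits every vertex exactly once.
    return longest_simple_path(adj, weight, s, t) == len(adj) - 1

square = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}  # the cycle 1-2-3-4-1
print(hamiltonian_path_via_longest_path(square, 1, 2))  # True  (1-4-3-2)
print(hamiltonian_path_via_longest_path(square, 1, 3))  # False
```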

Graph theory
A graph G(V,E) consists of a set of vertices V and a set of edges E (connecting vertices).

The degree of a vertex is the number of edges incident with (connected to) it.

[Figure: an example graph with its vertices and edges labeled]

A graph is connected if we can reach any vertex from any other by passing along edges. Otherwise the graph has more than one connected component.
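Here is a minimal sketch of degrees and connected components. It assumes the graph is given as a vertex list plus an edge list of pairs; this representation and the function names are assumptions. Breadth-first search collects every vertex reachable from a start vertex, and the graph is connected exactly when a single component is found. The example vertex names are hypothetical, chosen so that deg(v) = 4 as in the figure.

```python
from collections import deque

def degrees(vertices, edges):
    """deg(v) = number of edges incident with v (a loop (v, v) would add 2)."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def connected_components(vertices, edges):
    """Group vertices reachable from one another along edges (breadth-first search)."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), [start]
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.append(w)
                    queue.append(w)
        components.append(comp)
    return components

V = ["a", "b", "c", "d", "e", "v"]
E = [("v", "a"), ("v", "b"), ("v", "c"), ("v", "d")]  # e is isolated
print(degrees(V, E)["v"])                 # 4
print(len(connected_components(V, E)))    # 2, so the graph is not connected
```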

[Figure: a vertex v with deg(v) = 4]

A multigraph is more general than a graph: it may have parallel edges (connecting the same two vertices) and loops (connecting a vertex with itself).

A graph is a path if it is connected, two (endpoint) vertices have degree 1 and the rest have degree 2.

A cycle is connected and has all vertices of degree 2.

A graph is a tree if it is connected and removing any edge disconnects it.

A bipartite graph is more general than a tree: its vertices can be partitioned into two disjoint sets in such a way that every edge connects vertices from different parts.
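Here is a sketch for simple graphs given as a vertex list and an edge list; the representation and function names are hypothetical. It tests the tree property through the standard equivalent condition "connected and |E| = |V| − 1". It tests bipartiteness by breadth-first 2-coloring: color 0/1 marks the part, and the search fails exactly when an edge would join two vertices of the same color, as happens on odd cycles such as a triangle. Every tree passes the 2-coloring test, matching the remark that bipartite graphs generalize trees.

```python
from collections import deque

def _adjacency(vertices, edges):
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def two_coloring(vertices, edges):
    """Return {vertex: 0 or 1} with every edge between the two parts,
    or None if the graph is not bipartite."""
    adj = _adjacency(vertices, edges)
    color = {}
    for start in vertices:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]   # put w in the other part
                    queue.append(w)
                elif color[w] == color[u]:    # edge inside one part
                    return None
    return color

def is_connected(vertices, edges):
    if not vertices:
        return True
    adj = _adjacency(vertices, edges)
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)

def is_tree(vertices, edges):
    """Connected and minimally connected, i.e. connected with |E| = |V| - 1."""
    return is_connected(vertices, edges) and len(edges) == len(vertices) - 1

path4 = [(1, 2), (2, 3), (3, 4)]
cycle4 = path4 + [(4, 1)]
print(is_tree([1, 2, 3, 4], path4), two_coloring([1, 2, 3, 4], path4) is not None)    # True True
print(is_tree([1, 2, 3, 4], cycle4), two_coloring([1, 2, 3, 4], cycle4) is not None)  # False True
print(two_coloring([1, 2, 3], [(1, 2), (2, 3), (3, 1)]))                              # None
```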

A digraph G(V,E) is a similar concept to a graph, but its edges (called arcs) are directed.

Example. Computational problems on graphs whose edges are equipped with non-negative weights (lengths).

[Figure: a vertex v of a digraph with indeg(v) = 3 and outdeg(v) = 1]

Longest path (between two given vertices) is NP-hard in graphs and digraphs.

Shortest path (between two given vertices) is solvable in polynomial time in graphs and digraphs (Dijkstra's algorithm, O(|V|^2) time).

[Figure: an example graph with edge weights]
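Here is a sketch of the O(|V|^2) variant of Dijkstra's algorithm. It selects the closest unfinished vertex by a linear scan rather than a priority queue. The vertex numbering 0..n−1, the arc-list input and the example weights are assumptions, since the figure's weights are not recoverable. For an undirected graph, pass each edge in both directions.

```python
import math

def dijkstra(n, arcs, s):
    """Shortest path lengths from s in a digraph with vertices 0..n-1.

    arcs: list of (u, v, w) with non-negative weights w.
    Linear-scan selection gives O(|V|^2) time, plus O(|E|) for the arcs.
    """
    adj = [[] for _ in range(n)]
    for u, v, w in arcs:
        if w < 0:
            raise ValueError("Dijkstra's algorithm needs non-negative weights")
        adj[u].append((v, w))
    dist = [math.inf] * n
    dist[s] = 0
    done = [False] * n
    for _ in range(n):
        # Unfinished vertex with the smallest tentative distance: O(|V|) scan.
        u = min((x for x in range(n) if not done[x]), key=lambda x: dist[x])
        if dist[u] == math.inf:
            break                    # remaining vertices are unreachable
        done[u] = True
        for v, w in adj[u]:          # relax arcs leaving u
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist

arcs = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
print(dijkstra(4, arcs, 0))  # [0, 3, 1, 4]
```

A binary heap is the usual alternative for sparse graphs, giving O((|V|+|E|) log |V|) instead.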

A digraph is acyclic if it is impossible to return to any vertex by passing along arcs (taking their orientation into account).

⇔ there is a topological ordering of the vertices, in which the start of every arc precedes its end.
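One standard way to find a topological ordering is Kahn's algorithm; the slides do not say which method to use. It repeatedly outputs a vertex with no incoming arcs from the vertices not yet output. It gets stuck exactly when the remaining vertices contain a cycle, which mirrors the equivalence above. The input format and function name are assumptions.

```python
from collections import deque

def topological_order(vertices, arcs):
    """Kahn's algorithm. Returns a topological ordering, or None if the digraph has a cycle."""
    indeg = {v: 0 for v in vertices}
    succ = {v: [] for v in vertices}
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(v for v in vertices if indeg[v] == 0)  # no incoming arcs
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:            # "remove" u's outgoing arcs
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order if len(order) == len(vertices) else None

print(topological_order("abcd", [("a", "b"), ("a", "c"), ("c", "b"), ("b", "d")]))
# ['a', 'c', 'b', 'd']
print(topological_order("abc", [("a", "b"), ("b", "c"), ("c", "a")]))  # None
```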

The longest path problem (from a given vertex S to all the others) is solvable in polynomial time for acyclic digraphs.

[Figure: an acyclic digraph with vertices S, A, B, C, D, E, F, G, H, I, J, T and arcs e1, …, e19]

Critical path algorithm
Input: an acyclic digraph with weights on arcs, S∈V.
1. Find a topological ordering of the vertices (the sequence in which they are processed).
2. Set initial vertex labels l(S) = 0, l(v) = −∞ for v ≠ S.
3. Following the topological ordering, assign to each vertex v ≠ S: l(v) = max{l(u) + w(e): arc e leads from u to v} (l(v) stays −∞ if v has no incoming arcs).
Output: l(v) is equal to the length of the longest path from S to v.

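Worked example (vertices in topological order; each candidate u: l(u)+w(e) comes from an arc e leading from u):
S: 0
A: max{S: 0+3} = 3
B: max{S: 0+2} = 2
C: max{S: 0+8, A: 3+4, B: 2+6} = 8
D: max{A: 3+2} = 5
E: max{B: 2+9} = 11
F: max{C: 8+1, D: 5+2} = 9
G: max{C: 8+2, E: 11+1} = 12
H: max{E: 11+2} = 13
I: max{F: 9+6, G: 12+5} = 17
J: max{G: 12+6, H: 13+2} = 18
T: max{I: 17+5, J: 18+3, G: 12+9} = 22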

[Figure: the same acyclic digraph with weighted arcs e1, …, e19, its vertices drawn in topological ordering]
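Here is a sketch of the critical path algorithm, following its three steps and using Kahn's algorithm for step 1. The arc list is reconstructed from the worked label table: each candidate "u: l(u)+w" is read as an arc u→v of weight w. These weights agree with the ones visible in the figure, but the figure's mapping of the names e1, …, e19 to arcs is not recoverable, so those names are not used. The function name is hypothetical.

```python
import math
from collections import deque

def critical_path_labels(vertices, arcs, s):
    """l(v) = length of the longest path from s to v in an acyclic digraph.

    arcs: list of (u, v, w), an arc from u to v with weight w.
    """
    # Step 1: topological ordering (Kahn's algorithm).
    indeg = {v: 0 for v in vertices}
    succ = {v: [] for v in vertices}
    pred = {v: [] for v in vertices}
    for u, v, w in arcs:
        succ[u].append(v)
        pred[v].append((u, w))
        indeg[v] += 1
    queue = deque(v for v in vertices if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(vertices):
        raise ValueError("the digraph is not acyclic")
    # Step 2: l(s) = 0, l(v) = -inf for v != s.
    label = {v: -math.inf for v in vertices}
    label[s] = 0
    # Step 3: in topological order, l(v) = max over arcs u->v of l(u) + w.
    for v in order:
        if v != s and pred[v]:
            label[v] = max(label[u] + w for u, w in pred[v])
    return label

ARCS = [("S", "A", 3), ("S", "B", 2), ("S", "C", 8), ("A", "C", 4), ("B", "C", 6),
        ("A", "D", 2), ("B", "E", 9), ("C", "F", 1), ("D", "F", 2), ("C", "G", 2),
        ("E", "G", 1), ("E", "H", 2), ("F", "I", 6), ("G", "I", 5), ("G", "J", 6),
        ("H", "J", 2), ("I", "T", 5), ("J", "T", 3), ("G", "T", 9)]
labels = critical_path_labels(list("SABCDEFGHIJT"), ARCS, "S")
print(labels["T"])  # 22, the length of the longest (critical) path S -> T
```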