a c g t a a t g g t t a ac t a g t t a g g a a t c g c g c a t t a t g t c c a c g t t a g g t t g a...

Upload: lucrezia-tortora

Post on 01-May-2015

TRANSCRIPT

Page 1: A C G T A A T G G T T A AC T A G T T A G G A A T C G C G C A T T A T G T C C A C G T T A G G T T G A A C G G C A G G T T T A A A T C G A T T C C A C G
Page 2:

A C G G T A

A T G G T T A A C T A G T T A G G A A T C G C G C A T T A T G T C C

A C G T T A G G T T G A A C G G C A G G T T T A A A T C G A T T C C

A C G T T A T G A A A T T G G G G C A G G T T T A A C G C G C C C

C A G A T

Page 3:

A C G G U A

A U G G U U A A C U A G U U A G G A A U C G C G C A U U A U G U C C

A C G U U A G G U U G A A C G G C A G G U U U A A A U C G A U U C C

A C G U U A U G A A A U U G G G G C A G G U U U A A C G C G C C C

C A G A U

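Methionine (M)

Valine (V)

Asparagine (N)

STOP

Serine (S)

Threonine (T)

Proline (P)

Lysine (K)

Leucine (L)

Glycine (G)

Glutamine (Q)

[Figure: amino-acid chains written under the mRNA in one-letter codes: M V N, M S, T, V, P, M L K G Q V]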
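The two steps on these slides (rewriting the DNA in the RNA alphabet, then reading codons) can be sketched as follows. This is a minimal illustration, not the author's code: the names `transcribe` and `translate` are hypothetical, the table is the standard genetic code, and the translation is assumed to start at the first AUG and stop at the first stop codon.

```python
# Standard genetic code, bases ordered U, C, A, G; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(dna):
    """Rewrite a DNA string in the RNA alphabet (T becomes U), as on the slide."""
    return dna.replace(" ", "").upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon: the chain ends here
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate(transcribe("A T G G T T A A C T A G"))` reads the codons AUG, GUU, AAC and then meets the stop codon UAG.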

Page 4:

ATTACGGCCATGCGGAGCCGGAAG

CCATG

present in ?

an algorithm that requires a number of comparisons equal to the length of
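The slide does not name the algorithm. One well-known way to search with a number of comparisons proportional to the length of the longer string is Knuth-Morris-Pratt; the sketch below assumes that is the kind of method meant, and the function names are hypothetical.

```python
def prefix_function(pattern):
    """failure[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def find_all(text, pattern):
    """Return every start index at which pattern occurs in text."""
    if not pattern:
        return []
    failure = prefix_function(pattern)
    hits, k = [], 0
    for i, ch in enumerate(text):
        # On a mismatch, fall back in the pattern; the text index never moves back.
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = failure[k - 1]
    return hits
```

On the slide's example, `find_all("ATTACGGCCATGCGGAGCCGGAAG", "CCATG")` reports a single occurrence starting at index 7 (counting from 0).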

Page 5:

approximate string comparison

ALIGNMENT

T G - T A - C G G A - - A T C G G A
T - C T - C C G - A C C A T C G G A

T G T A C G G A A T C G G A

T C T C C G A C C A T C G G A

4 + 3 = 7

T G C T A C C G G A C C A T C G G A

Page 6:

[Figure: alignment graph for the two sequences below]

T G T A C G G A A T C G G A

T C T C C G A C C A T C G G A

Page 7:

T C - T - C C - G A C C A T C G G A
T - G T A C - G G A - - A T C G G A

4 + 3 = 7

T G - T A - C G G A - - A T C G G A
T - C T - C C G - A C C A T C G G A

Page 8:

shortest path

how many operations?

N.B.: the number of paths is very large

$\binom{n+m}{n}$

n = m = 10: 184,756 paths
n = m = 20: 137,846,528,820 paths

explicit enumeration is impossible!
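The two figures quoted on this slide can be checked directly from the binomial coefficient. A small sketch (the function name `count_paths` is hypothetical):

```python
from math import comb


def count_paths(n, m):
    """Number of monotone paths across an n-by-m grid: (n + m) choose n."""
    return comb(n + m, n)


for size in (10, 20):
    print(size, count_paths(size, size))
```

The count roughly quadruples each time n = m grows by one, which is why listing every path stops being practical almost immediately.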

Page 9:

RECURSION!

$V(0;0) = 0$

$V(n;m) = \min\{\, V(n-1;m-1),\; V(n;m-1) + 1,\; V(n-1;m) + 1 \,\}$

The diagonal term $V(n-1;m-1)$ is available only where the diagonal arc exists, that is, where the two characters coincide.

[Figure: node (n;m) with incoming arcs from (n-1;m-1), (n;m-1) and (n-1;m)]
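The recursion can be filled in as a table, row by row. The sketch below assumes the model of the earlier slides (only gaps are allowed, each costing 1, and a diagonal step is free but exists only where the two characters match); `indel_distance` is a hypothetical name.

```python
from math import inf


def indel_distance(a, b):
    """Minimum number of gaps needed to align a and b (no substitutions)."""
    n, m = len(a), len(b)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue  # V(0;0) = 0
            best = inf
            if i > 0:
                best = min(best, V[i - 1][j] + 1)   # gap in b
            if j > 0:
                best = min(best, V[i][j - 1] + 1)   # gap in a
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                best = min(best, V[i - 1][j - 1])   # free diagonal arc on a match
            V[i][j] = best
    return V[n][m]


print(indel_distance("TGTACGGAATCGGA", "TCTCCGACCATCGGA"))
```

For the two sequences of the alignment slides this gives 7, the same total as the 4 + 3 gaps shown there.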

Page 10:

each arc is considered exactly once

number of operations = number of arcs = mn

two sequences of 1000 bases require one million operations

Page 11:

A different model: substitutions allowed

T G T A C G G A A T C G G A

T C T C C G A C C A T C G G A

T G T A C G G A - - A T C G G A

T C T C C G - A C C A T C G G A

4

2

2

8

Page 12:

[Figure: alignment graph for T G T A C G G A A T C G G A and T C T C C G A C C A T C G G A]

14

6

Page 13:

T G T A C G G A - A T C G G A

T C T C C G A C C A T C G G A

T G T A C G G A - - A T C G G A

T C T C C G - A C C A T C G G A
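The slides do not state the costs used in the substitution model. As an assumption only, the sketch below charges 1 per substitution and 2 per gap; under those values the first alignment above (four mismatched columns, one gap) costs 6 and the second (two mismatched columns, three gaps) costs 8. The function name and its parameters are hypothetical.

```python
def alignment_cost(a, b, sub_cost=1, gap_cost=2):
    """Cheapest alignment of a and b when mismatched columns and gaps
    carry the given (assumed) costs; matching columns are free."""
    n, m = len(a), len(b)
    prev = [j * gap_cost for j in range(m + 1)]  # row for the empty prefix of a
    for i in range(1, n + 1):
        cur = [i * gap_cost] + [0] * m
        for j in range(1, m + 1):
            diagonal = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else sub_cost)
            cur[j] = min(diagonal, prev[j] + gap_cost, cur[j - 1] + gap_cost)
        prev = cur
    return prev[m]


a, b = "TGTACGGAATCGGA", "TCTCCGACCATCGGA"
print(alignment_cost(a, b))        # assumed costs: substitution 1, gap 2
print(alignment_cost(a, b, 1, 1))  # unit costs (classical edit distance)
```

Setting `sub_cost=2, gap_cost=1` makes a substitution no cheaper than two gaps, so the result falls back to the gap-only model of the earlier slides.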

Page 14:
Page 15:

T G T A C G G A A T C G G A

T C T C C G A C C A T C G G A

A C T C A G A C A A T G A

T G T A C G - G A A T C G G A

T C T C C G A C C A T C G G A

A C T C A G A C A A T - - G A

MULTIPLE ALIGNMENT

Page 16:
Page 17:

Number of comparisons = product of the string lengths

3 strings of length 1000

one billion operations!
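The growth is easy to check numerically. A tiny sketch (the name `comparison_count` is hypothetical) that multiplies the lengths:

```python
from math import prod


def comparison_count(lengths):
    """Product of the string lengths, the slide's estimate of the work."""
    return prod(lengths)


print(comparison_count([1000, 1000]))        # two strings
print(comparison_count([1000, 1000, 1000]))  # three strings
```

Each additional string of 1000 bases multiplies the work by another factor of 1000.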

Page 18:

[Figure: trees whose nodes carry the strings TAGA, CTGA, CTAGA, ATGA, ATAGA, TACA and AGGA; four nodes are marked "?"]

A G - G A
- T A C A
- T A G A
C T - G A
A T - G A

Page 19:

AUGCCGAUUCAACGGUCCUACUCGGACUUUACC

M P I Q R S Y S D F T

M R I S R S D S D Y T

score (M<->M, P<->R, ...) based on mutation probabilities

Page 20:

FRAGMENT RECONSTRUCTION

Page 21:

ACGTTACGTTACGGATCGGATTCACGGCGATT

AACAAGCTTCGGAATCGTTACCGGATCGGTTAGG

CGAATTAGTGGCGAA

GGCCTTAAACGACGATGCATTCGAATATCGATCGCGCGAATGTGCATA

ACCGGACTGTCGCGACGCGCGATGTGTAGAGCTTGATCTCGGATATACGCGATATTGTGAATA

ACGTTACGTTACGAATCGGATTCACGGCGATT

AACCAGCTTCGGAATCG

TTACCGGATCGGTTAGG

CGAATTAGTGGCGAA

AGCCTTAAACGACGATGCATTCGAATATCGATCGCGCGAATGTGCATA

ACCGGACTGTCGCGACGCGCGATGTGCAGAGCTTGATCTCGGATATACGCGATATTGTGAATA

ACGTTACGTTACGGATCGGATTTACGGCGATT

AACAAGCTTCGGAATCGTTACCGGATCGGTTAGG

AGAATTAGTGGCGAA

GGCCTTAAACGACGATGCATTCGAATATCGATCGCGCGAATGTGCATA

ACCGGACTCTCGCGACGCGCGATGTGTAGAGCTTGATCTCGGATATACGCGCTATTGTGAATA

ACATTACGTTACGGATCGGATTCACGGCGACT

AACAAGCTTCGGAATCGTTACCGGATCGGTTAAG

CGAATTAGTGGCGAA

GGCCTTAAACGACGTTGCATTCGAATATCGATCGCGCGAATGTGCATA

ACCGGACTGTCGCGACGCGCGATTTGTAGAGCTTGATCTCGGATATACGCAATATTGTGAATA

ACGTTACGTTACTGATCGGATTCACGGCGATT

AACAAGCGTCGGAATCGTTACCGGATCGGTTAGG

AGAATTAGTGGCGAA

GGCCTTAAACGACGATGCATTGGAATATCGATCGCGCGAATGTGCATA

AACGGACTGTCGCGACGCGCGATGTGTAGAGCTTGTTCTCGGATATACGCGATATTGTGAATA

ACGTTACGTTACGGATCGGATTCACGGCAATT

AACAAGCTTCGGAATAGTTACCGGATCGGTTAGG

CGAATTAGTGGCGAA

GGCCTTAAACGACGATGTATTCGAATATCGATCGCGCGAATGTGCATA

ACCGGACTGTCGCGACGCTCGATGTGTAGAGCTTGATCTAGGATATACGCGATATTGTGAATA

Page 22:

Page 23:

ACCGT  CGTGC  TTAC  TACCGT

- - ACCGT - -
- - - - CGTGC
TTAC - - - - -
- TACCGT - -

TTAC - - - - -
- TACCGT - -
- - ACCGT - -
- - - - CGTGC

1 + 1 + 2 = 4

TTACCGTGC

Page 24:

TAGG AGGT CGTC GTCG

[Figure: overlap graph on the four fragments TAGG, AGGT, CGTC and GTCG; the arcs carry weights 1, 2, 3 and 4]

Page 25:

[Figure: the weighted graph on TAGG, AGGT, CGTC and GTCG]

CGTC

- GTCG

- - - - TAGG

- AGGT

CGTCGTAGGT

length 10

Page 26:

[Figure: the weighted graph on TAGG, AGGT, CGTC and GTCG]

TAGG

- AGGT

- - GTCG

- - CGTC

TAGGTCGTC

length 9

CGTCGTAGGT
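The two orderings compared on these slides can be reproduced by overlapping consecutive fragments and, for a handful of fragments, by trying every ordering. The sketch below is a brute-force illustration under that assumption, not a practical assembler; all function names are hypothetical.

```python
from itertools import permutations


def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_in_order(fragments):
    """Glue the fragments in the given order, overlapping each with the next."""
    result = fragments[0]
    for previous, following in zip(fragments, fragments[1:]):
        result += following[overlap(previous, following):]
    return result


def shortest_superstring(fragments):
    """Try every ordering; only feasible for very few fragments."""
    # Fragments contained in another fragment add nothing, so drop them first.
    kept = sorted(f for f in set(fragments)
                  if not any(f != g and f in g for g in fragments))
    if not kept:
        return ""
    return min((merge_in_order(order) for order in permutations(kept)), key=len)


print(merge_in_order(["CGTC", "GTCG", "TAGG", "AGGT"]))        # the length-10 ordering
print(shortest_superstring(["TAGG", "AGGT", "CGTC", "GTCG"]))  # best ordering found
```

The number of orderings grows factorially, which is why the slides recast the choice of ordering as a path problem on the weighted graph instead.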

Page 27:

PHYLOGENETIC TREES

A B C D E F

Page 28:

[Table: 6 x 5 binary character matrix, species A, B, C, D, E, F by characters a, b, c, d, e]

Page 29:

[Figure: tree whose nodes are labelled with the character vectors 00110, 00010, 00100, 10010, 00011, 00101, 10011, 00100, 10010, 01011, 00010]

Page 30:

[Figure: tree with a single binary value (0 or 1) at each of its 11 nodes]

Page 31:

[Figure: tree with a single binary value (0 or 1) at each of its 11 nodes]

Page 32:

[Figure: tree with a single binary value (0 or 1) at each of its 11 nodes]

Page 33:

[Table: 6 x 5 binary character matrix, species A, B, C, D, E, F by characters a, b, c, d, e]

Is there a perfect phylogenetic tree with A, B, C, D, E, F as nodes?

Page 34:

2 leaves

3 leaves

4 leaves

A B

A

AB

BC CA BC

5 leaves

12 6 18

60 30 30 120

Page 35:

[Table: 6 x 5 binary character matrix, species A, B, C, D, E, F by characters a, b, c, d, e]

ordered characters: only 0 --> 1 allowed

easy problem

Page 36:

[Table: 6 x 5 binary character matrix, species A, B, C, D, E, F by characters a, b, c, d, e]

Page 37:

[Table: the character matrix transposed, with characters a, b, c, d, e as rows and species A B C D E F as columns; row a reads 011001]

    a b c d e
E   0 0 0 0 0
C   1 1 0 1 0
B   1 0 0 0 0
F   1 0 0 1 0
D   0 0 0 0 1
A   0 0 1 0 1

[Figure: tree with edges labelled a, b, c, d, e and leaves C, F, B, E, A, D]
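For ordered binary characters (only 0 --> 1 allowed), a standard test for the existence of a perfect phylogeny is that, for every pair of characters, the sets of species carrying a 1 are either disjoint or nested. The slides do not say which test they use, so the sketch below is one possible check under that assumption; the function name is hypothetical and the data is the 6 x 5 table shown on this slide.

```python
def has_perfect_phylogeny(matrix):
    """matrix maps each species to a string of 0/1 character states
    (state 0 is assumed ancestral). Returns True when every pair of
    characters has disjoint or nested sets of carriers."""
    if not matrix:
        return True
    n_chars = len(next(iter(matrix.values())))
    carriers = [
        {species for species, row in matrix.items() if row[c] == "1"}
        for c in range(n_chars)
    ]
    for i in range(n_chars):
        for j in range(i + 1, n_chars):
            x, y = carriers[i], carriers[j]
            if x & y and not (x <= y or y <= x):
                return False  # the two characters overlap without nesting
    return True


slide_matrix = {
    "E": "00000",
    "C": "11010",
    "B": "10000",
    "F": "10010",
    "D": "00001",
    "A": "00101",
}
print(has_perfect_phylogeny(slide_matrix))
```

On this table the carriers of b and d sit inside those of a, and the carriers of c sit inside those of e, so no pair of characters conflicts.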

Page 38:

[Table: two 6 x 7 binary character matrices, species A, B, C, D, E, F by characters a, b, c, d, e, f, g]

unordered characters (perfect phylogeny)

Page 39:

[Figure: tree whose nodes are labelled with the character vectors 1001011, 1101011, 1001010, 0101011, 1101011, 0100011, 0101011, 1101001, 1101011, 1001010, 1001010 and with the species D, F, A, C, E, B]