genetic semihomology algorithm

57
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra Q Q H H Y Y E E D D K K N N R R R R W C C G G G G R R S S P P P P S S S S A A A A T T T T L L L L L L F F V V V V I M I I Genetic semihomology algorithm A new approach to the comparative analysis of protein sequences It is admitted that from evolutionary point of view the genetic code and amino acid ‘language’ have evolved simultaneously. They also act with strict coherence with each other. Therefore, in analysis of protein differentiation and variability, both levels should be considered simultaneously.

Upload: trula

Post on 17-Jan-2016

40 views

Category:

Documents


0 download

DESCRIPTION

Genetic semihomology algorithm. A new approach to the comparative analysis of protein sequences. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Q

Q

H

H

Y

Y

E

E

D

D

K

K

N

N

R

R

R

R

W

C

C

G

G

G

G

R

R

S

S

P

P

P

P

S

S

S

S

A

A

A

A

T

T

T

T

L

L

L

L

L

L

F

F

V

V

V

V

I

M

I

I

Genetic semihomology algorithm

A new approach to the comparative analysis of protein sequences

It is admitted that from evolutionary point of view the genetic code and amino acid ‘language’ have evolved simultaneously. They also act with strict coherence with each other. Therefore, in analysis of protein differentiation and variability, both levels should be considered simultaneously.

Page 2: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

The tools currently used for comparative sequence analysis

apply the Markovian model of amino acid replacement and are based on stochastic matrices of the observed amino acid substitution frequencies

Page 3: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

BLOSUM62 matrix of amino acid replacements

A 4 R -1 5 N -2 0 6 D -2 -2 1 6 C 0 -3 -3 -3 9 Q -1 1 0 0 -3 5 E -1 0 0 2 -4 2 5 G 0 -2 0 -1 -3 -2 -2 6 H -2 0 1 -1 -3 0 0 -2 8 I -1 -3 -3 -3 -1 -3 -3 -4 -3 4 L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5 M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5 F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6 P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7 S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4 T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5 W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11 Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7 V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4

A R N D C Q E G H I L K M F P S T W Y V

Page 4: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

BLASTP 2.2.2 [Dec-14-2001]

BLAST protein search - output

Page 5: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Genetic semihomology algorithm

The aim of the new algorithm elaboration is to overcome the basic disadvantages of protein sequence analysis tools and to exclude some basic errors in the assumptions of the existing statistical methods.

It is to be able to explain the mechanism and pathway of protein evolution and differentiation, not only limited to the description of the initial and final step of the observed changes.

Page 6: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Genetic semihomology algorithm

Another goal of this algorithm is to make it applicable to any group of proteins of any nature, function and location. It can be achieved for two reasons:

1) minimization of basic assumptions limited to the general amino acid: codon translation table and assuming that single point mutation is a principle, most common, mechanism of protein variability;

2) non-statistical approach (no stochastic matrices)

Page 7: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Genetic semihomology algorithm

The algorithm of genetic semihomology assumes the close relation between the compared amino acids and their codons in related proteins. The algorithm is based on the network of genetic relationship between amino acids.Such assumption makes the same residues at different positions of the sequence unequal with respect to their changeability.

Page 8: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Genetic semihomology algorithm

The algorithm assumes that the basic mechanism of differentiation among related proteins consists in the single point mutation. The general part of the algorithm is the three-dimensional diagram reflecting the network of genetic relationship between amino acids

Page 9: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

AGCU 1

3 2

Q

Q

H

H

Y

Y

E

E

D

D

K

K

N

N

R

R

R

R

W

C

C

G

G

G

G

R

R

S

S

P

P

P

P

S

S

S

S

A

A

A

A

T

T

T

T

L

L

L

L

L

L

F

F

V

V

V

V

I

M

I

I

Diagram of amino acid genetic relationships CAA UAA GAA AAA

CAG UAG GAG AAG

CAC UAC GAC AAC

CAU UAU GAU AAU

CGA UGA GGA AGA

CGG UGG GGG AGG

CGC UGC GGC AGC

CGU UGU GGU AGU

CCA UCA GCA ACA

CCG UCG GCG ACG

CCC UCC GCC ACC

CCU UCU GCU ACU

CUA UUA GUA AUA

CUG UUG GUG AUG

CUC UUC GUC AUC

CUU UUU GUU AUU

Diagram of codon genetic relationships

Page 10: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Semihomology algorithmInput requirements

The minimum data required for starting analysis with the algorithm of genetic semihomology are the protein sequences (at least two). The more sequences are used for analysis and multiple alignment construction - the more concise and accurate the results are. Although the nucleotide sequences of the genes are not necessarily required, it is very helpful if the nucleotide sequences are known at least for some of the analyzed proteins. That increases significantly the amount of the information accessible from such analysis.

Also the results are the best for sequences revealing sufficiently high degree of diversity.

Page 11: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

GTT AAT TGC AGC CTG TAT GCC AGC GGC ATC GGC AAG GAT GGG ACG AGT TGG GTA GCC

1) V N C S L Y A S G I G K D G T S W V A ATT GAT TGC TCT CCG TAC CTC CAA GTT GTA AGA GAT GGT AAC ACC ATG GTA GCC

2) I D C S P Y L Q - V V R D G N T M V A V N C S L Y A S G I G K D G T S W V A % I D C S P Y D G N T M V A 0 0 1 1 0 1 0 0 0 0 0 0 1 1 0 0 0 1 1 7/19 36.8 GTT AAT TGC AGC CTG TAT GCC AGC GGC ATC GGC AAG GAT GGG ACG AGT TGG GTA GCC ATT GAT TGC TCT CCG TAC CTC GTT GTA AGA GAT GGT AAC ACC ATG GTA GCC

2 2 3 0 2 2 1 0 0 1 1 1 3 2 1 1 1 3 3 29/57 50.9 V N C S L Y A S G I G K D G T S W V A 42/97 43.3 I D C S P Y L V V R D G N T M V A 42/89 47.2 1 1 2 2 0 2 0 0 0 1 0 1 2 2 1 1 0 2 2 20/38 52.6 V N C S L Y A S G I G K D G T S W V A I D C S P Y L V V R D G N T M V A 2 2 3 3 2 3 0 0 0 2 1 2 3 3 1 1 0 3 3 34/57 59.6

<L Q V V R>

< CAA >

< Q >

< Q >

UNITARY MATRIX

GENETIC CODE MATRIX

PAM250 SCORING

GENETIC SEMIHOMOLOGY

SCORE

Comparison of the fragments of 1st and 2nd domain of chicken ovomucoid using unitary matrix, GCM, PAM250 and algorithm of genetic semihomology

Page 12: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Semihomology algorithmAdvantages of the approach

The results obtained by using this method are more comprehensive than those of the methods used currently and reflect the actual mechanism of protein differentiation and evolution. They concern:

1) location of homologous and semihomologous sites in compared proteins,

2) precise estimation of gap location in non-identical fragments of different length,

3) analysis of internal homology and semihomology,

4) precise location of domains in multidomain proteins,

5) estimation of genetic code of non homologous fragments,

Page 13: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Semihomology algorithmAdvantages of the approach

6) construction of genetic probes,

7) studies on differentiation processes among related proteins,

8) estimation of the degree of relationship among related proteins,

9) studies on the evolution mechanism within homologous protein families,

10) confirmation of the actual relationship between sequences revealing low degree of identity/similarity.

Page 14: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Semihomology algorithmAdvantages of the approach

Application of the semihomology approach has led to discovery and describing some important mechanisms affecting the protein evolution and differentiation. The most important are:

1) the mechanism and role of cryptic mutations at unusually variable positions (Leluk; 2000b-c)

2) the phenomenon of very long distance (dispersed) mutational correlation within sets of variable positions (Leluk 2000a; Leluk and Grabiec, 2001; Leluk et al., 2001b; Leluk et al., 2002)

Page 15: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Semihomology algorithmLimitations of the approach

The limitations in use of the semihomology approach appear in case if:

- there are too few sequences taken for analysis

- the identity degree among the sequences is too high (too low diversity)

- the long fragments of the compared sequences show no identity at all (e.g. N-terminal signal fragments of homologous proteins)

Page 16: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

The analysis of genetic semihomology excludes applicability of Markov model for the studies on protein

variability at the amino acid level.

The amino acid codons do contain the information about the „ancestral” amino acids, whose codons were the

starting point to the codon of current residue.

It refers mainly to the positions undergoing single-point mutations as the most basic mechanism of evolutionary

variability.

Page 17: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

GEISHA- Protein sequence similarity and homology analysis.- Multiple alignment construction on the basis of genetic relationships between amino-acids.- Analysis of variability within homologous protein families- Molecular phylogenetic studies

Software based on genetic semihomology algorithm

Page 18: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

FQS Semihomology tool http://www.fqspl.com.pl/sh/

Software based on genetic semihomology algorithm

Page 19: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

The semihomologous correlation between amino acids occurring at selected non-homologous positions of

inhibitors from squash seeds.

The solid lines indicate the transition type of semihomology, the dashed lines refer to transversion type of semihomology.

K

Q

G

E D

S

D

Q E

H

N

Y

W L A D S R

S R D A

or

25. NHEQDS 19. DEGQK 14. RSDA 7. LYW

Page 20: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

18. VILE 19. NDH 20. C 21. [STR][D] 22. LPKQE 23. YF 24. ALPKQ 25. SQTK 26. GTRS– 27. IVKNT 28. GVSTL 29. KRTQ– 30. DGN– 31. G– 32. TNRKE– 33. STLAQP 34. WMLIV–

1. VTI 2. [A][R]– 3. C 4. PT 5. [RM][F] 6. [NI][E] 7. [L][Y] 8. KSLQDV 9. [P][E] 10. [V][H] 11. C 12. GA 13. TS 14. DN 15. GS 16. SFV 17. T

1. Y 2. SDA 3. NS 4. [ED][R] 5. C 6. GSTF 7. ILF 8. C 9. [L][A][N] 10. [YH][A] 11. NY 12. RAILV 13. EQ 14. HQLS 15. GHRN 16. ATR 17. [NHST][E]

35. VIL 36. ESKAGN 37. [K][L] 38. ELKSRV 39. [YHS][K] 40. [DN][M] 41. GA 42. EKRA 43. C 44. RKE 45. PLQE 46. KERD 47. [ISV][H] 48. [VG][PT] 49. [MEK][PS]

Application of genetic semihomology algorithm for identifying the fragments of possible different mechanism than single

point mutation

Page 21: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Dot matrix pairwise alignment

Noise reduction

Page 22: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Dot matrix pairwise alignment

Internal homology (gene multiplication)

Chicken ovoinhibitor precursor

(7 domains)

Chicken ovomucoid precursor

(3 domains)

BLAST 2 SEQUENCES SEMIHOM

Page 23: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Dot matrix comparison of selected homologous Kazal inhibitors

ovoinhibitor-ovoinhibitor ovoinhibitor-ovomucoid ovoinhibitor-PSTI

BLASTP 2.0.9 (Blosum62)

SEMIHOM (algorithm of genetic semihomology)

Page 24: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Consecutive steps of dot matrix results obtained from comparison of chicken ovoinhibitor with itself by program SEMIHOM

Raw dot matrix

Noise filtering

adding semihomology

marking of whole fragments

Page 25: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Precise gap location in non-identical fragments of different length by using the genetic semihomology algorithm. The compared proteins are trypsin inhibitors

from squash seeds - CPGTI-I (horizontal) and CSTI-IIb (vertical).

CPGTI-I has a non homologous His25 while CSTI-IIb possesses at relative site two non homologous residues Asp25 and Ile26.

identities only

?RVCPKILMECKKDSDCLAECICLEH-GYCGMVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS

?RVCPKILMECKKDSDCLAECICLE-HGYCGMVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS

identities and semihomology

The analysis of semihomology shows the genetic relationship between His25 and Asp25, therefore a gap in CPGTI-I should be located next to Ile26 in CSTI-IIb. Window size = 10; minimum number of homologous positions in window = 4.

RVCPKILMECKKDSDCLAECICLEH-GYCGMVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS

Page 26: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Proteinase inhibitors from squash seeds

Acc.number

sp|P01074 R V C P R I L M E C K K D S D C L A E C V C L E H - G Y C G sp|P07853 R V C P R I L M K C K K D S D C L A E C V C L E H - G Y C G sp|P07853 H E E R V C P R I L M K C K K D S D C L A E C V C L E H - G Y C G sp|P10293 R V C P K I L M E C K K D S D C L A E C I C L E H - G Y C G sp|P10293 R V C P K I L M E C K K D S D C L A E C I C L E H - G Y C G sp|P10293 H E E R V C P K I L M E C K K D S D C L A E C I C L E H - G Y C G sp|P10291 M V C P K I L M K C K H D S D C L L D C V C L E D I G Y C G V S sp|P10292 M M C P R I L M K C K H D S D C L P G C V C L E H I E Y C G sp|P11969 G R R C P R I Y M E C K R D A D C L A D C V C L Q H - G I C G sp|P11968 R G C P R I L M R C K R D S D C L A G C V C Q K N - G Y C G sp|P17680 G I C P R I L M E C K R D S D C L A Q C V C K R Q - G Y C G sp|P12071 G C P R I L M R C K Q D S D C L A G C V C G P N - G F C G sp|P10294 < E R R C P R I L K Q C K R D S D C P G E C I C M A H - G F C G

Multiple alignment

Page 27: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Multiple alignment of seven chicken

ovoinhibitor domains obtained with

Markovian and non-Markovian methods

Page 28: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Consensus alignment results for Bowman Birk inhibitors   1 10 20 30 40 50 60 ClustalW .. ...**: * ** * ** * * *: ::*** *. * *: * * :* * * .*** *:Multalin .d....aCC#.C.CTkS.PPqCrC.Dir.#tCHSaCksCiCtrS.PpqCrC.Dtt.FCYk.C.GS AA S.s.SSS**S.*S**s*S**.*S*S*SS*.S***S*.S*s*Ss*s*SS*s*s*SSS***ss*SGS NA sss.SSSS*S.*.**.*SS*sSs*s*SS*sSSS*SSssSsSSsSsSSSS.SsSSsS*SSs.Ss  

Consensus alignment results for squash inhibitors   1 10 20 30ClustalW ** * * : ** * * **Multalin ....r.CPrIlm.Ck.DsDCla.C.Cl...G.CG..GS AA SS**S*sSS*SsSs**SSS*S*SSS.SS** Consensus alignment results for ovoinhibitor domains (Kazal type)  1 10 20 30 40 50 60ClustalW .* : *. *.::. ** . * :* : : . *Multalin .dcs.y......dg...vaCp.il.pvCgt#gvTYsneC..Cahn.#..t...k..dg.C......GS AA SS*sSSSSSSSSS*SSSS.*Sss..ss*sSSSS**sSs*sS*.sSSSsSSsSS.sssSS*SSSsssGS NA sSSs.S.S....sSSssss*Sss..ssssSSSSSSsSsS.sS.sS.S...sSSs..sSSSssss.s

Comparison of multiple alignment consensus results for three inhibitor families achieved with ClustalW, Multalin (both

applied BLOSUM62) and algorithm of genetic semihomology

Page 29: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

-spectrin

-s

pec

trin

500 1000 1500 2000

500

1000

1500

2000

The dot matrix comparison of human erythrocyte alpha-spectrin with itself

For the best visualization of the repeats, the identity threshold is set as 15, and frame size as 75

Page 30: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

The multiple alignment of proteins homologous to the 10th segment of human erythrocyte -spectrin

Page 31: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

The multiple alignment of human erythrocyte α-spectrin 23 segments achieved with the MultAlin (BLOSUM62) method

and genetic semihomology approach

Page 32: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

-s

pec

trin

rep

eats

con

sen

sus

(Sah

r et

al) 500 1000 1500 2000

20

-spectrin

60

100

-spectrin

500 1000 1500 2000

-s

pec

trin

rep

eats

con

sen

sus

(g

enet

ic s

emih

omol

ogy)

20

60

100

The dot matrix comparison of human erythrocyte α-spectrin with the consensus of 106-residue repeats

The consensus described by Sahr et al. [1990]. The identity threshold and frame

size are set as 8 and 40 respectively.

The consensus achieved with genetic semihomology approach [Leluk, 1998]. The identity threshold and frame size are set as

8 and 40 respectively. The identity and genetic semihomology of the compared

residues is visualized

Page 33: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Query sequence: α-Spectrin segment consensus sequence obtained with MultAlin program (BLOSUM62)

Score E-valueSequences producing significant alignments: (bits) sp|P02549|SPCA_HUMAN SPECTRIN ALPHA CHAIN, ERYTHROCYTE 17 21352sp|Q01082|SPCO_HUMAN SPECTRIN BETA CHAIN, BRAIN (SPECTRIN,... 16 27970sp|P08032|SPCA_MOUSE SPECTRIN ALPHA CHAIN, ERYTHROCYTE 16 27970sp|P07751|SPCN_CHICK SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 16 27970sp|Q13813|SPCN_HUMAN SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN... 16 27970sp|Q62261|SPCO_MOUSE SPECTRIN BETA CHAIN, BRAIN (SPECTRIN, ... 16 27970sp|P16086|SPCN_RAT SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN, N... 16 36639sp|Q00963|SPCB_DROME SPECTRIN BETA CHAIN 16 47996sp|P13395|SPCA_DROME SPECTRIN ALPHA CHAIN 15 62874sp|P16546|SPCN_MOUSE SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 14 141334sp|P15508|SPCB_MOUSE SPECTRIN BETA CHAIN, ERYTHROCYTE 14 185142sp|P11277|SPCB_HUMAN SPECTRIN BETA CHAIN, ERYTHROCYTE 14 185142sp|P39254|Y04O_BPT4 HYPOTHETICAL 36.3 KD PROTEIN IN NRDC-M... 11 935541

The use of α-spectrin consensus sequence achieved with different algorithms for sequence similarities search (BLAST)

Page 34: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Query sequence: α-Spectrin segment consensus sequence attained by Sahr et al. [1990]

Score E-valueSequences producing significant alignments: (bits) sp|P13395|SPCA_DROME SPECTRIN ALPHA CHAIN 43 2e-04sp|Q13813|SPCN_HUMAN SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN... 40 0.001sp|P07751|SPCN_CHICK SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 39 0.003sp|P16546|SPCN_MOUSE SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 39 0.004sp|P16086|SPCN_RAT SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN, N... 37 0.012sp|P02549|SPCA_HUMAN SPECTRIN ALPHA CHAIN, ERYTHROCYTE 37 0.016sp|P08032|SPCA_MOUSE SPECTRIN ALPHA CHAIN, ERYTHROCYTE 36 0.021sp|P15508|SPCB_MOUSE SPECTRIN BETA CHAIN, ERYTHROCYTE 35 0.048sp|Q00963|SPCB_DROME SPECTRIN BETA CHAIN 35 0.063sp|Q01082|SPCO_HUMAN SPECTRIN BETA CHAIN, BRAIN (SPECTRIN,... 34 0.082sp|Q62261|SPCO_MOUSE SPECTRIN BETA CHAIN, BRAIN (SPECTRIN, ... 34 0.11sp|P11277|SPCB_HUMAN SPECTRIN BETA CHAIN, ERYTHROCYTE 32 0.32sp|P05095|AACT_DICDI ALPHA-ACTININ 3, NON MUSCULAR (F-ACTIN... 21 797sp|P34367|YLJ2_CAEEL HYPOTHETICAL 256.3 KD PROTEIN C50C3.2 ... 20 1791sp|Q03001|BPA1_HUMAN BULLOUS PEMPHIGOID ANTIGEN 1 (BPA) (H... 19 3073sp|P30427|PLEC_RAT PLECTIN 19 4026sp|P31670|GT27_FASHE GLUTATHIONE S-TRANSFERASE 26 KD 47 (GS... 19 4026sp|P46125|YEDI_ECOLI HYPOTHETICAL 32.2 KD PROTEIN IN DSRB-V... 18 6908sp|P42094|PHYT_BACSU 3-PHYTASE PRECURSOR (PHYTATE 3-PHOSPHA... 18 6908sp|P56288|E2BG_SCHPO PROBABLE TRANSLATION INITIATION FACTOR... 18 9049sp|O00273|DFFA_HUMAN DNA FRAGMENTATION FACTOR ALPHA SUBUNI... 18 9049sp|P12311|ADH1_BACST ALCOHOL DEHYDROGENASE (ADH-T) 18 9049

The use of α-spectrin consensus sequence achieved with different algorithms for sequence similarities search (BLAST)

Page 35: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Query sequence: α-Spectrin segment consensus sequence attained by genetic semihomology algorithm

Score E-valueSequences producing significant alignments: (bits) sp|P13395|SPCA_DROME SPECTRIN ALPHA CHAIN 49 3e-06sp|Q13813|SPCN_HUMAN SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN... 47 1e-05sp|P07751|SPCN_CHICK SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 42 3e-04sp|P16086|SPCN_RAT SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN, N... 41 6e-04sp|P02549|SPCA_HUMAN SPECTRIN ALPHA CHAIN, ERYTHROCYTE 38 0.005sp|Q00963|SPCB_DROME SPECTRIN BETA CHAIN 37 0.014sp|P08032|SPCA_MOUSE SPECTRIN ALPHA CHAIN, ERYTHROCYTE 36 0.019sp|P15508|SPCB_MOUSE SPECTRIN BETA CHAIN, ERYTHROCYTE 36 0.024sp|P16546|SPCN_MOUSE SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 36 0.032sp|Q01082|SPCO_HUMAN SPECTRIN BETA CHAIN, BRAIN (SPECTRIN,... 34 0.094sp|Q62261|SPCO_MOUSE SPECTRIN BETA CHAIN, BRAIN (SPECTRIN, ... 34 0.094sp|P11277|SPCB_HUMAN SPECTRIN BETA CHAIN, ERYTHROCYTE 33 0.21sp|P34367|YLJ2_CAEEL HYPOTHETICAL 256.3 KD PROTEIN C50C3.2 ... 20 1195sp|Q99001|AACB_CHICK ALPHA-ACTININ, BRAIN ISOFORM (F-ACTIN ... 20 1566sp|P35609|AAC2_HUMAN ALPHA-ACTININ 2, SKELETAL MUSCLE ISOF... 20 1566sp|P12814|AAC1_HUMAN ALPHA-ACTININ 1, CYTOSKELETAL ISOFORM ... 20 1566sp|P05094|AACT_CHICK ALPHA-ACTININ, SMOOTH MUSCLE ISOFORM (... 20 1566sp|P30427|PLEC_RAT PLECTIN 19 2687sp|P20111|AACS_CHICK ALPHA-ACTININ, SKELETAL MUSCLE ISOFORM... 19 3520sp|P47493|SYG_MYCGE GLYCYL-TRNA SYNTHETASE (GLYCINE--TRNA L... 18 4611sp|Q03001|BPA1_HUMAN BULLOUS PEMPHIGOID ANTIGEN 1 (BPA) (H... 18 7913

The use of α-spectrin consensus sequence achieved with different algorithms for sequence similarities search (BLAST)

Page 36: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Kinase projectMultiple alignment of hexokinase domains

HEXOKINASE_I (2 domains) HEXOKINASE_II (2 domains)HEXOKINASE_III (2 domains)HEXOKINASE_IV (1 domain)

Page 37: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Kinase project

Complex comparative studies at the primary structure level

Construction of molecular phylogenetic trees

Studies on sequence/structure/function relationship

Studies on the mechanisms of correlated mutations and variability

Genetic principles of differentiation within this protein family

Page 38: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

T

IM

V

RS A

KN G P FL

L

DE R S

HQ CW

Y

Simplified (planar) diagram of genetic relationships between amino acids

In planar diagram the encoding role of the third codon position is ignored.

Only first two codon positions are taken into account.

Page 39: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

T

IM

V

RS A

KN G P FL

L

DE R S

HQ CW

Y

Simplified (planar) diagram of genetic relationships between amino acids

The simplified planar diagram emphasizes the special encoding character of six-codon amino acids – Leu, Arg and Ser.

The six-codon amino acids may play the role the of „mutational passages” that are not liable to the selection restrictions.

These amino acids may influence on the variability range increase.

In fact the six-codon amino acids occur unusually frequent at very variable positions. This concerns especially serine, and to lesser extent – arginine.

Leucine does not show the correlation between the frequency of occurrence and variability range.

Conclusions?

T

IM

V

RS A

KN G P FL

L

DE R S

HQ CW

Y

Page 40: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Frequency of six-codon amino acids as a function of position variability in randomly selected proteins of different origin and

nature

The results for 2686 residues at 606 corresponding positions

ALL SEQUENCES(2686 residues at 606 positions)

(discrete data)

0

20

40

60

80

100

1 2 3 4 5 6 7 8

Number of residues occurring at aligned position

% o

f occ

urre

nce

Ser

Arg

Leu

Page 41: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

I II III IV V VI VII I 61 64 54 54 53 45 II 53 66 65 57 54 45 III 56 64 66 59 66 42 IV 47 56 61 55 57 39 V 44 44 56 50 58 42 VI 38 41 52 42 42 40 VII 25 28 26 25 26 29

I II III IV V VI VII I 52 57 39 45 42 33 II 72 64 59 66 65 55 III 75 81 56 51 62 33 IV 68 71 77 49 45 30 V 67 66 76 68 46 31 VI 60 65 75 68 70 38 VII 52 55 53 52 52 56

The ovoinhibitor domains homology comparison. The similarity scores (%) for the aligned DNA-coding sequences are shown above the diagonal. Below diagonal are the similarity scores for the aligned amino acid sequences.

The results obtained by Scott et al.

The results obtained by semihomology analysis

Phylogenetic tree verification

Scott M. J., Huckaby C. S., Kato I., Kohr W. J., Laskowski M. Jr, Tsai M.-J. and O'Malley B. W. (1987) J. Biol. Chem. 262, 5899-5907

I VII II III IV V VI I II III PSTI OI OI OM OM

VII III IV I V VI I II III PSTI OI OI OM OM

II

Page 42: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Arg codons: AGA AGG CGA CGG CGC CGT

Met codons: ATG

Virtual reverse translation

Deducing the genetic code

QRCRRDSDCKKCRMDSDC

Page 43: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Arg codons: AGA AGG CGA CGG CGC CGT

Met codons: ATG

Virtual reverse translation

Deducing the genetic code

QRCRRDSDCKKCRMDSDC

Page 44: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

Q

Q

H

H

Y

Y

E

E

D

D

K

K

N

N

R

R

R

R

W

C

C

G

G

G

G

R

R

S

S

P

P

P

P

S

S

S

S

A

A

A

A

T

T

T

T

L

L

L

L

L

L

F

F

V

V

V

V

I

M

I

I

AGCU 1

3 2

Virtual reverse translation

Page 45: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

R

R

R

R

R

R

M

Q

Q

H

H

Y

Y

E

E

D

D

K

K

N

N

W

C

C

G

G

G

G

S

S

P

P

P

P

S

S

S

S

A

A

A

A

T

T

T

T

L

L

L

L

L

L

F

F

V

V

V

V

I

I

I

AGCT 1

3 2

Virtual reverse translation

Page 46: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

CGA

CGG

CGC

CGT

AGA

AGG

ATG

Q

Q

H

H

Y

Y

E

E

D

D

K

K

N

N

W

C

C

G

G

G

G

S

S

P

P

P

P

S

S

S

S

A

A

A

A

T

T

T

T

L

L

L

L

L

L

F

F

V

V

V

V

I

I

I

AGCT 1

3 2

Virtual reverse translation

Page 47: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

CGA

CGG

CGC

CGT

AGA

Q

Q

H

H

Y

Y

E

E

D

D

K

K

N

N

W

C

C

G

G

G

G

S

S

P

P

P

P

S

S

S

S

A

A

A

A

T

T

T

T

L

L

L

L

L

L

F

F

V

V

V

V

I

I

I

AGCT 1

3 2

ATG

AGG

Virtual reverse translation

Page 48: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

H

H

Y

Y

E

E

D

D

K

K

N

N

R

R

W

C

C

G

G

G

G

R

S

S

P

P

P

P

S

S

S

S

A

A

A

A

T

T

T

T

L

L

L

L

L

L

F

F

V

V

V

V

I

I

I

AGCU 1

3 2

Genetic relationships between Arg and Met/Gln

M

R

R

Q

R

Q

Page 49: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

1 4 5 7 8 12 13 15 22 33 37 49 50 59 61 62 63 66 I V S L A S K D T R S N G A E R P K M II I S P L Q R D N R F N H T E K L E S III V S K P S K D R R F N R T K R Q E E IV I D Q P T T G K R F N G T R K E R P V L T Q L S Q N E F V N G T R R E E E VI L S K K T K D R M V S R T R E E D K VII E R E Q K - - - M V N N R A ATA AGY CCR GCR ACR ACR GGY AGR AGG GTY AGY GGY AGR AGR AGR CCR AGR AAG GTA CGY CTR CCR TCR TCY CGY ACR GCR CTR GAG GAA CTR GCR YTA Prediction accuracy = 69.7%

Genetic semihomology prediction of genetic code for chicken ovoinhibitor domains.

The predicted gene sequence is compared with the known DNA sequence. Only those positions where prediction reduces possible codons are considered. The predicted codons that are consistent with those present in the ovoinhibitor gene are shadowed.

Page 50: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

References (1)

Leluk, J., Pham, T.-C. (1985) Calculation and analysis of protein sequence homology using microcomputer ZX Spectrum. VIIth Polish Conferrence "Chemistry of Amino Acids and Peptides", Abstracts, 83

Kubiak, Z.J., Leluk, J. (1986) Application of the ZX Spectrum microcomputer for prediction of the secondary structure of proteins. Pr. Nauk. Inst. Chem. Nieorg. Metal. Pierwiastków Rzadkich. Wrocław 55, 186-189

Leluk, J. (1993) Analysis of protein primary structure using program HOMOLOGY. XXIXth Meeting of Polish Biochemical Society, Abstracts, 384

Leluk, J., Krowarsch, D. (1994) A new algorithm for analysis the homology in protein primary structure, 3rd Conferrence "Computers in Chemistry", Wrocław, Poland, Abstracts, 60.

Leluk, J. (1994) Application of program SEMIHOM based on a new algorithm for homology analysis of protein primary structure. Ist Conference "Computer-based Scientific Research", Wrocław, Poland, Abstracts, 189-194. .

Leluk, J. (1996) Comparison of the algorithm of genetic semihomology with currently applied algorithms for protein homology analysis. IIIrd Conferrence "Computer-based Scientific Research", Wrocław, Polanica-Zdrój, Abstracts, 53-58.

Page 51: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

References (2)

Leluk, J., Kuźnicki, T. (1996) Analysis of homology of selected proteinase inhibitor families using programs HOMOLOGY and SEMIHOM. IVth Conferrence "Computers in Chemistry '96", Polanica-Zdrój, Abstracts, 37.

Leluk, J. (1998) A new algorithm for analysis of the homology in protein primary structure. Computers & Chemistry, 22, 123-131.

Leluk, J. (1999a) Mutational variability in proteins and the Markovian model of replacement. 5th International Conference "Computers in Chemistry '99", Szklarska Poręba, Poland, Abstracts, P47.

Leluk, J. (1999b) Studies on mutational regularities in selected proteinase inhibitory families. 1st Polish Congress of Biotechnology, Wrocław, Poland, Abstracts (lectures), 183-184.

Leluk, J. (2000a) Serine proteinase inhibitor family in squash seeds: Mutational variability mechanism and correlation. Cell. Mol. Biol. Lett., 5, 91-106.

Leluk, J. (2000b) A non-statistical approach to protein mutational variability. BioSystems, 56, 83-93.

Leluk, J. (2000c) Regularities in mutational variability in selected protein families and the Markovian model of amino acid replacement. Computers & Chemistry, 24, 659-672.

Page 52: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

References (3)

Leluk, J., Grabiec, M., Sobczyk, M. (2000) The application of genetic semihomology algorithm for theoretical studies on various protein families. International Conference on Conformation of Peptides, Proteins and Nulceic Acids, August 29 - September 2, 2000, Debrzyno, Poland - Abstracts, p. 27.

Leluk, J., Hanus-Lorenz, B. Sikorski, A.F. (2001a) Application of genetic semihomology algorithm to theoretical studies on various protein families. Acta Biochim. Polon., 48, 21-33.

Leluk, J. (2001a) An introduction to theoretical studies on protein primary structure. International Workshop “Information Theory Days” Warsaw, April 23-29 2001.

Leluk, J. (2001b) Benefits from theoretical comparative studies on diverse protein families. International Workshop “Information Theory Days” Warsaw, April 23-29 2001.

Leluk, J. and Grabiec, M. (2001) Sequence similarity estimation and correlated mutations in selected protein families. I. An approach to protein sequence similarity estimation. Ist Summer School on “Parallel Computing in Biomolecular Simulations”, September 1-3 2001, Gdańsk, Poland; Abstracts L-5.

Hanus-Lorenz, B., Hryniewicz-Jankowska, A., Leluk, J., Lorenz, M., Skała, J. and Sikorski, A.F. (2001) Spectrin motifs are detected in plant and yeast genomes, 8th International W. Mejbaum-Katzenellenbogen's Seminar on Membrane Skeleton and Its Regulatory Functions, Szklarska Poręba 2001, Cell. Mol. Biol. Lett., 6, 207.

Page 53: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

References (4)

Leluk, J., Sobczyk, M. and Becella, Ł. (2001b) Sequence similarity estimation and correlated mutations in selected protein families. II. Correlated mutations in selected protein families. Ist Summer School on “Parallel Computing in Biomolecular Simulations”, September 1-3 2001, Gdańsk, Poland; Abstracts L-5.

Leluk, J., Sobczyk, M. and Becella, Ł. (2002) Correlated mutations in selected protein families, TASK Quarterly, , 6(3), 469-482.

Leluk, J., Konieczny, L. and Roterman, I. (2003) Search for structural similarity in proteins, Bioinformatics, 19(1), 117-124.

Barbara Bereza, Agnieszka Kubiak, Jacek Leluk, Wacław Hendrich (2003) Photoenzyme Protochlorophyllide-NADPH oxidoreductase (LPOR) – the key for chlorophyll biosynthesis, Postępy Biochemii, 49, 46-55.

Jacek Kuśka, Jacek Leluk (2003) Theoretical studies on sugar and protein kinases: Structural and mutational variability, 3rd International Conference “Inhibitors of Protein Kinases”, Cell. Mol. Biol. Lett., 8, 592

Anna Zdyb, Jacek Leluk (2003) Comparative studies on sugar and protein kinases: Genetic relationships and consensus sequences, 3rd International Conference “Inhibitors of Protein Kinases”, Cell. Mol. Biol. Lett., 8, 593

Page 54: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

References (5)

Jacek Leluk, Bogdan Lesyng (2003) A comparative study of the primary structures of protein and sugar kinases, 3rd International Conference “Inhibitors of Protein Kinases”, Cell. Mol. Biol. Lett., 8, 620-621

A. Czogalla, P. Kwołek, A. Hryniewicz-Jankowska, M. Nietubyć, J. Leluk, A.F. Sikorski (2003) A protein isolated from Escherichia coli, identified as GroEL, reacts with anti-beta spectrin antibodies, Arch Biochem Biophys., 415, 94-100.

M. Kukuła, B. Hanus-Lorenz, E. Bok, J. Leluk, and A. F. Sikorski (2004) Proteins with Spectrin Motifs Which Do Not Belong to the Spectrin-a-Actinin-Dystrophin Family, Z. Naturforsch., 59c, 565-571

Jacek Leluk (2002) Studies on length-dependent estimation of sequence identity significance, XXXVIIIth Meeting of Polish Biochemical Society, Abstracts, 198-199.

Anna Zdyb, Jacek Leluk (2002) Theoretical studies on homology among kinase family, XXXVIIIth Meeting of Polish Biochemical Society, Abstracts, 215.

Jacek Kuśka, Jacek Leluk (2002) Contact-variability-structure relationship within the kinase molecule, XXXVIIIth Meeting of Polish Biochemical Society, Abstracts, 215.

Page 55: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

References (6)

Jacek Kuśka, Jacek Leluk (2003) Studies on kinase structure and mutational variability, 2nd Polish Congress of Biotechnology, Łódź, Poland 23-27.06.2003, Abstracts, 75.

Anna Zdyb, Jacek Leluk (2003) Kinases – Comparative analysis, genetic relationships and consensus sequence, 2nd Polish Congress of Biotechnology, Łódź, Poland 23-27.06.2003, Abstracts, 75.

Jacek Leluk, Bogdan Lesyng (2003) Comparative analysis of homologous kinases, 2nd Polish Congress of Biotechnology, Łódź, Poland 23-27.06.2003, Abstracts, 235.

Jacek Leluk (2003) Wrong assumptions and misinterpretations in molecular biology, biochemistry and bioinformatics, Gliwice Scientific Meetings 2003, Gliwice, Poland, Abstracts, 16

Anna Fogtman, Jacek Leluk, Bogdan Lesyng (2004) Construction of consensus sequences of the β-spectrin family with variable parameter-thresholds and validation of the applied procedure, 29th FEBS Congress, 26 June - 1 July 2004, Warsaw, POLAND, Eur. J. Biochem., 271, Supplement, 29-30.

Page 56: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

References (7)

Adam Górecki, Jacek Leluk, Bogdan Lesyng (2004) A Java-implementation of a genetic semihomology algorithm (GEISHA), and its applications for analyses of selected protein families, 29th FEBS Congress, 26 June - 1 July 2004, Warsaw, POLAND, Eur. J. Biochem., 271, Supplement, 30.

Jacek Leluk, Artur Mikołajczyk (2004) A new approach to sequence comparison and similarity estimation, 29th FEBS Congress, 26 June - 1 July 2004, Warsaw, POLAND, Eur. J. Biochem., 271, Supplement, 29.

Agata Meglicz, Jacek Leluk, Bogdan Lesyng (2004) Protein inhibitors of kinases – homology analysis, mechanisms of differentiation and correlated mutations, 29th FEBS Congress, 26 June - 1 July 2004, Warsaw, POLAND, Eur. J. Biochem., 271, Supplement, 30.

Anna Fogtman, Jacek Leluk, Bogdan Lesyng (2005) β-Spectrin consensus sequence construction with variable threshold parameters; verification of their usefulness; The International Conference Sequence-Structure-Function Relationships; Theoretical and Experimental Approaches, Warsaw, Poland, April 6-10 2005, Abstracts, 12.

Page 57: Genetic semihomology algorithm

Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra

References (8)

Jacek Kuśka, Jacek Leluk, Bogdan Lesyng (2005) The variability patterns in kinase subfamilies; The International Conference Sequence-Structure-Function Relationships; Theoretical and Experimental Approaches, Warsaw, Poland, April 6-10 2005, Abstracts, 23.

Elżbieta Gajewska, Jacek Leluk (2005) An approach to sequence similarity significance estimation; Theoretical and Experimental Approaches, Warsaw, Poland, April 6-10 2005, Abstracts, 14.