genetic semihomology algorithm
DESCRIPTION
Genetic semihomology algorithm. A new approach to the comparative analysis of protein sequences. - PowerPoint PPT PresentationTRANSCRIPT
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Q
Q
H
H
–
–
Y
Y
E
E
D
D
K
K
N
N
R
R
R
R
–
W
C
C
G
G
G
G
R
R
S
S
P
P
P
P
S
S
S
S
A
A
A
A
T
T
T
T
L
L
L
L
L
L
F
F
V
V
V
V
I
M
I
I
Genetic semihomology algorithm
A new approach to the comparative analysis of protein sequences
It is admitted that from evolutionary point of view the genetic code and amino acid ‘language’ have evolved simultaneously. They also act with strict coherence with each other. Therefore, in analysis of protein differentiation and variability, both levels should be considered simultaneously.
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
The tools currently used for comparative sequence analysis
apply the Markovian model of amino acid replacement and are based on stochastic matrices of the observed amino acid substitution frequencies
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
BLOSUM62 matrix of amino acid replacements
A 4 R -1 5 N -2 0 6 D -2 -2 1 6 C 0 -3 -3 -3 9 Q -1 1 0 0 -3 5 E -1 0 0 2 -4 2 5 G 0 -2 0 -1 -3 -2 -2 6 H -2 0 1 -1 -3 0 0 -2 8 I -1 -3 -3 -3 -1 -3 -3 -4 -3 4 L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5 M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5 F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6 P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7 S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4 T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5 W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11 Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7 V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
A R N D C Q E G H I L K M F P S T W Y V
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
BLASTP 2.2.2 [Dec-14-2001]
BLAST protein search - output
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Genetic semihomology algorithm
The aim of the new algorithm elaboration is to overcome the basic disadvantages of protein sequence analysis tools and to exclude some basic errors in the assumptions of the existing statistical methods.
It is to be able to explain the mechanism and pathway of protein evolution and differentiation, not only limited to the description of the initial and final step of the observed changes.
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Genetic semihomology algorithm
Another goal of this algorithm is to make it applicable to any group of proteins of any nature, function and location. It can be achieved for two reasons:
1) minimization of basic assumptions limited to the general amino acid: codon translation table and assuming that single point mutation is a principle, most common, mechanism of protein variability;
2) non-statistical approach (no stochastic matrices)
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Genetic semihomology algorithm
The algorithm of genetic semihomology assumes the close relation between the compared amino acids and their codons in related proteins. The algorithm is based on the network of genetic relationship between amino acids.Such assumption makes the same residues at different positions of the sequence unequal with respect to their changeability.
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Genetic semihomology algorithm
The algorithm assumes that the basic mechanism of differentiation among related proteins consists in the single point mutation. The general part of the algorithm is the three-dimensional diagram reflecting the network of genetic relationship between amino acids
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
AGCU 1
3 2
Q
Q
H
H
–
–
Y
Y
E
E
D
D
K
K
N
N
R
R
R
R
–
W
C
C
G
G
G
G
R
R
S
S
P
P
P
P
S
S
S
S
A
A
A
A
T
T
T
T
L
L
L
L
L
L
F
F
V
V
V
V
I
M
I
I
Diagram of amino acid genetic relationships CAA UAA GAA AAA
CAG UAG GAG AAG
CAC UAC GAC AAC
CAU UAU GAU AAU
CGA UGA GGA AGA
CGG UGG GGG AGG
CGC UGC GGC AGC
CGU UGU GGU AGU
CCA UCA GCA ACA
CCG UCG GCG ACG
CCC UCC GCC ACC
CCU UCU GCU ACU
CUA UUA GUA AUA
CUG UUG GUG AUG
CUC UUC GUC AUC
CUU UUU GUU AUU
Diagram of codon genetic relationships
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Semihomology algorithmInput requirements
The minimum data required for starting analysis with the algorithm of genetic semihomology are the protein sequences (at least two). The more sequences are used for analysis and multiple alignment construction - the more concise and accurate the results are. Although the nucleotide sequences of the genes are not necessarily required, it is very helpful if the nucleotide sequences are known at least for some of the analyzed proteins. That increases significantly the amount of the information accessible from such analysis.
Also the results are the best for sequences revealing sufficiently high degree of diversity.
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
GTT AAT TGC AGC CTG TAT GCC AGC GGC ATC GGC AAG GAT GGG ACG AGT TGG GTA GCC
1) V N C S L Y A S G I G K D G T S W V A ATT GAT TGC TCT CCG TAC CTC CAA GTT GTA AGA GAT GGT AAC ACC ATG GTA GCC
2) I D C S P Y L Q - V V R D G N T M V A V N C S L Y A S G I G K D G T S W V A % I D C S P Y D G N T M V A 0 0 1 1 0 1 0 0 0 0 0 0 1 1 0 0 0 1 1 7/19 36.8 GTT AAT TGC AGC CTG TAT GCC AGC GGC ATC GGC AAG GAT GGG ACG AGT TGG GTA GCC ATT GAT TGC TCT CCG TAC CTC GTT GTA AGA GAT GGT AAC ACC ATG GTA GCC
2 2 3 0 2 2 1 0 0 1 1 1 3 2 1 1 1 3 3 29/57 50.9 V N C S L Y A S G I G K D G T S W V A 42/97 43.3 I D C S P Y L V V R D G N T M V A 42/89 47.2 1 1 2 2 0 2 0 0 0 1 0 1 2 2 1 1 0 2 2 20/38 52.6 V N C S L Y A S G I G K D G T S W V A I D C S P Y L V V R D G N T M V A 2 2 3 3 2 3 0 0 0 2 1 2 3 3 1 1 0 3 3 34/57 59.6
<L Q V V R>
< CAA >
< Q >
< Q >
UNITARY MATRIX
GENETIC CODE MATRIX
PAM250 SCORING
GENETIC SEMIHOMOLOGY
SCORE
Comparison of the fragments of 1st and 2nd domain of chicken ovomucoid using unitary matrix, GCM, PAM250 and algorithm of genetic semihomology
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Semihomology algorithmAdvantages of the approach
The results obtained by using this method are more comprehensive than those of the methods used currently and reflect the actual mechanism of protein differentiation and evolution. They concern:
1) location of homologous and semihomologous sites in compared proteins,
2) precise estimation of gap location in non-identical fragments of different length,
3) analysis of internal homology and semihomology,
4) precise location of domains in multidomain proteins,
5) estimation of genetic code of non homologous fragments,
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Semihomology algorithmAdvantages of the approach
6) construction of genetic probes,
7) studies on differentiation processes among related proteins,
8) estimation of the degree of relationship among related proteins,
9) studies on the evolution mechanism within homologous protein families,
10) confirmation of the actual relationship between sequences revealing low degree of identity/similarity.
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Semihomology algorithmAdvantages of the approach
Application of the semihomology approach has led to discovery and describing some important mechanisms affecting the protein evolution and differentiation. The most important are:
1) the mechanism and role of cryptic mutations at unusually variable positions (Leluk; 2000b-c)
2) the phenomenon of very long distance (dispersed) mutational correlation within sets of variable positions (Leluk 2000a; Leluk and Grabiec, 2001; Leluk et al., 2001b; Leluk et al., 2002)
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Semihomology algorithmLimitations of the approach
The limitations in use of the semihomology approach appear in case if:
- there are too few sequences taken for analysis
- the identity degree among the sequences is too high (too low diversity)
- the long fragments of the compared sequences show no identity at all (e.g. N-terminal signal fragments of homologous proteins)
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
The analysis of genetic semihomology excludes applicability of Markov model for the studies on protein
variability at the amino acid level.
The amino acid codons do contain the information about the „ancestral” amino acids, whose codons were the
starting point to the codon of current residue.
It refers mainly to the positions undergoing single-point mutations as the most basic mechanism of evolutionary
variability.
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
GEISHA- Protein sequence similarity and homology analysis.- Multiple alignment construction on the basis of genetic relationships between amino-acids.- Analysis of variability within homologous protein families- Molecular phylogenetic studies
Software based on genetic semihomology algorithm
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
FQS Semihomology tool http://www.fqspl.com.pl/sh/
Software based on genetic semihomology algorithm
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
The semihomologous correlation between amino acids occurring at selected non-homologous positions of
inhibitors from squash seeds.
The solid lines indicate the transition type of semihomology, the dashed lines refer to transversion type of semihomology.
K
Q
G
E D
S
D
Q E
H
N
Y
W L A D S R
S R D A
or
25. NHEQDS 19. DEGQK 14. RSDA 7. LYW
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
18. VILE 19. NDH 20. C 21. [STR][D] 22. LPKQE 23. YF 24. ALPKQ 25. SQTK 26. GTRS– 27. IVKNT 28. GVSTL 29. KRTQ– 30. DGN– 31. G– 32. TNRKE– 33. STLAQP 34. WMLIV–
1. VTI 2. [A][R]– 3. C 4. PT 5. [RM][F] 6. [NI][E] 7. [L][Y] 8. KSLQDV 9. [P][E] 10. [V][H] 11. C 12. GA 13. TS 14. DN 15. GS 16. SFV 17. T
1. Y 2. SDA 3. NS 4. [ED][R] 5. C 6. GSTF 7. ILF 8. C 9. [L][A][N] 10. [YH][A] 11. NY 12. RAILV 13. EQ 14. HQLS 15. GHRN 16. ATR 17. [NHST][E]
35. VIL 36. ESKAGN 37. [K][L] 38. ELKSRV 39. [YHS][K] 40. [DN][M] 41. GA 42. EKRA 43. C 44. RKE 45. PLQE 46. KERD 47. [ISV][H] 48. [VG][PT] 49. [MEK][PS]
Application of genetic semihomology algorithm for identifying the fragments of possible different mechanism than single
point mutation
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Dot matrix pairwise alignment
Noise reduction
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Dot matrix pairwise alignment
Internal homology (gene multiplication)
Chicken ovoinhibitor precursor
(7 domains)
Chicken ovomucoid precursor
(3 domains)
BLAST 2 SEQUENCES SEMIHOM
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Dot matrix comparison of selected homologous Kazal inhibitors
ovoinhibitor-ovoinhibitor ovoinhibitor-ovomucoid ovoinhibitor-PSTI
BLASTP 2.0.9 (Blosum62)
SEMIHOM (algorithm of genetic semihomology)
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Consecutive steps of dot matrix results obtained from comparison of chicken ovoinhibitor with itself by program SEMIHOM
Raw dot matrix
Noise filtering
adding semihomology
marking of whole fragments
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Precise gap location in non-identical fragments of different length by using the genetic semihomology algorithm. The compared proteins are trypsin inhibitors
from squash seeds - CPGTI-I (horizontal) and CSTI-IIb (vertical).
CPGTI-I has a non homologous His25 while CSTI-IIb possesses at relative site two non homologous residues Asp25 and Ile26.
identities only
?RVCPKILMECKKDSDCLAECICLEH-GYCGMVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS
?RVCPKILMECKKDSDCLAECICLE-HGYCGMVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS
identities and semihomology
The analysis of semihomology shows the genetic relationship between His25 and Asp25, therefore a gap in CPGTI-I should be located next to Ile26 in CSTI-IIb. Window size = 10; minimum number of homologous positions in window = 4.
RVCPKILMECKKDSDCLAECICLEH-GYCGMVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Proteinase inhibitors from squash seeds
Acc.number
sp|P01074 R V C P R I L M E C K K D S D C L A E C V C L E H - G Y C G sp|P07853 R V C P R I L M K C K K D S D C L A E C V C L E H - G Y C G sp|P07853 H E E R V C P R I L M K C K K D S D C L A E C V C L E H - G Y C G sp|P10293 R V C P K I L M E C K K D S D C L A E C I C L E H - G Y C G sp|P10293 R V C P K I L M E C K K D S D C L A E C I C L E H - G Y C G sp|P10293 H E E R V C P K I L M E C K K D S D C L A E C I C L E H - G Y C G sp|P10291 M V C P K I L M K C K H D S D C L L D C V C L E D I G Y C G V S sp|P10292 M M C P R I L M K C K H D S D C L P G C V C L E H I E Y C G sp|P11969 G R R C P R I Y M E C K R D A D C L A D C V C L Q H - G I C G sp|P11968 R G C P R I L M R C K R D S D C L A G C V C Q K N - G Y C G sp|P17680 G I C P R I L M E C K R D S D C L A Q C V C K R Q - G Y C G sp|P12071 G C P R I L M R C K Q D S D C L A G C V C G P N - G F C G sp|P10294 < E R R C P R I L K Q C K R D S D C P G E C I C M A H - G F C G
Multiple alignment
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Multiple alignment of seven chicken
ovoinhibitor domains obtained with
Markovian and non-Markovian methods
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Consensus alignment results for Bowman Birk inhibitors 1 10 20 30 40 50 60 ClustalW .. ...**: * ** * ** * * *: ::*** *. * *: * * :* * * .*** *:Multalin .d....aCC#.C.CTkS.PPqCrC.Dir.#tCHSaCksCiCtrS.PpqCrC.Dtt.FCYk.C.GS AA S.s.SSS**S.*S**s*S**.*S*S*SS*.S***S*.S*s*Ss*s*SS*s*s*SSS***ss*SGS NA sss.SSSS*S.*.**.*SS*sSs*s*SS*sSSS*SSssSsSSsSsSSSS.SsSSsS*SSs.Ss
Consensus alignment results for squash inhibitors 1 10 20 30ClustalW ** * * : ** * * **Multalin ....r.CPrIlm.Ck.DsDCla.C.Cl...G.CG..GS AA SS**S*sSS*SsSs**SSS*S*SSS.SS** Consensus alignment results for ovoinhibitor domains (Kazal type) 1 10 20 30 40 50 60ClustalW .* : *. *.::. ** . * :* : : . *Multalin .dcs.y......dg...vaCp.il.pvCgt#gvTYsneC..Cahn.#..t...k..dg.C......GS AA SS*sSSSSSSSSS*SSSS.*Sss..ss*sSSSS**sSs*sS*.sSSSsSSsSS.sssSS*SSSsssGS NA sSSs.S.S....sSSssss*Sss..ssssSSSSSSsSsS.sS.sS.S...sSSs..sSSSssss.s
Comparison of multiple alignment consensus results for three inhibitor families achieved with ClustalW, Multalin (both
applied BLOSUM62) and algorithm of genetic semihomology
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
-spectrin
-s
pec
trin
500 1000 1500 2000
500
1000
1500
2000
The dot matrix comparison of human erythrocyte alpha-spectrin with itself
For the best visualization of the repeats, the identity threshold is set as 15, and frame size as 75
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
The multiple alignment of proteins homologous to the 10th segment of human erythrocyte -spectrin
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
The multiple alignment of human erythrocyte α-spectrin 23 segments achieved with the MultAlin (BLOSUM62) method
and genetic semihomology approach
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
-s
pec
trin
rep
eats
con
sen
sus
(Sah
r et
al) 500 1000 1500 2000
20
-spectrin
60
100
-spectrin
500 1000 1500 2000
-s
pec
trin
rep
eats
con
sen
sus
(g
enet
ic s
emih
omol
ogy)
20
60
100
The dot matrix comparison of human erythrocyte α-spectrin with the consensus of 106-residue repeats
The consensus described by Sahr et al. [1990]. The identity threshold and frame
size are set as 8 and 40 respectively.
The consensus achieved with genetic semihomology approach [Leluk, 1998]. The identity threshold and frame size are set as
8 and 40 respectively. The identity and genetic semihomology of the compared
residues is visualized
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Query sequence: α-Spectrin segment consensus sequence obtained with MultAlin program (BLOSUM62)
Score E-valueSequences producing significant alignments: (bits) sp|P02549|SPCA_HUMAN SPECTRIN ALPHA CHAIN, ERYTHROCYTE 17 21352sp|Q01082|SPCO_HUMAN SPECTRIN BETA CHAIN, BRAIN (SPECTRIN,... 16 27970sp|P08032|SPCA_MOUSE SPECTRIN ALPHA CHAIN, ERYTHROCYTE 16 27970sp|P07751|SPCN_CHICK SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 16 27970sp|Q13813|SPCN_HUMAN SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN... 16 27970sp|Q62261|SPCO_MOUSE SPECTRIN BETA CHAIN, BRAIN (SPECTRIN, ... 16 27970sp|P16086|SPCN_RAT SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN, N... 16 36639sp|Q00963|SPCB_DROME SPECTRIN BETA CHAIN 16 47996sp|P13395|SPCA_DROME SPECTRIN ALPHA CHAIN 15 62874sp|P16546|SPCN_MOUSE SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 14 141334sp|P15508|SPCB_MOUSE SPECTRIN BETA CHAIN, ERYTHROCYTE 14 185142sp|P11277|SPCB_HUMAN SPECTRIN BETA CHAIN, ERYTHROCYTE 14 185142sp|P39254|Y04O_BPT4 HYPOTHETICAL 36.3 KD PROTEIN IN NRDC-M... 11 935541
The use of α-spectrin consensus sequence achieved with different algorithms for sequence similarities search (BLAST)
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Query sequence: α-Spectrin segment consensus sequence attained by Sahr et al. [1990]
Score E-valueSequences producing significant alignments: (bits) sp|P13395|SPCA_DROME SPECTRIN ALPHA CHAIN 43 2e-04sp|Q13813|SPCN_HUMAN SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN... 40 0.001sp|P07751|SPCN_CHICK SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 39 0.003sp|P16546|SPCN_MOUSE SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 39 0.004sp|P16086|SPCN_RAT SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN, N... 37 0.012sp|P02549|SPCA_HUMAN SPECTRIN ALPHA CHAIN, ERYTHROCYTE 37 0.016sp|P08032|SPCA_MOUSE SPECTRIN ALPHA CHAIN, ERYTHROCYTE 36 0.021sp|P15508|SPCB_MOUSE SPECTRIN BETA CHAIN, ERYTHROCYTE 35 0.048sp|Q00963|SPCB_DROME SPECTRIN BETA CHAIN 35 0.063sp|Q01082|SPCO_HUMAN SPECTRIN BETA CHAIN, BRAIN (SPECTRIN,... 34 0.082sp|Q62261|SPCO_MOUSE SPECTRIN BETA CHAIN, BRAIN (SPECTRIN, ... 34 0.11sp|P11277|SPCB_HUMAN SPECTRIN BETA CHAIN, ERYTHROCYTE 32 0.32sp|P05095|AACT_DICDI ALPHA-ACTININ 3, NON MUSCULAR (F-ACTIN... 21 797sp|P34367|YLJ2_CAEEL HYPOTHETICAL 256.3 KD PROTEIN C50C3.2 ... 20 1791sp|Q03001|BPA1_HUMAN BULLOUS PEMPHIGOID ANTIGEN 1 (BPA) (H... 19 3073sp|P30427|PLEC_RAT PLECTIN 19 4026sp|P31670|GT27_FASHE GLUTATHIONE S-TRANSFERASE 26 KD 47 (GS... 19 4026sp|P46125|YEDI_ECOLI HYPOTHETICAL 32.2 KD PROTEIN IN DSRB-V... 18 6908sp|P42094|PHYT_BACSU 3-PHYTASE PRECURSOR (PHYTATE 3-PHOSPHA... 18 6908sp|P56288|E2BG_SCHPO PROBABLE TRANSLATION INITIATION FACTOR... 18 9049sp|O00273|DFFA_HUMAN DNA FRAGMENTATION FACTOR ALPHA SUBUNI... 18 9049sp|P12311|ADH1_BACST ALCOHOL DEHYDROGENASE (ADH-T) 18 9049
The use of α-spectrin consensus sequence achieved with different algorithms for sequence similarities search (BLAST)
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Query sequence: α-Spectrin segment consensus sequence attained by genetic semihomology algorithm
Score E-valueSequences producing significant alignments: (bits) sp|P13395|SPCA_DROME SPECTRIN ALPHA CHAIN 49 3e-06sp|Q13813|SPCN_HUMAN SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN... 47 1e-05sp|P07751|SPCN_CHICK SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 42 3e-04sp|P16086|SPCN_RAT SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN, N... 41 6e-04sp|P02549|SPCA_HUMAN SPECTRIN ALPHA CHAIN, ERYTHROCYTE 38 0.005sp|Q00963|SPCB_DROME SPECTRIN BETA CHAIN 37 0.014sp|P08032|SPCA_MOUSE SPECTRIN ALPHA CHAIN, ERYTHROCYTE 36 0.019sp|P15508|SPCB_MOUSE SPECTRIN BETA CHAIN, ERYTHROCYTE 36 0.024sp|P16546|SPCN_MOUSE SPECTRIN ALPHA CHAIN, BRAIN (SPECTRIN,... 36 0.032sp|Q01082|SPCO_HUMAN SPECTRIN BETA CHAIN, BRAIN (SPECTRIN,... 34 0.094sp|Q62261|SPCO_MOUSE SPECTRIN BETA CHAIN, BRAIN (SPECTRIN, ... 34 0.094sp|P11277|SPCB_HUMAN SPECTRIN BETA CHAIN, ERYTHROCYTE 33 0.21sp|P34367|YLJ2_CAEEL HYPOTHETICAL 256.3 KD PROTEIN C50C3.2 ... 20 1195sp|Q99001|AACB_CHICK ALPHA-ACTININ, BRAIN ISOFORM (F-ACTIN ... 20 1566sp|P35609|AAC2_HUMAN ALPHA-ACTININ 2, SKELETAL MUSCLE ISOF... 20 1566sp|P12814|AAC1_HUMAN ALPHA-ACTININ 1, CYTOSKELETAL ISOFORM ... 20 1566sp|P05094|AACT_CHICK ALPHA-ACTININ, SMOOTH MUSCLE ISOFORM (... 20 1566sp|P30427|PLEC_RAT PLECTIN 19 2687sp|P20111|AACS_CHICK ALPHA-ACTININ, SKELETAL MUSCLE ISOFORM... 19 3520sp|P47493|SYG_MYCGE GLYCYL-TRNA SYNTHETASE (GLYCINE--TRNA L... 18 4611sp|Q03001|BPA1_HUMAN BULLOUS PEMPHIGOID ANTIGEN 1 (BPA) (H... 18 7913
The use of α-spectrin consensus sequence achieved with different algorithms for sequence similarities search (BLAST)
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Kinase projectMultiple alignment of hexokinase domains
HEXOKINASE_I (2 domains) HEXOKINASE_II (2 domains)HEXOKINASE_III (2 domains)HEXOKINASE_IV (1 domain)
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Kinase project
Complex comparative studies at the primary structure level
Construction of molecular phylogenetic trees
Studies on sequence/structure/function relationship
Studies on the mechanisms of correlated mutations and variability
Genetic principles of differentiation within this protein family
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
T
IM
V
RS A
KN G P FL
L
DE R S
HQ CW
Y
Simplified (planar) diagram of genetic relationships between amino acids
In planar diagram the encoding role of the third codon position is ignored.
Only first two codon positions are taken into account.
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
T
IM
V
RS A
KN G P FL
L
DE R S
HQ CW
Y
Simplified (planar) diagram of genetic relationships between amino acids
The simplified planar diagram emphasizes the special encoding character of six-codon amino acids – Leu, Arg and Ser.
The six-codon amino acids may play the role the of „mutational passages” that are not liable to the selection restrictions.
These amino acids may influence on the variability range increase.
In fact the six-codon amino acids occur unusually frequent at very variable positions. This concerns especially serine, and to lesser extent – arginine.
Leucine does not show the correlation between the frequency of occurrence and variability range.
Conclusions?
T
IM
V
RS A
KN G P FL
L
DE R S
HQ CW
Y
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Frequency of six-codon amino acids as a function of position variability in randomly selected proteins of different origin and
nature
The results for 2686 residues at 606 corresponding positions
ALL SEQUENCES(2686 residues at 606 positions)
(discrete data)
0
20
40
60
80
100
1 2 3 4 5 6 7 8
Number of residues occurring at aligned position
% o
f occ
urre
nce
Ser
Arg
Leu
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
I II III IV V VI VII I 61 64 54 54 53 45 II 53 66 65 57 54 45 III 56 64 66 59 66 42 IV 47 56 61 55 57 39 V 44 44 56 50 58 42 VI 38 41 52 42 42 40 VII 25 28 26 25 26 29
I II III IV V VI VII I 52 57 39 45 42 33 II 72 64 59 66 65 55 III 75 81 56 51 62 33 IV 68 71 77 49 45 30 V 67 66 76 68 46 31 VI 60 65 75 68 70 38 VII 52 55 53 52 52 56
The ovoinhibitor domains homology comparison. The similarity scores (%) for the aligned DNA-coding sequences are shown above the diagonal. Below diagonal are the similarity scores for the aligned amino acid sequences.
The results obtained by Scott et al.
The results obtained by semihomology analysis
Phylogenetic tree verification
Scott M. J., Huckaby C. S., Kato I., Kohr W. J., Laskowski M. Jr, Tsai M.-J. and O'Malley B. W. (1987) J. Biol. Chem. 262, 5899-5907
I VII II III IV V VI I II III PSTI OI OI OM OM
VII III IV I V VI I II III PSTI OI OI OM OM
II
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Arg codons: AGA AGG CGA CGG CGC CGT
Met codons: ATG
Virtual reverse translation
Deducing the genetic code
QRCRRDSDCKKCRMDSDC
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Arg codons: AGA AGG CGA CGG CGC CGT
Met codons: ATG
Virtual reverse translation
Deducing the genetic code
QRCRRDSDCKKCRMDSDC
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
Q
Q
H
H
–
–
Y
Y
E
E
D
D
K
K
N
N
R
R
R
R
–
W
C
C
G
G
G
G
R
R
S
S
P
P
P
P
S
S
S
S
A
A
A
A
T
T
T
T
L
L
L
L
L
L
F
F
V
V
V
V
I
M
I
I
AGCU 1
3 2
Virtual reverse translation
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
R
R
R
R
R
R
M
Q
Q
H
H
–
–
Y
Y
E
E
D
D
K
K
N
N
–
W
C
C
G
G
G
G
S
S
P
P
P
P
S
S
S
S
A
A
A
A
T
T
T
T
L
L
L
L
L
L
F
F
V
V
V
V
I
I
I
AGCT 1
3 2
Virtual reverse translation
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
CGA
CGG
CGC
CGT
AGA
AGG
ATG
Q
Q
H
H
–
–
Y
Y
E
E
D
D
K
K
N
N
–
W
C
C
G
G
G
G
S
S
P
P
P
P
S
S
S
S
A
A
A
A
T
T
T
T
L
L
L
L
L
L
F
F
V
V
V
V
I
I
I
AGCT 1
3 2
Virtual reverse translation
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
CGA
CGG
CGC
CGT
AGA
Q
Q
H
H
–
–
Y
Y
E
E
D
D
K
K
N
N
–
W
C
C
G
G
G
G
S
S
P
P
P
P
S
S
S
S
A
A
A
A
T
T
T
T
L
L
L
L
L
L
F
F
V
V
V
V
I
I
I
AGCT 1
3 2
ATG
AGG
Virtual reverse translation
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
H
H
–
–
Y
Y
E
E
D
D
K
K
N
N
R
R
–
W
C
C
G
G
G
G
R
S
S
P
P
P
P
S
S
S
S
A
A
A
A
T
T
T
T
L
L
L
L
L
L
F
F
V
V
V
V
I
I
I
AGCU 1
3 2
Genetic relationships between Arg and Met/Gln
M
R
R
Q
R
Q
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
1 4 5 7 8 12 13 15 22 33 37 49 50 59 61 62 63 66 I V S L A S K D T R S N G A E R P K M II I S P L Q R D N R F N H T E K L E S III V S K P S K D R R F N R T K R Q E E IV I D Q P T T G K R F N G T R K E R P V L T Q L S Q N E F V N G T R R E E E VI L S K K T K D R M V S R T R E E D K VII E R E Q K - - - M V N N R A ATA AGY CCR GCR ACR ACR GGY AGR AGG GTY AGY GGY AGR AGR AGR CCR AGR AAG GTA CGY CTR CCR TCR TCY CGY ACR GCR CTR GAG GAA CTR GCR YTA Prediction accuracy = 69.7%
Genetic semihomology prediction of genetic code for chicken ovoinhibitor domains.
The predicted gene sequence is compared with the known DNA sequence. Only those positions where prediction reduces possible codons are considered. The predicted codons that are consistent with those present in the ovoinhibitor gene are shadowed.
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
References (1)
Leluk, J., Pham, T.-C. (1985) Calculation and analysis of protein sequence homology using microcomputer ZX Spectrum. VIIth Polish Conferrence "Chemistry of Amino Acids and Peptides", Abstracts, 83
Kubiak, Z.J., Leluk, J. (1986) Application of the ZX Spectrum microcomputer for prediction of the secondary structure of proteins. Pr. Nauk. Inst. Chem. Nieorg. Metal. Pierwiastków Rzadkich. Wrocław 55, 186-189
Leluk, J. (1993) Analysis of protein primary structure using program HOMOLOGY. XXIXth Meeting of Polish Biochemical Society, Abstracts, 384
Leluk, J., Krowarsch, D. (1994) A new algorithm for analysis the homology in protein primary structure, 3rd Conferrence "Computers in Chemistry", Wrocław, Poland, Abstracts, 60.
Leluk, J. (1994) Application of program SEMIHOM based on a new algorithm for homology analysis of protein primary structure. Ist Conference "Computer-based Scientific Research", Wrocław, Poland, Abstracts, 189-194. .
Leluk, J. (1996) Comparison of the algorithm of genetic semihomology with currently applied algorithms for protein homology analysis. IIIrd Conferrence "Computer-based Scientific Research", Wrocław, Polanica-Zdrój, Abstracts, 53-58.
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
References (2)
Leluk, J., Kuźnicki, T. (1996) Analysis of homology of selected proteinase inhibitor families using programs HOMOLOGY and SEMIHOM. IVth Conferrence "Computers in Chemistry '96", Polanica-Zdrój, Abstracts, 37.
Leluk, J. (1998) A new algorithm for analysis of the homology in protein primary structure. Computers & Chemistry, 22, 123-131.
Leluk, J. (1999a) Mutational variability in proteins and the Markovian model of replacement. 5th International Conference "Computers in Chemistry '99", Szklarska Poręba, Poland, Abstracts, P47.
Leluk, J. (1999b) Studies on mutational regularities in selected proteinase inhibitory families. 1st Polish Congress of Biotechnology, Wrocław, Poland, Abstracts (lectures), 183-184.
Leluk, J. (2000a) Serine proteinase inhibitor family in squash seeds: Mutational variability mechanism and correlation. Cell. Mol. Biol. Lett., 5, 91-106.
Leluk, J. (2000b) A non-statistical approach to protein mutational variability. BioSystems, 56, 83-93.
Leluk, J. (2000c) Regularities in mutational variability in selected protein families and the Markovian model of amino acid replacement. Computers & Chemistry, 24, 659-672.
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
References (3)
Leluk, J., Grabiec, M., Sobczyk, M. (2000) The application of genetic semihomology algorithm for theoretical studies on various protein families. International Conference on Conformation of Peptides, Proteins and Nulceic Acids, August 29 - September 2, 2000, Debrzyno, Poland - Abstracts, p. 27.
Leluk, J., Hanus-Lorenz, B. Sikorski, A.F. (2001a) Application of genetic semihomology algorithm to theoretical studies on various protein families. Acta Biochim. Polon., 48, 21-33.
Leluk, J. (2001a) An introduction to theoretical studies on protein primary structure. International Workshop “Information Theory Days” Warsaw, April 23-29 2001.
Leluk, J. (2001b) Benefits from theoretical comparative studies on diverse protein families. International Workshop “Information Theory Days” Warsaw, April 23-29 2001.
Leluk, J. and Grabiec, M. (2001) Sequence similarity estimation and correlated mutations in selected protein families. I. An approach to protein sequence similarity estimation. Ist Summer School on “Parallel Computing in Biomolecular Simulations”, September 1-3 2001, Gdańsk, Poland; Abstracts L-5.
Hanus-Lorenz, B., Hryniewicz-Jankowska, A., Leluk, J., Lorenz, M., Skała, J. and Sikorski, A.F. (2001) Spectrin motifs are detected in plant and yeast genomes, 8th International W. Mejbaum-Katzenellenbogen's Seminar on Membrane Skeleton and Its Regulatory Functions, Szklarska Poręba 2001, Cell. Mol. Biol. Lett., 6, 207.
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
References (4)
Leluk, J., Sobczyk, M. and Becella, Ł. (2001b) Sequence similarity estimation and correlated mutations in selected protein families. II. Correlated mutations in selected protein families. Ist Summer School on “Parallel Computing in Biomolecular Simulations”, September 1-3 2001, Gdańsk, Poland; Abstracts L-5.
Leluk, J., Sobczyk, M. and Becella, Ł. (2002) Correlated mutations in selected protein families, TASK Quarterly, , 6(3), 469-482.
Leluk, J., Konieczny, L. and Roterman, I. (2003) Search for structural similarity in proteins, Bioinformatics, 19(1), 117-124.
Barbara Bereza, Agnieszka Kubiak, Jacek Leluk, Wacław Hendrich (2003) Photoenzyme Protochlorophyllide-NADPH oxidoreductase (LPOR) – the key for chlorophyll biosynthesis, Postępy Biochemii, 49, 46-55.
Jacek Kuśka, Jacek Leluk (2003) Theoretical studies on sugar and protein kinases: Structural and mutational variability, 3rd International Conference “Inhibitors of Protein Kinases”, Cell. Mol. Biol. Lett., 8, 592
Anna Zdyb, Jacek Leluk (2003) Comparative studies on sugar and protein kinases: Genetic relationships and consensus sequences, 3rd International Conference “Inhibitors of Protein Kinases”, Cell. Mol. Biol. Lett., 8, 593
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
References (5)
Jacek Leluk, Bogdan Lesyng (2003) A comparative study of the primary structures of protein and sugar kinases, 3rd International Conference “Inhibitors of Protein Kinases”, Cell. Mol. Biol. Lett., 8, 620-621
A. Czogalla, P. Kwołek, A. Hryniewicz-Jankowska, M. Nietubyć, J. Leluk, A.F. Sikorski (2003) A protein isolated from Escherichia coli, identified as GroEL, reacts with anti-beta spectrin antibodies, Arch Biochem Biophys., 415, 94-100.
M. Kukuła, B. Hanus-Lorenz, E. Bok, J. Leluk, and A. F. Sikorski (2004) Proteins with Spectrin Motifs Which Do Not Belong to the Spectrin-a-Actinin-Dystrophin Family, Z. Naturforsch., 59c, 565-571
Jacek Leluk (2002) Studies on length-dependent estimation of sequence identity significance, XXXVIIIth Meeting of Polish Biochemical Society, Abstracts, 198-199.
Anna Zdyb, Jacek Leluk (2002) Theoretical studies on homology among kinase family, XXXVIIIth Meeting of Polish Biochemical Society, Abstracts, 215.
Jacek Kuśka, Jacek Leluk (2002) Contact-variability-structure relationship within the kinase molecule, XXXVIIIth Meeting of Polish Biochemical Society, Abstracts, 215.
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
References (6)
Jacek Kuśka, Jacek Leluk (2003) Studies on kinase structure and mutational variability, 2nd Polish Congress of Biotechnology, Łódź, Poland 23-27.06.2003, Abstracts, 75.
Anna Zdyb, Jacek Leluk (2003) Kinases – Comparative analysis, genetic relationships and consensus sequence, 2nd Polish Congress of Biotechnology, Łódź, Poland 23-27.06.2003, Abstracts, 75.
Jacek Leluk, Bogdan Lesyng (2003) Comparative analysis of homologous kinases, 2nd Polish Congress of Biotechnology, Łódź, Poland 23-27.06.2003, Abstracts, 235.
Jacek Leluk (2003) Wrong assumptions and misinterpretations in molecular biology, biochemistry and bioinformatics, Gliwice Scientific Meetings 2003, Gliwice, Poland, Abstracts, 16
Anna Fogtman, Jacek Leluk, Bogdan Lesyng (2004) Construction of consensus sequences of the β-spectrin family with variable parameter-thresholds and validation of the applied procedure, 29th FEBS Congress, 26 June - 1 July 2004, Warsaw, POLAND, Eur. J. Biochem., 271, Supplement, 29-30.
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
References (7)
Adam Górecki, Jacek Leluk, Bogdan Lesyng (2004) A Java-implementation of a genetic semihomology algorithm (GEISHA), and its applications for analyses of selected protein families, 29th FEBS Congress, 26 June - 1 July 2004, Warsaw, POLAND, Eur. J. Biochem., 271, Supplement, 30.
Jacek Leluk, Artur Mikołajczyk (2004) A new approach to sequence comparison and similarity estimation, 29th FEBS Congress, 26 June - 1 July 2004, Warsaw, POLAND, Eur. J. Biochem., 271, Supplement, 29.
Agata Meglicz, Jacek Leluk, Bogdan Lesyng (2004) Protein inhibitors of kinases – homology analysis, mechanisms of differentiation and correlated mutations, 29th FEBS Congress, 26 June - 1 July 2004, Warsaw, POLAND, Eur. J. Biochem., 271, Supplement, 30.
Anna Fogtman, Jacek Leluk, Bogdan Lesyng (2005) β-Spectrin consensus sequence construction with variable threshold parameters; verification of their usefulness; The International Conference Sequence-Structure-Function Relationships; Theoretical and Experimental Approaches, Warsaw, Poland, April 6-10 2005, Abstracts, 12.
Jacek Leluk - ICM, Warsaw University and Department of Molecular Biology, University of Zielona Góra
References (8)
Jacek Kuśka, Jacek Leluk, Bogdan Lesyng (2005) The variability patterns in kinase subfamilies; The International Conference Sequence-Structure-Function Relationships; Theoretical and Experimental Approaches, Warsaw, Poland, April 6-10 2005, Abstracts, 23.
Elżbieta Gajewska, Jacek Leluk (2005) An approach to sequence similarity significance estimation; Theoretical and Experimental Approaches, Warsaw, Poland, April 6-10 2005, Abstracts, 14.