wrong assumptions and misinterpretations in explanations of biological models, phenomena and...
Post on 21-Dec-2015
Wrong assumptions and misinterpretations
in explanations of biological models, phenomena and processes
Jacek Leluk, ICM UW
or: Is a biologist logical,
and is a computer scientist alive?
How is it that your genome is 98% the same as the genome of a
chimpanzee and only 50% the same as your own father's genome?
"O składności członów człowieczych"
Why do birds not give milk?
Because they would need teats, which would hinder them in flying.
Andrzej z Kobylina (16th century)
Is biology "bilogical"?
Nomenclature chaos:
• Mitochondria or chondriosomes?
• Is papain a proteolytic enzyme?
• Definition of identity, similarity and homology
Misinterpretation:
• Amino acid sequence of a gene?
• Why are squash inhibitors inhibitors?
• Is wheat agglutinin there to agglutinate rabbit red cells?
Incomplete knowledge:
• Stochastic index matrices
• Statistical description of biological processes
The problem of terminology
• BPTI
- Basic Pancreatic Trypsin Inhibitor
- Bovine Pancreatic Trypsin Inhibitor
- Basic Protein Trypsin Inhibitor
• PAM
- Point Accepted Mutations
- Percent Accepted Mutations
• Kunitz trypsin inhibitor
- BPTI: mammalian organs
- STI: soybean trypsin inhibitor
What may everybody do wrong?
Monte Carlo approach in structure analysis and prediction
- what state do we predict?
Mathematical modelling of life processes
- Markov chains and protein evolution and differentiation
- significance of similarity estimation
What may biologists do wrong?
Amino acids and proteins
- do proteins consist of amino acids as we describe them?
Definitions and theory
- definition of species and theory of evolution
- definitions and biology
Correlated mutations
- dispersed correlation
What may theoreticians do wrong?
Primitive or ancestral?
- (Cyanophyta, Archaebacteria, ape and human)
Global and local energy minima
- can we predict the exact conformation at an exact time?
Microscopic/mesoscopic/macroscopic processes
- water molecule and tsunami
Assumptions and conclusions
- incomplete assumptions and wrong conclusions
- deformations by simplifying
- is the protein sequence just a string of characters?
Sequence identity estimation in proteomics and genomics
Identity threshold – does it make sense?
WHAT IS IMPORTANT IN THE PROTEIN SIMILARITY SEARCH?

1) Contribution (%) of identical positions
   PKILMECKKD
   PKILMKCKHD    8 identical, 80%: similar

   PKILMECKKD
   SDCLLDCVCL    2 identical, 20%: not similar

2) Length of the compared strings (sequences)
   LCE
   WCG    1 identical, 33.3%: coincidental

   MVEICIEPKIRCIKVCTKDERITCLILDET
   MVYWCPRRFMHCVHLKAGGCTCWCLRLDYY    8 identical, 26%: probably similar

3) Distribution of the identical positions along the analyzed sequence
   MVEMICIEPKIRCIKVCTKDERITL
   HVYYWRPERFMHTVKLKAGGCRCWL    5 identical, 20%: coincidental

   MVEMIMAGDARCIKVCTKDERITCL
   HHYYWMAGDAHTVQLKAGGCWCWAG    5 identical, 20%: similar

4) Residues at conservative positions
   MVCPKILMKCKHDSDCLLDCVCLED
   EDEGKRRTKREHFKESNLAAAFKEQ    not similar

   MVCPKILMKCKHDSDTLLDCVCLED
   QNCPGPREWCFTTRMNDSSCACPQT    similar

5) Structural/genetic similarity of the amino acids at non-conservative positions
   The same pair of sequences, compared by identity only, by structural similarity and by genetic similarity:
   MVCPKILMKCKHDSDCLLDCVCLED
   RLCRRLVKRCRKETECIVECICIDE
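The first three criteria can be checked mechanically. The sketch below is only an illustration, not the author's method: it assumes the pairs are already aligned without gaps, and the function name `identity_stats` is hypothetical. It reports the number of identical positions, the percent identity and the longest run of consecutive identities, which is one crude way to see whether the identities are clustered or scattered.

```python
def identity_stats(a, b):
    """Return (identical positions, percent identity, longest run of
    consecutive identities) for two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    count = longest = run = 0
    for x, y in zip(a, b):
        if x == y:
            count += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return count, 100.0 * count / len(a), longest


# Sequence pairs copied from the slide.
PAIRS = {
    "10 residues, similar": ("PKILMECKKD", "PKILMKCKHD"),
    "10 residues, not similar": ("PKILMECKKD", "SDCLLDCVCL"),
    "3 residues": ("LCE", "WCG"),
    "30 residues": ("MVEICIEPKIRCIKVCTKDERITCLILDET",
                    "MVYWCPRRFMHCVHLKAGGCTCWCLRLDYY"),
    "25 residues, scattered": ("MVEMICIEPKIRCIKVCTKDERITL",
                               "HVYYWRPERFMHTVKLKAGGCRCWL"),
    "25 residues, clustered": ("MVEMIMAGDARCIKVCTKDERITCL",
                               "HHYYWMAGDAHTVQLKAGGCWCWAG"),
}

for label, (a, b) in PAIRS.items():
    count, percent, longest = identity_stats(a, b)
    print(f"{label}: {count} identical ({percent:.1f}%), longest run {longest}")
```

Both 25-residue pairs have the same 20% identity, yet the identities form a single block only in the second pair, which is why a bare identity threshold cannot separate them.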
Sequence multiple alignment
Problem of gap manipulation
Any protein can be aligned with any other so that the two appear homologous/similar
anybiologicalstring
anybilogicalstrip

anybiologicalstri-ng
anybi-logicalstrip
anyproteincanbealigned
-an-yprote--i-ncanb-----ealigned
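The effect of unrestricted gap insertion can be made concrete. The sketch below assumes gaps cost nothing; under that assumption the largest number of identical columns any alignment can reach equals the length of the longest common subsequence (LCS). The helper names are hypothetical and the code is a minimal illustration, not an alignment tool.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b, i.e. the
    maximum number of identical columns when gaps are free."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def ungapped_identities(a, b):
    """Identical positions when the strings are compared without gaps,
    starting from their first characters."""
    return sum(x == y for x, y in zip(a, b))


s1, s2 = "anybiologicalstring", "anybilogicalstrip"
print("without gaps:", ungapped_identities(s1, s2))
print("with free gaps:", lcs_length(s1, s2))

# Two unrelated strings also gain matches once gaps are free.
t1, t2 = "anyproteincanbealigned", "anybiologicalstring"
print("unrelated, without gaps:", ungapped_identities(t1, t2))
print("unrelated, with free gaps:", lcs_length(t1, t2))
```

This is why gap penalties matter: without them, the count of matched positions says little about real relatedness.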
Statistical approaches vs. accuracy
How far may they be improved?
Protein secondary structure prediction: accuracy 70-72% (not much changed since 1978)
100% accuracy requires a complete database of all possible structures.
For 30 AA polypeptides: 20^30 sequences/secondary structures
Searching the database for the appropriate sequence/structure at a rate of 10^12 sequences/sec would take 1.8 billion times longer than the age
of the Universe.
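The order of magnitude is easy to reproduce. The sketch below assumes an age of the Universe of 13.8 billion years, which is not stated on the slide; with that value the ratio comes out at a few billion, and the slide's figure of 1.8 billion corresponds to an assumed age of roughly 19 billion years.

```python
SEQUENCES = 20 ** 30                   # all 30-residue polypeptides
RATE_PER_SECOND = 10 ** 12             # search rate given on the slide
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9         # assumption, not from the slide

search_years = SEQUENCES / RATE_PER_SECOND / SECONDS_PER_YEAR
ratio = search_years / AGE_OF_UNIVERSE_YEARS

print(f"sequences to search: {SEQUENCES:.3e}")
print(f"search time: {search_years:.3e} years")
print(f"ratio to the assumed age of the Universe: {ratio:.2e}")
```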
Genetic conditioning of the amino acid replacement probabilities and
spectrum in molecular evolution
The Markov model assumes that the probability of substituting amino acid AA1 with AA2 is the same regardless of which residue (AAx or AAy) the initial
residue AA1 was itself derived from.
The statistical algorithms currently in use are based on a Markovian model of amino acid replacement (they directly use stochastic
matrices of replacement frequency indices).
AAx → AA1 → AA2   (probability of the second step: Pa)
AAy → AA1 → AA2   (probability of the second step: Pb)
Pa = Pb
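Written as a formula, the assumption reads as follows. The notation X_n for the residue occupying a given position after n replacement events is introduced here for illustration and does not appear on the slide.

```latex
\[
P\left(X_{n+1} = \mathrm{AA}_2 \mid X_n = \mathrm{AA}_1,\ X_{n-1} = \mathrm{AA}_x\right)
= P\left(X_{n+1} = \mathrm{AA}_2 \mid X_n = \mathrm{AA}_1,\ X_{n-1} = \mathrm{AA}_y\right)
= P\left(X_{n+1} = \mathrm{AA}_2 \mid X_n = \mathrm{AA}_1\right)
\]
```

The two conditional probabilities on the left are Pa and Pb; the Markov property states that both reduce to the same quantity, which depends only on the current residue.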
BLOSUM62 matrix of amino acid replacements
A  4
R -1  5
N -2  0  6
D -2 -2  1  6
C  0 -3 -3 -3  9
Q -1  1  0  0 -3  5
E -1  0  0  2 -4  2  5
G  0 -2  0 -1 -3 -2 -2  6
H -2  0  1 -1 -3  0  0 -2  8
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
Why is tryptophan the most conservative residue here?
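The observation behind the question can be read straight off the diagonal of the matrix. The sketch below copies the self-scores from the BLOSUM62 table above and ranks them. As a general property of log-odds matrices, a diagonal score compares how often a residue is aligned with itself with what its background frequency would predict, so a rare residue that is usually retained scores high. Whether that is the explanation the author had in mind is not stated on the slide.

```python
# Diagonal (self-replacement) scores copied from the BLOSUM62 table above.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5, "G": 6,
    "H": 8, "I": 4, "L": 4, "K": 5, "M": 5, "F": 6, "P": 7, "S": 4,
    "T": 5, "W": 11, "Y": 7, "V": 4,
}

# Rank residues from highest to lowest self-score.
ranked = sorted(BLOSUM62_DIAGONAL.items(), key=lambda item: item[1], reverse=True)

for residue, score in ranked[:5]:
    print(residue, score)
```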
Replacement Arg → Lys according to the statistical interpretation using stochastic matrix indices

Matrix       Arg → Lys
PAM250       3
BLOSUM62     2
BLOSUM35     2
BLOSUM45     3
BLOSUM100    3
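Arginine-to-lysine mutational replacements
[Diagram: mutational routes into arginine and on to lysine; arrows are not recoverable from the transcript. Node labels in order of appearance: Gln (CAR), Arg (AGR), Arg (AGR), Ser (AGY), Arg (CGR), His (CAY), Lys (AAR), Arg (CGY), Leu (CTR), Lys (AAR), Arg (CGR), Met (ATG), Lys (AAG), Arg (AGG)]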
Possible one-point-mutational processing of serine with respect to its origin
[Diagram]
Trp (UGG) → Ser (UCG) → Ala, Thr, Pro, Trp, Ser, Leu, (UAG)
Asn (AAU) → Ser (AGU) → Thr, Ser, Ile, Asn, Arg, Cys, Gly
Possible codons for arginine: AGA AGG CGA CGG CGC CGT
Is arginine the same as arginine?
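The question can be explored by listing, for each codon, the amino acids reachable by a single base change. The sketch below uses the standard genetic code written in RNA letters (U where the slide's DNA-style codons have T); the helper name `point_neighbors` is hypothetical. It is an illustration of the codon-level argument, not the author's algorithm.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def point_neighbors(codon):
    """Amino acids (or '*') reachable from codon by changing exactly one base."""
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                reachable.add(CODON_TABLE[codon[:pos] + base + codon[pos + 1:]])
    return reachable


# Two serine codons with different one-step neighbourhoods.
for codon in ("UCG", "AGU"):
    print("Ser", codon, "->", "".join(sorted(point_neighbors(codon))))

# The six arginine codons are not equivalent either.
ARG_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa == "R")
for codon in ARG_CODONS:
    print("Arg", codon, "->", "".join(sorted(point_neighbors(codon))))
```

Only AGA and AGG can become lysine in one step, only AGG can become methionine, and only CGA and CGG can become glutamine. A single Arg/Lys matrix score averages over these distinct cases.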
[Diagram legend: bases A, G, C, U; codon positions 1, 2, 3]

Q – E K
Q – E K
H Y D N
H Y D N
R – G R
R W G R
R C G S
R C G S
P S A T
P S A T
P S A T
P S A T
L L V I
L L V M
L F V I
L F V I
Diagram of amino acid genetic relationships

CAA UAA GAA AAA
CAG UAG GAG AAG
CAC UAC GAC AAC
CAU UAU GAU AAU
CGA UGA GGA AGA
CGG UGG GGG AGG
CGC UGC GGC AGC
CGU UGU GGU AGU
CCA UCA GCA ACA
CCG UCG GCG ACG
CCC UCC GCC ACC
CCU UCU GCU ACU
CUA UUA GUA AUA
CUG UUG GUG AUG
CUC UUC GUC AUC
CUU UUU GUU AUU
Diagram of codon genetic relationships
Genetic relationships between Arg and Met/Gln
[Diagram: the amino acid genetic relationships diagram with the positions of Met (M), Gln (Q, two positions) and Arg (R, three positions) highlighted]
What part of the codon contains the information about the previous amino acid that occurred at a certain position of the
protein sequence?
At most 2/3 of the entire codon.
Ala (GCG) → Val (GUG)
How long is the information about the codons of preceding amino acids stored?
Theoretically, the longest period is infinite.
The shortest storage period is 3 transitions/transversions.
Ala (GCG) → Val (GUG) → Met (AUG) → Ile (AUA)
Ser (UCC) → Ser (UCU) → Thr (ACU) → Ser (AGU)
Lys (AAA) → Asn (AAC) → Asp (GAC) → His (CAC) → Gln (CAG) → Glu (GAG) → Asp (GAU) → His (CAU) → Asn (AAU) → Lys (AAG) → Gln (CAG) → His (CAC) → Tyr (UAU) . . .
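The storage argument can be traced step by step. The sketch below follows a path of codons, checks that each step changes exactly one base, and records which codon positions have never been mutated since the start. The function name is hypothetical, and the long path is the slide's Lys chain cut off before its final Tyr step.

```python
def untouched_positions(path):
    """For a list of codons, verify that every step is a single-base change
    and return, after each step, the codon positions (1-3) that still carry
    the base of the starting codon because they were never mutated."""
    untouched = {1, 2, 3}
    history = []
    for prev, cur in zip(path, path[1:]):
        changed = [i + 1 for i in range(3) if prev[i] != cur[i]]
        if len(changed) != 1:
            raise ValueError(f"{prev} -> {cur} is not a single-point mutation")
        untouched -= set(changed)
        history.append(sorted(untouched))
    return history


SHORT_PATH = ["GCG", "GUG", "AUG", "AUA"]      # Ala, Val, Met, Ile
LONG_PATH = ["AAA", "AAC", "GAC", "CAC", "CAG", "GAG",
             "GAU", "CAU", "AAU", "AAG", "CAG", "CAC"]

print(untouched_positions(SHORT_PATH))
print(untouched_positions(LONG_PATH)[-1])
```

In the short path every position has been replaced after three steps, while in the long path the middle base is never touched, so part of the original codon persists however long the chain continues.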
Correlated mutations
The phenomenon of several mutations that occur simultaneously and depend on each other
According to the current hypothesis of molecular positive Darwinian selection, correlated mutations are related to changes
occurring in their neighborhood; they reflect protein-to-protein interactions, and they preserve the biological activity
and structural properties of the molecule
The current explanation of correlated mutations occurrence (example)
[Figure: side-chain structures of Trp, Leu, Ala and Val, shown as the residue pairs Trp/Leu, Ala/Val, Val/Leu and Ala/Trp]
The three types of distribution of correlated positions present in myoglobins
The residue location and relative distribution are shown on the tertiary structure of human myoglobin (P0244, pdb1bzp)
The spot correlation cluster

Position no. and          Correlation versus position 127
occurring residues
127  [AMSTV]              A (58)      S (7)
27   [ADEFLNT]            ADEFNT      E
31   [GKRS]               GKRS        R
78   [AKLQ]               K           ALQ
109  [DEGNT]              DEGT        E
116  [AEHKQST]            AEHKQS      A
117  [AEKNQS]             AEKQS       E
122  [BDEN]               BDEN        D
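A table of this kind can be derived from a multiple alignment by grouping the residues seen at one column according to the residue present at a reference column. The sketch below shows that grouping on a made-up five-sequence, two-column alignment chosen to mimic the pattern of the position 78 row. The data and the function name are hypothetical, and this is not the author's correlation algorithm.

```python
from collections import defaultdict


def correlation_versus(alignment, ref_col, other_col):
    """Group the residues seen at other_col by the residue found at ref_col.
    Columns are 0-based indices into the aligned sequences."""
    groups = defaultdict(set)
    for seq in alignment:
        groups[seq[ref_col]].add(seq[other_col])
    return {ref: "".join(sorted(res)) for ref, res in groups.items()}


# Hypothetical toy alignment: column 0 plays the role of position 78,
# column 1 the role of reference position 127.
TOY_ALIGNMENT = ["KA", "KA", "AS", "LS", "QS"]

print(correlation_versus(TOY_ALIGNMENT, ref_col=1, other_col=0))
```

A column whose residue set narrows sharply for one reference residue, as here, is a candidate correlated position; judging significance would need the real alignment and sequence counts.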
The three types of distribution of correlated positions present in the Bowman-Birk inhibitor family
The residue location and relative distribution are shown on the tertiary structure of the Bowman-Birk inhibitor from soybean (P01055)
The narrow correlation cluster

Position no. and          Correlation versus position 13
occurring residues
13   [–ADFIKLMPRSTV]      L (11)      M (10)      A (8)
4    [–RSTVY]             V           –S          S
5    [–KPST]              K           –S          S
7    [AEGKP]              A           P           P
11   [EFHIKLQRST]         T           EHQ         S
21   [EFIKMQT]            T           Q           EQ
The three types of distribution of correlated positions present in eglin-like proteins
The residue location and relative distribution are shown on the tertiary structure of eglin C (P01051)
The dispersed correlation

Position no. and          Correlation versus position 67
occurring residues
67   [–DGNT]              D (8)       G (9)
10   [–ELNQRST]           ET          LNQRS
The three types of distribution of correlated positions present in lysozymes
The residue location and relative distribution are shown on the tertiary structure of lysozyme from rat (P00697, pdb5lyz)
The dispersed correlation

Position no. and          Correlation versus position 80
occurring residues
80   [GHKNR]              G (7)       H (31)      N (16)
30   [ILMV]               MV          ILMV        V
40   [DFKNR]              DN          N           FKNR
The observed number and contribution of three correlation types in four different protein families
The correlation sets consist of 2 to over 20 residues

The correlation statistics

Protein family (number of                  Total        Dispersed    Narrow       Undirected   Sets related to
correlated positions/set)                  sets         sets         clusters     clusters     active center
Eglin-like proteins (2-13)                 20           7            7            6            1
Bowman-Birk proteinase inhibitors (2-28)   23           4            13           6            9
Myoglobins (2-29)                          41           23           9            9            n.a.
Lysozymes (2-15)                           41           25           9            7            9
All families                               125 (100%)   59 (47.2%)   38 (30.4%)   28 (22.4%)   -
Bowls are concave
Bowls are convex
A mathematician-biologist dialogue
The communication problem
...the first conclusion is not always correct, nor is the first impression always consistent with reality
In the entire splendour of natural phenomena...
Multiple alignment of 52 Bowman-Birk-type proteinase inhibitors, prepared with the genetic semihomology algorithm.
Conservative and typical residues are shown as white letters on a black background. A grey background indicates semihomologous amino acids.

          3     10        20        30        40        50        60
P01055    ESSKPCCDQCACTKSNPPQCRCSDMRLNSCHSACKSCICALSYPAQCF-CVDITDFCYEP-CKP
P01057    ESSKPCCDECACTKSIPPQCRCTDVRLNSCHSACSSCVCTFSIPAQCV-CVDMKDFCYAP-CKS
P01056    QSSKPCCBHCACTKSIPPQCRCTDLRLDSCHSACKSCICTLSIPAQCV-CBBIBDFCYEP-CKS
P01058    ESSKPCCDQCSCTKSMPPKCRCSDIRLNSCHSACKSCACTYSIPAKCF-CTDINDFCYEP-CKS
P01059    ESSKPCCDLCTCTKSIPPQCHCNDMRLNSCHSACKSCICALSEPAQCF-CVDTTDFCYKS-CHN
P01063    ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
P17734    QSSKPCCRQCACTKSIPPQCRCSQVRLNSCHSACKSCACTFSIPAQCF-CGBIBBFCYKP-CKS
P81483    -SSKPCCBHCACTKSIPPQCRCSBLRLNSCHSECKGCICTFSIPAQCI-CTDTNNFCYEP-CKS
P81484    -SSKPCCBHCACTKSIPPQCRCSBLRLNSCHSECKGCICTFSIPAQCI-CTDTNNFCYEP-CKS
P16343    ESSKPCCSSC-CTRSRPPQCQCTDVRLNSCHSACKSCMCTFSDPGMCS-CLDVTDFCYKP-CKS
P01064    EYSKPCCDLCMCTRSMPPQCSCEDIRLNSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
P82469    -SSGPCCDRCRCTKSEPPQCQCQDVRLNSCHSACEACVCSHSMPGLCS-CLDITHFCHEP-CKS
P01061    ESSHPCCDLCLCTKSIPPQCQCADIRLDSCHSACKSCMCTRSMPGQCR-CLDTHDFCHKP-CKS
P01062    ESSEPCCDSCDCTKSIPPECHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCYKP-CES
P01060    QSSPPCCBICVCTASIPPQCVCTBIRLBSCHSACKSCMCTRSMPGKCR-CLBTTBYCYKS-CKS
1BBI:     ESSKPCCDQCACTKSNPPQCRCSDMRLNSCHSACKSCICALSYPAQCF-CVDITDFCYEP-CKP
1D6R:I    ---KPCCDQCACTKSNPPQCRCSDMRLNSCHSACKSCICALSYPAQCF-CVDITDFCYEP-CK-
1DF9:C    ESSEPCCDSCDCTKSIPPQCHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCYKP-CES
1PI2:     EYSKPCCDLCMCTRSMPPQCSCED-RINSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
1PBI:A    DVKSACCDTCLCTKSNPPTCRCVDVGET-CHSACLSCICAYSNPPKCQ-CFDTQKFCYKQ-CHN
AAB4719   ESSKPCCDQCTCTKSIPPQCRCTDVRLNSCHSACSSCVCTFSIPAQCV-CVDMKDFCYAP-CKS
TISYC2    ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
JC2225    ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
TIZB2     ESSKPCCDQC-CTKSMPPKCRCSDIRLDSCHSACKSCACTYSIPAKCF-CTDINDFCYEP-CKS
JC2073    ESSKPCCDECKCTKSEPPQCQCVDTRLESCHSACKLCLCALSFPAKCR-CVDTTDFCYKP-CKS
JC2072    ESSKPCCDECKCTKSEPPQCQCVDTRLESCHSACKLCLCALSFPAKCR-CVDTTDFCYKP-CKS
0506164   ESSKPCCDQC-CTKSMPPKCRCSDIRLDSCHSACKSCACTYSIPAKCF-CTDINDFCYEP-CKS
0401177   ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
763679A   ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
TISYD2    EYSKPCCDLCMCTRSMPPQCSCEDIRLNSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
0907248   ESSEPCCDSCRCTKSIPPQCHCADIRLNSCHSACKSCMCTRSMPGKCR-CLDTDDFCYKP-CES
1102213   ESSEPCCDLCLCTKSIPPQCQCADIRLNSCHSACKSCMCTRSMPGQCH-CLDTHDFCHKP-CKS
1102213   ESSEPCCDLCLCTKSIPPQCQCADIRLNSCHSACKSCMCTRSMPGQCR-CLDTHDFCHKP-CKS
0404180   EYSKPCCDLCMCTRSMPPQCSCEDIRLNSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
TIZB1B    ESSHPCCDLCLCTKSIPPQCQCADIRLDSCHSACKSCMCTRSMPGQCH-CLDTHDFCHKP-CKS
TIMB      ESSEPCCDSCDCTKSKPPQCHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCYKP-CES
TIZB1P    ESSHPCCDLCLCTKSIPPQCQCADIRLNSCHSACKSCMCTRSMPGQCR-CLDTHDFCHKP-CKS
JC1066    ESSEPCCDSCDCTKSKPPQCHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCTKP-CES
Q41066    DVKSACCDTCLCTKSDPPTCRCVDVGET-CHSACDSCICALSYPPQCQ-CFDTHKFCYKA-CHN
P80321    STTTACCDFCPCTRSIPPQCQCTDVREK-CHSACKSCLCTLSIPPQCH-CYDITDFCYPS-CR-
Q41065    DVKSACCDTCLCTKSNPPTCRCVDVRET-CHSACDSCICAYSNPPKCQ-CFDTHKFCYKA-CHN
P81705    --TSACCDKCFCTKSNPPICQCRDVGET-CHSACKFCICALSYPAQCH-CLDQNTFCYDK-CDS
P56679    DVKSACCDTCLCTKSNPPTCRCVDVGET-CHSACLSCICAYSNPPKCQ-CFDTQKFCYKA-CHN
P16346    --TTACCNFCPCTRSIPPQCRCTDIGET-CHSACKTCLCTKSIPPQCH-CADITNFCYPK-CN-
P01065    DVKSACCDTCLCTRSQPPTCRCVDVGER-CHSACNHCVCNYSNPPQCQ-CFDTHKFCYKA-CHS
P24661    DVKSACCDTCLCTKSEPPTCRCVDVGER-CHSACNSCVCRYSNPPKCQ-CFDTHKFCYKS-CHN
P07679    KRPWECCDIAMCTRSIPPICRCVDKVDR-CSDACKDCEETEDN--RHV-CFDTYIGDPGPTCHD
P19860    ERPWKCCDLQTCTKSIPAFCRCRDLLEQ-CSDACKECGKVRDSDPPRYICQDVYRGIPAPMCHE
P22737    ERPWKCCDLQTCTKSIPAFCRCRDLLEQ-CSDACKECGKVRDSDPPRYICQDVYRGIPAPMCHE
220645    ES-EGCCDRCICTKSMPPQCHCHDVRLDSCHSDCETCICTRSYPAQCR-CADTTDFCYKP-C-S
P09864    TRPWKCCDRAICTKSFPPMCRCMDMVEQ-CAATCKKCGPATSDSSRRV-CEDXY-----------
P09863    KRPWKCCDQAVCTRSIPPICRCMDQVFE-CPSTCKACGPSVGDPSRRV-CQDQYV----------
CONSENSUS ESSKPCCDXCXCTKSIPPQCRCXDXRLNSCHSACKSCXCTRSXPXQCX-CXDTXDFCYKP-CKS
Thank you for your attention!!!