formulationandanalysisofpatternsinascorematrixforglobal...
TRANSCRIPT
Research ArticleFormulation and Analysis of Patterns in a Score Matrix for GlobalSequence Alignment
James Owusu Asare1 Justice Kwame Appati 2 and Kwaku Darkwah1
1Department of Mathematics Kwame Nkrumah University of Science and Technology Kumasi Ghana2Department of Computer Science University of Ghana Legon Ghana
Correspondence should be addressed to Justice Kwame Appati jkappatiugedugh
Received 15 April 2020 Accepted 16 May 2020 Published 1 June 2020
Academic Editor Aloys Krieg
Copyright copy 2020 James Owusu Asare et al )is is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited
Global sequence alignment is one of the most basic pairwise sequence alignment procedures used in molecular biology tounderstand the similarity that arises among the structure function or evolutionary relationship between two nucleotide se-quences )e general algorithm associated with global sequence alignment is the dynamic programming algorithm of NeedlemanandWunsch In this paper patterns are exploited in the score matrix of the NeedlemanndashWunsch algorithmWith the help of someexamples the general patterns realized are formulated as new a priori propositions and corollaries that are established for bothequal and unequal length comparisons of any two arbitrary sequences
1 Introduction
Sequence alignment is the matching of strings or sequencesof characters to identify patterns that may lead to informedstructural or functional relationships between the strings orsequences matched Varying situational problems seem torequire the use of sequence alignment and examples aboundfrom computer science to molecular biology where se-quences are usually aligned to make more meaning out ofthem Bare [1] has discussed how researchers have over theyears considered genetic sequences as strings of charactersinstead of focusing on their intrinsic properties to be able tocompute the similarity among two or more related se-quences Global and local sequence alignment are used formatching two sequences and they are categorized as pair-wise sequence alignment or multiple sequence alignmentwhen any alignment matches more than two sequences[2ndash4] )e NeedlemanndashWunsch algorithm [5ndash7] and theSmithndashWaterman algorithm [8] are the general algorithmsassociated with pairwise global and local alignment re-spectively )ese two algorithms utilize a procedure calledthe dynamic programming approach Dynamic program-ming as coined by Bellman in the 1940s is simply the process
of solving a bigger problem by finding optimal solutions toits smaller nested problems [9ndash11])us to tackle a problemin the context of dynamic programming it must possess thenotion of recurrence In [12 13] an algorithm is said to be awell-defined computational task that accepts input andproduces output values after following through a systematicmethod Mathematical theory is thus a prerequisite behindthe designing of functional programs [14 15] and the al-gorithm design specializes in solving such problems Globalsequence alignment is mentioned as one of the vast dynamicprogramming applications in practical problems [16ndash18]
More recently Ouzounis and Baichoo [19] have statedthat even though pairwise sequence alignment has been dealtwith over time concerns still remain in resolving exactevolutionary distances that demand very specific estimates)ey have suggested the existence of theoretical relation-ships within alignments algorithms and data that are yet tobe found Motivated by the display of position-dependentarrays for affine gaps using the NeedlemanndashWunsch algo-rithm in [20 21] we consider the more basic linear gappenalties using arbitrary sequences and proceed to findpatterns in the score matrix which were absent in thepresentation )is approach was taken because it is found
HindawiInternational Journal of Mathematics and Mathematical SciencesVolume 2020 Article ID 3858057 9 pageshttpsdoiorg10115520203858057
missing in the available stream of literature although it canbe likened to the edit graph illustration seen in [1] To thebest of our knowledge there has never been any theoreticalexposition of this kind for the most basic and very funda-mental concept of constant linear gap penalties whichconstitute an implicit part of affine gap penalties )us wefocus on finding patterns in global sequence alignment usingconstant linear gap penalties and attempt the display ofcompleted score matrices of the NeedlemanndashWunsch al-gorithm with distinct positional arrays that can be inspectedWe give confirmation of this by basic proofs and suggesthow predictions can be made To offer the reader the muchneeded convenience concepts that are deemed useful arerecalled in brief with corresponding references of compre-hensive works that give more in-depth information
2 NeedlemanndashWunsch Algorithm
Let the recursive formulation for the NeedlemanndashWunschalgorithm be
(1) Initialization
Let P(0 0) 0Pm(i 0) Pm(i minus 1 0) + gap penalty foralli 1 μ1andPm(0 j) Pm(0 j minus1) + gap penalty forallj 1 μ2where P(0 0) is the initialization pivot for the scorematrix Pm(i 0) is the initial row pivot for the scorematrix and Pm(0 j) is the initial column pivot forthe score matrix
(2) Cell Box Calculation
Let
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
(1)
where Pm(i j) is the pivot for each cell box calculationPD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j)) is the diagonalvalue of a cell box PR(i j) Pm(i minus 1 j) + gap penalty is theright value of a cell box PL(i j) Pm(i j minus 1) + gap penaltyis the left value of a cell box and σ(α(i) β(j)) is the score foraligning the sequence characters of α(i) and β(j)
Specifically the procedure for the NeedlemanndashWunschalgorithm follows the dynamic programming approach ofthe score matrix traceback and alignment as outlined in thesubsequent sections
21 Score Matrix )e score matrix is a tabular box con-structed to keep count of score results )e score matrix forthe NeedlemanndashWunsch algorithm begins with an initiali-zation process and ends with the calculation of cell boxes
211 Initialization A sequence matrix is created withμ1 + 1 columns and μ2 + 1 rows in order for the initialmatrix gap to be aligned where μ1 and μ2 are the lengths
of arbitrary sequences langαrang and langβrang respectively )eletters of the sequence α fill in the horizontal axis andsimilarly the characters of langβrang fill in the vertical axis ofthe sequence matrix created Before the scoring beginsfrom the upper left corner of the initialized matrix to thelower right corner of the matrix the value P(0 0) 0 isassigned to the intersection of the first row and the firstcolumn of the matrix (ie the initial gap) )e reason forthe gap penalty for an alignment is because of thepossibility of mutation which may insert or delete astring character from one of the sequences Arrows thatpoint in the direction of positional movement (diagonaland left or right) are placed in each cell box of the matrixand only terminate when all the cell boxes are completelyfilled
212 Calculation of Cell Boxes in the Score Matrix
(1) Fill the initialized gap values first on both the hor-izontal axis and the vertical axis with a definedconstant gap value score
(2) Follow with the calculation of each cell box havingthe three position-dependent arrays (leftbesiderightbottom or diagonal)
(3) Allow only a matchmismatch value for a ldquodiagonalrdquoposition and allow the ldquobottomrdquo or ldquobesiderdquo posi-tions to take linear gap values only
(4) For each computed cell box values find the maxi-mum score and let that be the pivot
(5) )e pivot of a computed cell box directly affects thenext cell boxes in the row or the column
22 Traceback A simpler score matrix table that containsonly the pivot of each cell box calculation is constructedfrom the original position-dependent arrays of the scorematrix table Arrow pointers are used to direct a path fromthe highest score or an optimal value in the matrix (whichactually occurs at the lower right end corner of the matrix)and traced back to the next biggest value of the predecessorsuntil we reach the intersection of the first row and thecolumn with the initial gap
23 Alignment
231 Alignment Generation from a Traceback To write thesequence characters that appear from the optimal alignmentpath of the traceback stage the following steps are followed
(1) When the arrow is diagonal write both characters inthe alignment
(2) When the arrow is vertical write the correspondinghorizontal character and in place of the verticalcharacter leave a gap
(3) When the arrow is horizontal write the corre-sponding vertical character and in place of thehorizontal character leave a gap
2 International Journal of Mathematics and Mathematical Sciences
)us for both the vertical and horizontal arrow posi-tions one character and a gap are written for the alignmentwhere the gap explicitly replaces a character position in thealignment An alignment can only be inferred as the best ifthe optimal value from the score matrix table corresponds tothe alignment score calculation (based on the scoringscheme defined)
232 0e Problem of Aligning Any Two Sequences )eproblem of aligning any two sequences can be simplified asdiscovering the optimal means of aligning any two arbitrarysequences say langαrang and langβrang such that the character ldquominusrdquo notedas a gap is filled into both langαrang and langβrang or either of themwhere
(1) Any single character in langαrang matches a single char-acter in langβrang or a gap
(2) )e final sum of scores from the scoring functionover the aligned pairs and the gap penalties as givenby the function of gap penalty is maximized
233 Letter Choice for Arbitrary Sequences )e letterchoice of this study shall be that of DNA considered as astring of four characters of adeninemdashA guaninemdashGcytosinemdashC and thyminemdashT
3 Scheme for Scoring
Definition 1 )e constant gap penalty v(k) g times k whereg is an assigned constant and k is the sequence length countof i 1 μ1 for the score matrix row and j 1 μ2 forthe score matrix column)is is the penalty awarded to gapsand is also known as the linear gap function
Definition 2 )e affine gap penalty is the penalty awarded togaps where a greatest consecutive sequence of k gaps is givenas v(k) q + gk where q is the penalty charged for openingthe gap and g is the penalty charged for extending it
)us more formally we define
(1) A constant linear gap as
v(k) gk (2)
(2) An affine gap as
v(k) q + gk kge 1
0 k 01113896 (3)
Altschulrsquos theory assign positive scores to an identityand conserved replacements and assign negative scores toless likely replacements
Remark 1 Needleman and Wunsch use the ldquoidentity ma-trixrdquo in scoring with 1 for a match and 0 for a mismatchNeedlemanndashWunschrsquos score was criticised for not reflecting
observations from the nature because purine-purine orpyrimidine-pyrimidine is less prone to be mutated incomparison with mutations of purine-pyrimidine Becausethere is no definite defined score for DNA alignment thechoice of scoring for this study is +5 for a match minus1 for amismatch and minus2 for a gap as proposed in [18] following theabove theory
More formally let
σ(α(i) β(j)) +5 for α(i) β(j) charactermatch
minus1 for α(i) β(j) charactermismatch1113896
(4)
foralli 1 μ1 andforallj 1 μ2Also let the linear gap penalty v(k) gk where g is a
constant of ldquominus2rdquo and k is the sequence length count of i
1 μ1 for the score matrix row and j 1 μ2 for thescore matrix column
4 Results and Discussion
41 Pattern Investigations in the Score Matrix of Needle-manndashWunschAlgorithm )e sequences α and β that will beused are arbitrary sample sequences that were chosen basedon consideration of the following
(1) μ1 μ2 (equal character length of sequences)(2) μ1 ne μ2 (unequal character length of sequences)
411 Equal Character Length (μ1 μ2 μ)
Example 1 Let langαrang CTTGA and langβrang CTAGA Becausethe length of the sequence characters of langαrang and langβrang isμ1 μ2 5 respectively we align α and β in a score matrixfollowing the above formulation
(1) Initialization P(0 0) 0 for score matrix initiali-zation and then for the initial row pivot Pm(i 0)
Pm(i minus 1 0) + gap penalty where forallk i we find thegap penalty as follows
for i 1 Pm(1 0) P(0 0) + gap penalty
⟹ 0 +(minus2) minus2
for i 2 Pm(2 0) Pm(1 0) + gap penalty
⟹ minus 2 +(minus2) minus4
for i 3 Pm(3 0) Pm(2 0) + gap penalty
⟹ minus 4 +(minus2) minus6
for i 4 Pm(4 0) Pm(3 0) + gap penalty
⟹ minus 6 +(minus2) minus8
for i 5 Pm(5 0) Pm(4 0) + gap penalty
⟹ minus 8 +(minus2) minus10
(5)
and for the initial column pivot Pm(0 j) Pm(0 j minus 1) +gap penalty forallk j the gap penalty is obtained as follows
International Journal of Mathematics and Mathematical Sciences 3
for j 1 Pm(0 1) P(0 0) + gap penalty
⟹ 0 +(minus2) minus2
for j 2 Pm(0 2) Pm(0 1) + gap penalty
⟹ minus 2 +(minus2) minus4
for j 3 Pm(0 3) Pm(0 2) + gap penalty
⟹ minus 4 +(minus2) minus6
for j 4 Pm(0 4) Pm(0 3) + gap penalty
⟹ minus 6 +(minus2) minus8
for j 5 Pm(0 5) Pm(0 4) + gap penalty
⟹ minus 8 +(minus2) minus10
(6)
Before following with further calculation some iden-tification is placed on each cell box for easy noting asshown in Table 1
(2) Cell box calculations
Pm(1 1) max
PD(1 1) P(0 0) + σ(C C) 0 + 5 5
PR(1 1) Pm(0 1) + gap minus2 +(minus2) minus4
PL(1 1) Pm(1 0) + gap minus2 +(minus2) minus4
⎧⎪⎪⎨
⎪⎪⎩
(7)
Remark 2 σ(C C) in PD(1 1) was ldquo5rdquo because that is theassigned score for matched characters Pm(1 1) maxPD(1 1) PR(1 1) PL(1 1)1113864 1113865 max 5 minus4 minus41113858 1113859 5
Pm(1 2) max
PD(1 2) Pm(0 1) + σ(C T) minus2 +(minus1) minus3
PR(1 2) Pm(0 2) + gap minus4 +(minus2) minus6
PL(1 2) Pm(1 1) + gap 5 +(minus2) 3
⎧⎪⎪⎨
⎪⎪⎩
(8)
Remark 3 σ(C T) in PD(1 2) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(12)
max PD1113864 (12)PR(12)PL(12) max minus3 minus6 31113858 1113859 3
Pm(1 3) max
PD(1 3) Pm(0 2) + σ(C A) minus4 +(minus1) minus5
PR(1 3) Pm(0 3) + gap minus6 +(minus2) minus8
PL(1 3) Pm(1 2) + gap 3 +(minus2) 1
⎧⎪⎪⎨
⎪⎪⎩
(9)
Remark 4 σ(C A) in PD(1 3) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(1 3)
max PD1113864 (1 3) PR(1 3) PL(1 3) max minus5 minus8 11113858 1113859 1Consequently the recursion follows similarly until all
the cell boxes are filled In the event of a pivot tie in a cell boxone tie value is picked For the tabular display these no-tations are used interchangeably PD(i j) is the same as DVPL(i j) is the same as LV PR(i j) is the same as RV andPm(i j) is the same as the pivot
412 Pattern Results for Equal Character Length Withreference to Tables 1 and 2 and Figure 1
(1) )e filled-in cell boxes for column 1 ie cell boxes(i vi xi xvi and xxi) have values coinciding withthat of row 1 ie cell boxes (i ii iii iv and v)
(2) )e leading value (pivot) of each cell box in both row1 and column 1 remains the same
(3) )e bottom value in column 1 switches to becomethe beside value in row 1 and vice versa
(4) For each 3 pointed arrow intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box
(5) For each cell box the value of the preceding bottomvalue is less than the value of the immediate nextadjacent diagonal value
413 Traceback and Alignment for Equal Character Length)e traceback and alignment stages for equal characterlength of sequences are respectively shown in this section
Figure 2 shows the pivot of each cell box calculation ofthe score matrix table of CTTGA and CTAGA in Figure 1)us Figure 2 is a simpler score matrix table constructedfrom the original score matrix table of CTTGA and CTAGAin Figure 1 )e arrow pointers direct the path from theoptimal value and traceback to the initialization value ofzero Based on the traceback and the diagonal direction ofthe arrow pointers the alignment is written as
CTTGA
CTAGA(10)
To confirm the correctness of the alignment done wecheck using calculations Recall the scoring schemematch+5 mismatch minus1 and gap minus2 We have fouralignment matches C minus C T minus T G minus G and A minus A and onemismatched alignment T minus A Hence
5 + 5 minus 1 + 5 + 5 19 (11)
which is the same as the optimal value from the score matrixtable )e alignment is thus optimal
414 Unequal Character Length (μ1 ne μ2)
Example 2 Suppose langαrang AGCTG and langβrang TCAG then tofill in the score matrix values of the cell boxes the prior-stated recursive formulation is used )is results in Figures 3and 4
Table 1 Identifying cell boxes of CTTGA and CTAGA
C T T G A0 minus2 minus4 minus6 minus8 minus10
C minus2 i ii iii iv vT minus4 vi vii viii ix xA minus6 xi xii xiii xiv xvG minus8 xvi xvii xviii xix xxA minus10 xxi xxii xxiii xxiv xxv
4 International Journal of Mathematics and Mathematical Sciences
415 Pattern Results for Unequal Character Length
(1) )e filled-in cell boxes for column 1 ie(i vi xi and xvi) cell boxes have consistent valueswith that of row 1 ie (i ii iii and iv) up untilwhere they terminate at (v) and at the inconsistent(iii)
(2) Except for where an inconsistency is noted at (v) and(iii) the pivot values for each cell box of row 1 andcolumn 1 remain the same
(3) For each 3 pointed arrows intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box
GTC T A
C
T
A
A
G
0
9
6
ndash2 ndash4
3 1
3
ndash3
ndash4
ndash6
ndash6 ndash8 ndash10
ndash15
ndash1
8 11
ndash8
ndash10
ndash2
10 4
1 7
8
1476 12
ndash3 1254 19
Figure 2 Traceback in the score matrix of CTTGA and CTAGA
Table 2 Filled-in table of CTTGA and CTAGAi DV 5 LV minus4 RV minus4 Pivot 5ii DV minus3 LV 3 RV minus6 Pivot 3iii DV minus5 LV 1 RV minus8 Pivot 1iv DV minus7 LV minus1 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV 3 Pivot 3vii DV 10 LV 1 RV 1 Pivot 10viii DV 8 LV 8 RV minus1 Pivot 8ix DV 0 LV 6 RV minus3 Pivot 6x DV minus2 LV 4 RV minus5 Pivot 4xi DV minus5 LV minus8 RV 1 Pivot 1xii DV 2 LV minus1 RV 8 Pivot 8xiii DV 9 LV 6 RV 6 Pivot 9xiv DV 7 LV 7 RV 4 Pivot 7xv DV 11 LV 5 RV 2 Pivot 11xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 0 LV minus3 RV 6 Pivot 6xviii DV 7 LV 4 RV 7 Pivot 7xix DV 14 LV 5 RV 5 Pivot 14xx DV 6 LV 12 RV 9 Pivot 12xxi DV minus9 LV minus12 RV minus3 Pivot minus3xxii DV minus2 LV minus5 RV 4 Pivot 4xxiii DV 5 LV 2 RV 5 Pivot 5xxiv DV 6 LV 3 RV 12 Pivot 12xxv DV 19 LV 10 RV 10 Pivot 19
GTC T A
C
T
A
A
G
0 ndash2 ndash4
ndash4
ndash6
ndash6 ndash8 ndash10
ndash8
ndash10
ndash25 ndash4 ndash3 ndash6 ndash5 ndash8 ndash7 ndash10 ndash12ndash9
ndash4 ndash3ndash113
3ndash3 10 1 8 ndash1 0 ndash3 ndash2 ndash54
4
6
6
8
98
1
1 2
ndash6
ndash5
ndash8 ndash1ndash1ndash7
ndash10 ndash3ndash3
66 6
7
77
7
5
5
11
912
12 1954
ndash9
ndash12 ndash5ndash2 4 5 5
2 3
6
1010
140
2
Figure 1 Score matrix of the cell boxes of CTTGA and CTAGA
T
C
A
G
A G C T G
0
ndash2
ndash2
ndash4ndash4
ndash4 ndash4ndash4
ndash4
ndash4
ndash4
ndash6
ndash6
ndash6
ndash6
ndash8
ndash8
ndash10
ndash8
ndash8
ndash10
ndash10 ndash9ndash1 ndash12ndash1 ndash3
ndash3ndash3
ndash5
ndash5ndash5
ndash5
ndash5
ndash2 ndash22
2
ndash2
ndash2
ndash2
ndash2ndash2
ndash5
ndash7
ndash7
ndash7 ndash1
ndash1 ndash1
ndash1 ndash1
ndash1
ndash6
6 6
ndash3
ndash3
ndash3
ndash3
ndash3ndash3 ndash3
ndash3
0
0
04
11
Figure 3 Score matrix of AGCTG and TCAG
T
T
C
C
A
A
G
G G
0 ndash10
ndash8
ndash8
ndash6
ndash6
ndash4
ndash4
ndash2
ndash2
i ii iii iv v
vi vii vii ix x
x xii xiii xiv xv
xvi xvii xviii xix xx
Figure 4 Labelling of the cell boxes of AGCTG and TCAG
International Journal of Mathematics and Mathematical Sciences 5
(4) For each cell box the value of the preceding bottomvalue is seen to be always less than the value of theimmediate next adjacent diagonal value
416 Traceback and Alignment for Unequal CharacterLength )e traceback and alignment stages for the exampleon unequal character length of sequences are respectivelyshown below Figure 5 shows the pivot of each cell boxcalculation of the score matrix table of AGCTG and TCAGof Table 3 Based on the traceback and the direction of thearrow pointers the alignment is written as
AGCTG
T minus CAG(12)
We check the correctness of the alignment done usingcalculations Recall the scoring scheme match+5 mis-match minus1 and gap minus2 We have two alignment matchesG minus G and C minus C two alignment mismatches A T and T Aand one gapped alignment G minus Hence
minus1 minus 2 + 5 minus 1 + 5 6 (13)
which is the same as the optimal value from the score matrixtable )e alignment is thus optimal
42 Propositions and Proofs for Equal Sequence Length)e following propositions are a priori results deduced fromthe pattern results of equal character length of sequences
Definition 3 Let the linear gap penalty be v(k) gk whereg is a constant of minus2 and k is the sequence length count ofk 1 μ1 for the score matrix row and k 1 μ2 forthe score matrix column
Definition 4 Define P(0 0) 0 to be the initialization pivotfor the score matrix
Definition 5 Define Pm(i 0) Pm(i minus 1 0) + gap penaltyforalli 1 μ1 to be the initial row pivot for the score matrix
Definition 6 Define Pm(0 j) Pm(0 j minus 1) + gap penaltyforallj 1 μ2 to be the initial column pivot for the scorematrix
Definition 7 Define
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
(14)
foralli 1 μ1 and forallj 1 μ2 where Pm(i j) is the pivotfor each cell box calculation PD(i j) Pm(i minus 1 j minus 1) +
σ(α(i) β(j)) is the diagonal value of a cell box PR(i j)
Pm(i minus 1 j)+ gap penalty is the right value of a cell boxPL(i j) Pm(i j minus 1)+ gap penalty is the left value of a cellbox and σ(α(i) β(j)) is the score for aligning the sequencecharacters of α(i) and β(j)
Definition 8 Define
σ(α(i) β(j)) +5 for α(i) β(j) charactermatch
minus1 for α(i) β(j) charactermismatch1113896
(15)foralli 1 μ1 and forallj 1 μ2
Proposition 2 Filled-in cell boxes for Pm(1 j)forallj
1 μ2 are analogous to filled-in cell boxes for Pm(i 1)foralli
1 μ1 for equal length comparison of sequences
Proof By Definition 7 we have
Pm(1 j) max
PD(1 j) Pm(0 j minus 1) + σ(α(1) β(j))
PR(1 j) Pm(0 j) + gap
PL(1 j) Pm(1 j minus 1) + gap
⎧⎪⎪⎨
⎪⎪⎩
Pm(i 1) max
PD(i 1) Pm(i minus 1 0) + σ(α(i) β(1))
PR(i 1) Pm(i minus 1 1) + gap
PL(i 1) Pm(i 0) + gap
⎧⎪⎪⎨
⎪⎪⎩
(16)
TCA G G
T
C
A
G
0
0
0
ndash2 ndash4
ndash3 ndash5
ndash3
ndash3
ndash4
ndash6
ndash6 ndash8 ndash10
ndash1ndash1
ndash1
ndash1 ndash1
ndash8
ndash2
ndash2 ndash2
1 1
2
246 6
Figure 5 Traceback in the score matrix of AGCTG and TCAG
Table 3 Filled-in table of AGCTG and TCAGi DV minus1 LV minus4 RV minus4 Pivot minus1ii DV minus3 LV minus3 RV minus6 Pivot minus3iii DV minus5 LV minus5 RV minus8 Pivot minus5iv DV minus1 LV minus7 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV minus3 Pivot minus3vii DV minus2 LV minus5 RV minus5 Pivot minus2viii DV 2 LV minus4 RV minus7 Pivot 2ix DV minus6 LV 0 RV minus3 Pivot 0x DV minus2 LV minus2 RV minus5 Pivot minus2xi DV 1 LV minus8 RV minus5 Pivot 1xii DV minus4 LV minus1 RV minus4 Pivot minus1xiii DV minus3 LV minus3 RV 0 Pivot 0xiv DV 1 LV minus2 RV minus2 Pivot minus2xv DV minus1 LV minus1 RV minus4 Pivot minus1xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 6 LV minus3 RV minus3 Pivot 6xviii DV minus2 LV 4 RV minus2 Pivot 4xix DV minus1 LV 2 RV minus1 Pivot 2xx DV 6 LV 0 RV minus3 Pivot 6
6 International Journal of Mathematics and Mathematical Sciences
foralli j 1 μ since μ1 μ2 μ Again by Definitions 5and 6
Pm(i 0) Pm(i minus 1 0) + gap
⟹PL(i 1) Pm(i 0) + gap
Pm(i minus 1 0) + gap( 1113857 + gap
(17)
also
Pm(0 j) Pm(0 j minus 1) + gap
⟹PR(1 j) Pm(0 j) + gap
Pm(0 j minus 1) + gap( 1113857 + gap
(18)
Using Definitions 5 and 6 and applying inductionPm(1 0) P(0 0) + gap and Pm(0 1) P(0 0) + gap thusPm(1 0) Pm(0 1) is true Assume Pm(k 0) Pm(0 k) isalways true then
Pm(k + 1 0) Pm(k 0) + gap
Pm(0 k) + gap
Pm(0 k + 1)
there4Pm(i 0) Pm(0 j)
(19)
Recall that the gap is constant so we writePL(i 1) Pm(i 0) + gap PR(1 j) Pm(0 j) + gap thusPL(i 1) PR(1 j) which completes the first part of ourproof
Again from PL(1 j) Pm(1 j minus 1) + gap PR(i 1)
Pm(i minus 1 1) + gap and by induction let i 1 j then
PL(1 j) Pm(1 0) + gap
P(0 0) + gap + gap
P(0 0) + 2 gaps
PR(i 1) Pm(0 1) + gap
P(0 0) + gap + gap
P(0 0) + 2gaps
(20)
Assume Pm(k 0) Pm(0 k) then
Pm(k + 1 0) Pm(k 0) + gap
Pm(0 k) + gap
Pm(0 k + 1)
there4PR(i 1) PL(1 j) where i j
(21)
which completes the second part of our proofRecall
PL(i 1) Pm(i minus 1 0) + 2 gaps
PR(1 j) Pm(0 j minus 1) + 2 gaps
PL(i 1) PR(1 j)was established
(22)
It follows that Pm(i minus 1 0) Pm(0 j minus 1) is obvious)us PD(1 j) Pm(0 j minus1) +σ(α(1)β(j)) and PD(i1)
Pm(i minus10) +σ(α(i)β(1)) We are left to show thatσ(α(1)β(j)) σ(α(i)β(1))
For i j 1 σ(α(1) β(j)) σ(α(i) β(1)) since α(1)
and β(1) match For i jne 1 α(1) and β(j) mismatch andα(i) and β(1) also mismatch )us
σ(α(1) β(j)) σ(α(i) β(1)) foralli j isin 1 μ (23)
Hence PD(1 j) PD(i 1) since Pm(i minus 1 0)
Pm(0 j minus 1)
Corollary 3 0e right value of filled-in cell boxes forPm(1 j)forallj 1 μ2 becomes the left value of filled-in cellboxes for Pm(i 1)foralli 1 μ1 and vice versa
Proof From Proposition 2 and Definitions 5ndash7 it is clearthat
PL(i 1) PR(1 j)
PR(i 1) PL(1 j)
where i j
(24)
Corollary 4 0e pivot values Pm(1 j)forallj 1 μ2 andPm(i 1)foralli 1 μ1 are the same for comparison of equallength of sequences
Proof By Definition 7Pm(1 j) max PD(1 j) PR(1 j) PL(1 j)1113864 1113865 forallj 1 μ2
Pm(i 1) max PD(i 1) PR(i 1) PL(i 1)1113864 1113865 foralli 1 μ1
(25)
and by Proposition 2 we can write that Pm(1 j) Pm(i 1)Hence it is proved
Proposition 5 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box
Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) andPm(i + 1 j + 1) be any four cell boxes then we are to showthat PL(i j + 1) PR(i + 1 j) By Definition 7
International Journal of Mathematics and Mathematical Sciences 7
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j) max
PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))
PR(i + 1 j) Pm(i j) + gap penalty
PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i j + 1) max
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))
PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j + 1) max
PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))
PR(i + 1 j + 1) Pm(i j + 1) + gap penalty
PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
PR(i + 1 j) Pm(i j) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
(26)
Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write
PR(i j) Pm(i minus 1 j) + gap penalty
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)
Weare left to show that gap penalty lt σ(α(i) β(j+ 1))
Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match
For any k count forallk 1 μ the linear gap penalty gk
decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us
gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score
match diagonal score
foralli j 1 μ
(28)
Hence PR(i j)ltPD(i j + 1)foralli j 1 μ
43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences
Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box
Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1
j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof
Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof
8 International Journal of Mathematics and Mathematical Sciences
5 Conclusion
In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs
Data Availability
No data were used to support this study
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
References
[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005
[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003
[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007
[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005
[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970
[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008
[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005
[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981
[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004
[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002
[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957
[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004
[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018
[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication
company using prometheerdquo Applied Informatics vol 5 no 12018
[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019
[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010
[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003
[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012
[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017
[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009
[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986
International Journal of Mathematics and Mathematical Sciences 9
missing in the available stream of literature although it canbe likened to the edit graph illustration seen in [1] To thebest of our knowledge there has never been any theoreticalexposition of this kind for the most basic and very funda-mental concept of constant linear gap penalties whichconstitute an implicit part of affine gap penalties )us wefocus on finding patterns in global sequence alignment usingconstant linear gap penalties and attempt the display ofcompleted score matrices of the NeedlemanndashWunsch al-gorithm with distinct positional arrays that can be inspectedWe give confirmation of this by basic proofs and suggesthow predictions can be made To offer the reader the muchneeded convenience concepts that are deemed useful arerecalled in brief with corresponding references of compre-hensive works that give more in-depth information
2 NeedlemanndashWunsch Algorithm
Let the recursive formulation for the NeedlemanndashWunschalgorithm be
(1) Initialization
Let P(0 0) 0Pm(i 0) Pm(i minus 1 0) + gap penalty foralli 1 μ1andPm(0 j) Pm(0 j minus1) + gap penalty forallj 1 μ2where P(0 0) is the initialization pivot for the scorematrix Pm(i 0) is the initial row pivot for the scorematrix and Pm(0 j) is the initial column pivot forthe score matrix
(2) Cell Box Calculation
Let
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
(1)
where Pm(i j) is the pivot for each cell box calculationPD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j)) is the diagonalvalue of a cell box PR(i j) Pm(i minus 1 j) + gap penalty is theright value of a cell box PL(i j) Pm(i j minus 1) + gap penaltyis the left value of a cell box and σ(α(i) β(j)) is the score foraligning the sequence characters of α(i) and β(j)
Specifically the procedure for the NeedlemanndashWunschalgorithm follows the dynamic programming approach ofthe score matrix traceback and alignment as outlined in thesubsequent sections
21 Score Matrix )e score matrix is a tabular box con-structed to keep count of score results )e score matrix forthe NeedlemanndashWunsch algorithm begins with an initiali-zation process and ends with the calculation of cell boxes
211 Initialization A sequence matrix is created withμ1 + 1 columns and μ2 + 1 rows in order for the initialmatrix gap to be aligned where μ1 and μ2 are the lengths
of arbitrary sequences langαrang and langβrang respectively )eletters of the sequence α fill in the horizontal axis andsimilarly the characters of langβrang fill in the vertical axis ofthe sequence matrix created Before the scoring beginsfrom the upper left corner of the initialized matrix to thelower right corner of the matrix the value P(0 0) 0 isassigned to the intersection of the first row and the firstcolumn of the matrix (ie the initial gap) )e reason forthe gap penalty for an alignment is because of thepossibility of mutation which may insert or delete astring character from one of the sequences Arrows thatpoint in the direction of positional movement (diagonaland left or right) are placed in each cell box of the matrixand only terminate when all the cell boxes are completelyfilled
212 Calculation of Cell Boxes in the Score Matrix
(1) Fill the initialized gap values first on both the hor-izontal axis and the vertical axis with a definedconstant gap value score
(2) Follow with the calculation of each cell box havingthe three position-dependent arrays (leftbesiderightbottom or diagonal)
(3) Allow only a matchmismatch value for a ldquodiagonalrdquoposition and allow the ldquobottomrdquo or ldquobesiderdquo posi-tions to take linear gap values only
(4) For each computed cell box values find the maxi-mum score and let that be the pivot
(5) )e pivot of a computed cell box directly affects thenext cell boxes in the row or the column
22 Traceback A simpler score matrix table that containsonly the pivot of each cell box calculation is constructedfrom the original position-dependent arrays of the scorematrix table Arrow pointers are used to direct a path fromthe highest score or an optimal value in the matrix (whichactually occurs at the lower right end corner of the matrix)and traced back to the next biggest value of the predecessorsuntil we reach the intersection of the first row and thecolumn with the initial gap
23 Alignment
231 Alignment Generation from a Traceback To write thesequence characters that appear from the optimal alignmentpath of the traceback stage the following steps are followed
(1) When the arrow is diagonal write both characters inthe alignment
(2) When the arrow is vertical write the correspondinghorizontal character and in place of the verticalcharacter leave a gap
(3) When the arrow is horizontal write the corre-sponding vertical character and in place of thehorizontal character leave a gap
2 International Journal of Mathematics and Mathematical Sciences
)us for both the vertical and horizontal arrow posi-tions one character and a gap are written for the alignmentwhere the gap explicitly replaces a character position in thealignment An alignment can only be inferred as the best ifthe optimal value from the score matrix table corresponds tothe alignment score calculation (based on the scoringscheme defined)
232 0e Problem of Aligning Any Two Sequences )eproblem of aligning any two sequences can be simplified asdiscovering the optimal means of aligning any two arbitrarysequences say langαrang and langβrang such that the character ldquominusrdquo notedas a gap is filled into both langαrang and langβrang or either of themwhere
(1) Any single character in langαrang matches a single char-acter in langβrang or a gap
(2) )e final sum of scores from the scoring functionover the aligned pairs and the gap penalties as givenby the function of gap penalty is maximized
233 Letter Choice for Arbitrary Sequences )e letterchoice of this study shall be that of DNA considered as astring of four characters of adeninemdashA guaninemdashGcytosinemdashC and thyminemdashT
3 Scheme for Scoring
Definition 1 )e constant gap penalty v(k) g times k whereg is an assigned constant and k is the sequence length countof i 1 μ1 for the score matrix row and j 1 μ2 forthe score matrix column)is is the penalty awarded to gapsand is also known as the linear gap function
Definition 2 )e affine gap penalty is the penalty awarded togaps where a greatest consecutive sequence of k gaps is givenas v(k) q + gk where q is the penalty charged for openingthe gap and g is the penalty charged for extending it
)us more formally we define
(1) A constant linear gap as
v(k) gk (2)
(2) An affine gap as
v(k) q + gk kge 1
0 k 01113896 (3)
Altschulrsquos theory assign positive scores to an identityand conserved replacements and assign negative scores toless likely replacements
Remark 1 Needleman and Wunsch use the ldquoidentity ma-trixrdquo in scoring with 1 for a match and 0 for a mismatchNeedlemanndashWunschrsquos score was criticised for not reflecting
observations from the nature because purine-purine orpyrimidine-pyrimidine is less prone to be mutated incomparison with mutations of purine-pyrimidine Becausethere is no definite defined score for DNA alignment thechoice of scoring for this study is +5 for a match minus1 for amismatch and minus2 for a gap as proposed in [18] following theabove theory
More formally let
σ(α(i) β(j)) +5 for α(i) β(j) charactermatch
minus1 for α(i) β(j) charactermismatch1113896
(4)
foralli 1 μ1 andforallj 1 μ2Also let the linear gap penalty v(k) gk where g is a
constant of ldquominus2rdquo and k is the sequence length count of i
1 μ1 for the score matrix row and j 1 μ2 for thescore matrix column
4 Results and Discussion
41 Pattern Investigations in the Score Matrix of Needle-manndashWunschAlgorithm )e sequences α and β that will beused are arbitrary sample sequences that were chosen basedon consideration of the following
(1) μ1 μ2 (equal character length of sequences)(2) μ1 ne μ2 (unequal character length of sequences)
411 Equal Character Length (μ1 μ2 μ)
Example 1 Let langαrang CTTGA and langβrang CTAGA Becausethe length of the sequence characters of langαrang and langβrang isμ1 μ2 5 respectively we align α and β in a score matrixfollowing the above formulation
(1) Initialization P(0 0) 0 for score matrix initiali-zation and then for the initial row pivot Pm(i 0)
Pm(i minus 1 0) + gap penalty where forallk i we find thegap penalty as follows
for i 1 Pm(1 0) P(0 0) + gap penalty
⟹ 0 +(minus2) minus2
for i 2 Pm(2 0) Pm(1 0) + gap penalty
⟹ minus 2 +(minus2) minus4
for i 3 Pm(3 0) Pm(2 0) + gap penalty
⟹ minus 4 +(minus2) minus6
for i 4 Pm(4 0) Pm(3 0) + gap penalty
⟹ minus 6 +(minus2) minus8
for i 5 Pm(5 0) Pm(4 0) + gap penalty
⟹ minus 8 +(minus2) minus10
(5)
and for the initial column pivot Pm(0 j) Pm(0 j minus 1) +gap penalty forallk j the gap penalty is obtained as follows
International Journal of Mathematics and Mathematical Sciences 3
for j 1 Pm(0 1) P(0 0) + gap penalty
⟹ 0 +(minus2) minus2
for j 2 Pm(0 2) Pm(0 1) + gap penalty
⟹ minus 2 +(minus2) minus4
for j 3 Pm(0 3) Pm(0 2) + gap penalty
⟹ minus 4 +(minus2) minus6
for j 4 Pm(0 4) Pm(0 3) + gap penalty
⟹ minus 6 +(minus2) minus8
for j 5 Pm(0 5) Pm(0 4) + gap penalty
⟹ minus 8 +(minus2) minus10
(6)
Before following with further calculation some iden-tification is placed on each cell box for easy noting asshown in Table 1
(2) Cell box calculations
Pm(1 1) max
PD(1 1) P(0 0) + σ(C C) 0 + 5 5
PR(1 1) Pm(0 1) + gap minus2 +(minus2) minus4
PL(1 1) Pm(1 0) + gap minus2 +(minus2) minus4
⎧⎪⎪⎨
⎪⎪⎩
(7)
Remark 2 σ(C C) in PD(1 1) was ldquo5rdquo because that is theassigned score for matched characters Pm(1 1) maxPD(1 1) PR(1 1) PL(1 1)1113864 1113865 max 5 minus4 minus41113858 1113859 5
Pm(1 2) max
PD(1 2) Pm(0 1) + σ(C T) minus2 +(minus1) minus3
PR(1 2) Pm(0 2) + gap minus4 +(minus2) minus6
PL(1 2) Pm(1 1) + gap 5 +(minus2) 3
⎧⎪⎪⎨
⎪⎪⎩
(8)
Remark 3 σ(C T) in PD(1 2) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(12)
max PD1113864 (12)PR(12)PL(12) max minus3 minus6 31113858 1113859 3
Pm(1 3) max
PD(1 3) Pm(0 2) + σ(C A) minus4 +(minus1) minus5
PR(1 3) Pm(0 3) + gap minus6 +(minus2) minus8
PL(1 3) Pm(1 2) + gap 3 +(minus2) 1
⎧⎪⎪⎨
⎪⎪⎩
(9)
Remark 4 σ(C A) in PD(1 3) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(1 3)
max PD1113864 (1 3) PR(1 3) PL(1 3) max minus5 minus8 11113858 1113859 1Consequently the recursion follows similarly until all
the cell boxes are filled In the event of a pivot tie in a cell boxone tie value is picked For the tabular display these no-tations are used interchangeably PD(i j) is the same as DVPL(i j) is the same as LV PR(i j) is the same as RV andPm(i j) is the same as the pivot
412 Pattern Results for Equal Character Length Withreference to Tables 1 and 2 and Figure 1
(1) )e filled-in cell boxes for column 1 ie cell boxes(i vi xi xvi and xxi) have values coinciding withthat of row 1 ie cell boxes (i ii iii iv and v)
(2) )e leading value (pivot) of each cell box in both row1 and column 1 remains the same
(3) )e bottom value in column 1 switches to becomethe beside value in row 1 and vice versa
(4) For each 3 pointed arrow intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box
(5) For each cell box the value of the preceding bottomvalue is less than the value of the immediate nextadjacent diagonal value
413 Traceback and Alignment for Equal Character Length)e traceback and alignment stages for equal characterlength of sequences are respectively shown in this section
Figure 2 shows the pivot of each cell box calculation ofthe score matrix table of CTTGA and CTAGA in Figure 1)us Figure 2 is a simpler score matrix table constructedfrom the original score matrix table of CTTGA and CTAGAin Figure 1 )e arrow pointers direct the path from theoptimal value and traceback to the initialization value ofzero Based on the traceback and the diagonal direction ofthe arrow pointers the alignment is written as
CTTGA
CTAGA(10)
To confirm the correctness of the alignment done wecheck using calculations Recall the scoring schemematch+5 mismatch minus1 and gap minus2 We have fouralignment matches C minus C T minus T G minus G and A minus A and onemismatched alignment T minus A Hence
5 + 5 minus 1 + 5 + 5 19 (11)
which is the same as the optimal value from the score matrixtable )e alignment is thus optimal
414 Unequal Character Length (μ1 ne μ2)
Example 2 Suppose langαrang AGCTG and langβrang TCAG then tofill in the score matrix values of the cell boxes the prior-stated recursive formulation is used )is results in Figures 3and 4
Table 1 Identifying cell boxes of CTTGA and CTAGA
C T T G A0 minus2 minus4 minus6 minus8 minus10
C minus2 i ii iii iv vT minus4 vi vii viii ix xA minus6 xi xii xiii xiv xvG minus8 xvi xvii xviii xix xxA minus10 xxi xxii xxiii xxiv xxv
4 International Journal of Mathematics and Mathematical Sciences
415 Pattern Results for Unequal Character Length
(1) )e filled-in cell boxes for column 1 ie(i vi xi and xvi) cell boxes have consistent valueswith that of row 1 ie (i ii iii and iv) up untilwhere they terminate at (v) and at the inconsistent(iii)
(2) Except for where an inconsistency is noted at (v) and(iii) the pivot values for each cell box of row 1 andcolumn 1 remain the same
(3) For each 3 pointed arrows intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box
GTC T A
C
T
A
A
G
0
9
6
ndash2 ndash4
3 1
3
ndash3
ndash4
ndash6
ndash6 ndash8 ndash10
ndash15
ndash1
8 11
ndash8
ndash10
ndash2
10 4
1 7
8
1476 12
ndash3 1254 19
Figure 2 Traceback in the score matrix of CTTGA and CTAGA
Table 2 Filled-in table of CTTGA and CTAGAi DV 5 LV minus4 RV minus4 Pivot 5ii DV minus3 LV 3 RV minus6 Pivot 3iii DV minus5 LV 1 RV minus8 Pivot 1iv DV minus7 LV minus1 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV 3 Pivot 3vii DV 10 LV 1 RV 1 Pivot 10viii DV 8 LV 8 RV minus1 Pivot 8ix DV 0 LV 6 RV minus3 Pivot 6x DV minus2 LV 4 RV minus5 Pivot 4xi DV minus5 LV minus8 RV 1 Pivot 1xii DV 2 LV minus1 RV 8 Pivot 8xiii DV 9 LV 6 RV 6 Pivot 9xiv DV 7 LV 7 RV 4 Pivot 7xv DV 11 LV 5 RV 2 Pivot 11xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 0 LV minus3 RV 6 Pivot 6xviii DV 7 LV 4 RV 7 Pivot 7xix DV 14 LV 5 RV 5 Pivot 14xx DV 6 LV 12 RV 9 Pivot 12xxi DV minus9 LV minus12 RV minus3 Pivot minus3xxii DV minus2 LV minus5 RV 4 Pivot 4xxiii DV 5 LV 2 RV 5 Pivot 5xxiv DV 6 LV 3 RV 12 Pivot 12xxv DV 19 LV 10 RV 10 Pivot 19
GTC T A
C
T
A
A
G
0 ndash2 ndash4
ndash4
ndash6
ndash6 ndash8 ndash10
ndash8
ndash10
ndash25 ndash4 ndash3 ndash6 ndash5 ndash8 ndash7 ndash10 ndash12ndash9
ndash4 ndash3ndash113
3ndash3 10 1 8 ndash1 0 ndash3 ndash2 ndash54
4
6
6
8
98
1
1 2
ndash6
ndash5
ndash8 ndash1ndash1ndash7
ndash10 ndash3ndash3
66 6
7
77
7
5
5
11
912
12 1954
ndash9
ndash12 ndash5ndash2 4 5 5
2 3
6
1010
140
2
Figure 1 Score matrix of the cell boxes of CTTGA and CTAGA
T
C
A
G
A G C T G
0
ndash2
ndash2
ndash4ndash4
ndash4 ndash4ndash4
ndash4
ndash4
ndash4
ndash6
ndash6
ndash6
ndash6
ndash8
ndash8
ndash10
ndash8
ndash8
ndash10
ndash10 ndash9ndash1 ndash12ndash1 ndash3
ndash3ndash3
ndash5
ndash5ndash5
ndash5
ndash5
ndash2 ndash22
2
ndash2
ndash2
ndash2
ndash2ndash2
ndash5
ndash7
ndash7
ndash7 ndash1
ndash1 ndash1
ndash1 ndash1
ndash1
ndash6
6 6
ndash3
ndash3
ndash3
ndash3
ndash3ndash3 ndash3
ndash3
0
0
04
11
Figure 3 Score matrix of AGCTG and TCAG
T
T
C
C
A
A
G
G G
0 ndash10
ndash8
ndash8
ndash6
ndash6
ndash4
ndash4
ndash2
ndash2
i ii iii iv v
vi vii vii ix x
x xii xiii xiv xv
xvi xvii xviii xix xx
Figure 4 Labelling of the cell boxes of AGCTG and TCAG
International Journal of Mathematics and Mathematical Sciences 5
(4) For each cell box the value of the preceding bottomvalue is seen to be always less than the value of theimmediate next adjacent diagonal value
416 Traceback and Alignment for Unequal CharacterLength )e traceback and alignment stages for the exampleon unequal character length of sequences are respectivelyshown below Figure 5 shows the pivot of each cell boxcalculation of the score matrix table of AGCTG and TCAGof Table 3 Based on the traceback and the direction of thearrow pointers the alignment is written as
AGCTG
T minus CAG(12)
We check the correctness of the alignment done usingcalculations Recall the scoring scheme match+5 mis-match minus1 and gap minus2 We have two alignment matchesG minus G and C minus C two alignment mismatches A T and T Aand one gapped alignment G minus Hence
minus1 minus 2 + 5 minus 1 + 5 6 (13)
which is the same as the optimal value from the score matrixtable )e alignment is thus optimal
42 Propositions and Proofs for Equal Sequence Length)e following propositions are a priori results deduced fromthe pattern results of equal character length of sequences
Definition 3 Let the linear gap penalty be v(k) gk whereg is a constant of minus2 and k is the sequence length count ofk 1 μ1 for the score matrix row and k 1 μ2 forthe score matrix column
Definition 4 Define P(0 0) 0 to be the initialization pivotfor the score matrix
Definition 5 Define Pm(i 0) Pm(i minus 1 0) + gap penaltyforalli 1 μ1 to be the initial row pivot for the score matrix
Definition 6 Define Pm(0 j) Pm(0 j minus 1) + gap penaltyforallj 1 μ2 to be the initial column pivot for the scorematrix
Definition 7 Define
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
(14)
foralli 1 μ1 and forallj 1 μ2 where Pm(i j) is the pivotfor each cell box calculation PD(i j) Pm(i minus 1 j minus 1) +
σ(α(i) β(j)) is the diagonal value of a cell box PR(i j)
Pm(i minus 1 j)+ gap penalty is the right value of a cell boxPL(i j) Pm(i j minus 1)+ gap penalty is the left value of a cellbox and σ(α(i) β(j)) is the score for aligning the sequencecharacters of α(i) and β(j)
Definition 8 Define
σ(α(i) β(j)) +5 for α(i) β(j) charactermatch
minus1 for α(i) β(j) charactermismatch1113896
(15)foralli 1 μ1 and forallj 1 μ2
Proposition 2 Filled-in cell boxes for Pm(1 j)forallj
1 μ2 are analogous to filled-in cell boxes for Pm(i 1)foralli
1 μ1 for equal length comparison of sequences
Proof By Definition 7 we have
Pm(1 j) max
PD(1 j) Pm(0 j minus 1) + σ(α(1) β(j))
PR(1 j) Pm(0 j) + gap
PL(1 j) Pm(1 j minus 1) + gap
⎧⎪⎪⎨
⎪⎪⎩
Pm(i 1) max
PD(i 1) Pm(i minus 1 0) + σ(α(i) β(1))
PR(i 1) Pm(i minus 1 1) + gap
PL(i 1) Pm(i 0) + gap
⎧⎪⎪⎨
⎪⎪⎩
(16)
TCA G G
T
C
A
G
0
0
0
ndash2 ndash4
ndash3 ndash5
ndash3
ndash3
ndash4
ndash6
ndash6 ndash8 ndash10
ndash1ndash1
ndash1
ndash1 ndash1
ndash8
ndash2
ndash2 ndash2
1 1
2
246 6
Figure 5 Traceback in the score matrix of AGCTG and TCAG
Table 3 Filled-in table of AGCTG and TCAGi DV minus1 LV minus4 RV minus4 Pivot minus1ii DV minus3 LV minus3 RV minus6 Pivot minus3iii DV minus5 LV minus5 RV minus8 Pivot minus5iv DV minus1 LV minus7 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV minus3 Pivot minus3vii DV minus2 LV minus5 RV minus5 Pivot minus2viii DV 2 LV minus4 RV minus7 Pivot 2ix DV minus6 LV 0 RV minus3 Pivot 0x DV minus2 LV minus2 RV minus5 Pivot minus2xi DV 1 LV minus8 RV minus5 Pivot 1xii DV minus4 LV minus1 RV minus4 Pivot minus1xiii DV minus3 LV minus3 RV 0 Pivot 0xiv DV 1 LV minus2 RV minus2 Pivot minus2xv DV minus1 LV minus1 RV minus4 Pivot minus1xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 6 LV minus3 RV minus3 Pivot 6xviii DV minus2 LV 4 RV minus2 Pivot 4xix DV minus1 LV 2 RV minus1 Pivot 2xx DV 6 LV 0 RV minus3 Pivot 6
6 International Journal of Mathematics and Mathematical Sciences
foralli j 1 μ since μ1 μ2 μ Again by Definitions 5and 6
Pm(i 0) Pm(i minus 1 0) + gap
⟹PL(i 1) Pm(i 0) + gap
Pm(i minus 1 0) + gap( 1113857 + gap
(17)
also
Pm(0 j) Pm(0 j minus 1) + gap
⟹PR(1 j) Pm(0 j) + gap
Pm(0 j minus 1) + gap( 1113857 + gap
(18)
Using Definitions 5 and 6 and applying inductionPm(1 0) P(0 0) + gap and Pm(0 1) P(0 0) + gap thusPm(1 0) Pm(0 1) is true Assume Pm(k 0) Pm(0 k) isalways true then
Pm(k + 1 0) Pm(k 0) + gap
Pm(0 k) + gap
Pm(0 k + 1)
there4Pm(i 0) Pm(0 j)
(19)
Recall that the gap is constant so we writePL(i 1) Pm(i 0) + gap PR(1 j) Pm(0 j) + gap thusPL(i 1) PR(1 j) which completes the first part of ourproof
Again from PL(1 j) Pm(1 j minus 1) + gap PR(i 1)
Pm(i minus 1 1) + gap and by induction let i 1 j then
PL(1 j) Pm(1 0) + gap
P(0 0) + gap + gap
P(0 0) + 2 gaps
PR(i 1) Pm(0 1) + gap
P(0 0) + gap + gap
P(0 0) + 2gaps
(20)
Assume Pm(k 0) Pm(0 k) then
Pm(k + 1 0) Pm(k 0) + gap
Pm(0 k) + gap
Pm(0 k + 1)
there4PR(i 1) PL(1 j) where i j
(21)
which completes the second part of our proofRecall
PL(i 1) Pm(i minus 1 0) + 2 gaps
PR(1 j) Pm(0 j minus 1) + 2 gaps
PL(i 1) PR(1 j)was established
(22)
It follows that Pm(i minus 1 0) Pm(0 j minus 1) is obvious)us PD(1 j) Pm(0 j minus1) +σ(α(1)β(j)) and PD(i1)
Pm(i minus10) +σ(α(i)β(1)) We are left to show thatσ(α(1)β(j)) σ(α(i)β(1))
For i j 1 σ(α(1) β(j)) σ(α(i) β(1)) since α(1)
and β(1) match For i jne 1 α(1) and β(j) mismatch andα(i) and β(1) also mismatch )us
σ(α(1) β(j)) σ(α(i) β(1)) foralli j isin 1 μ (23)
Hence PD(1 j) PD(i 1) since Pm(i minus 1 0)
Pm(0 j minus 1)
Corollary 3 0e right value of filled-in cell boxes forPm(1 j)forallj 1 μ2 becomes the left value of filled-in cellboxes for Pm(i 1)foralli 1 μ1 and vice versa
Proof From Proposition 2 and Definitions 5ndash7 it is clearthat
PL(i 1) PR(1 j)
PR(i 1) PL(1 j)
where i j
(24)
Corollary 4 0e pivot values Pm(1 j)forallj 1 μ2 andPm(i 1)foralli 1 μ1 are the same for comparison of equallength of sequences
Proof By Definition 7Pm(1 j) max PD(1 j) PR(1 j) PL(1 j)1113864 1113865 forallj 1 μ2
Pm(i 1) max PD(i 1) PR(i 1) PL(i 1)1113864 1113865 foralli 1 μ1
(25)
and by Proposition 2 we can write that Pm(1 j) Pm(i 1)Hence it is proved
Proposition 5 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box
Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) andPm(i + 1 j + 1) be any four cell boxes then we are to showthat PL(i j + 1) PR(i + 1 j) By Definition 7
International Journal of Mathematics and Mathematical Sciences 7
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j) max
PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))
PR(i + 1 j) Pm(i j) + gap penalty
PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i j + 1) max
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))
PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j + 1) max
PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))
PR(i + 1 j + 1) Pm(i j + 1) + gap penalty
PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
PR(i + 1 j) Pm(i j) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
(26)
Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write
PR(i j) Pm(i minus 1 j) + gap penalty
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)
Weare left to show that gap penalty lt σ(α(i) β(j+ 1))
Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match
For any k count forallk 1 μ the linear gap penalty gk
decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us
gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score
match diagonal score
foralli j 1 μ
(28)
Hence PR(i j)ltPD(i j + 1)foralli j 1 μ
43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences
Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box
Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1
j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof
Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof
8 International Journal of Mathematics and Mathematical Sciences
5 Conclusion
In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs
Data Availability
No data were used to support this study
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
References
[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005
[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003
[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007
[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005
[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970
[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008
[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005
[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981
[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004
[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002
[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957
[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004
[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018
[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication
company using prometheerdquo Applied Informatics vol 5 no 12018
[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019
[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010
[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003
[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012
[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017
[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009
[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986
International Journal of Mathematics and Mathematical Sciences 9
)us for both the vertical and horizontal arrow posi-tions one character and a gap are written for the alignmentwhere the gap explicitly replaces a character position in thealignment An alignment can only be inferred as the best ifthe optimal value from the score matrix table corresponds tothe alignment score calculation (based on the scoringscheme defined)
232 0e Problem of Aligning Any Two Sequences )eproblem of aligning any two sequences can be simplified asdiscovering the optimal means of aligning any two arbitrarysequences say langαrang and langβrang such that the character ldquominusrdquo notedas a gap is filled into both langαrang and langβrang or either of themwhere
(1) Any single character in langαrang matches a single char-acter in langβrang or a gap
(2) )e final sum of scores from the scoring functionover the aligned pairs and the gap penalties as givenby the function of gap penalty is maximized
233 Letter Choice for Arbitrary Sequences )e letterchoice of this study shall be that of DNA considered as astring of four characters of adeninemdashA guaninemdashGcytosinemdashC and thyminemdashT
3 Scheme for Scoring
Definition 1 )e constant gap penalty v(k) g times k whereg is an assigned constant and k is the sequence length countof i 1 μ1 for the score matrix row and j 1 μ2 forthe score matrix column)is is the penalty awarded to gapsand is also known as the linear gap function
Definition 2 )e affine gap penalty is the penalty awarded togaps where a greatest consecutive sequence of k gaps is givenas v(k) q + gk where q is the penalty charged for openingthe gap and g is the penalty charged for extending it
)us more formally we define
(1) A constant linear gap as
v(k) gk (2)
(2) An affine gap as
v(k) q + gk kge 1
0 k 01113896 (3)
Altschulrsquos theory assign positive scores to an identityand conserved replacements and assign negative scores toless likely replacements
Remark 1 Needleman and Wunsch use the ldquoidentity ma-trixrdquo in scoring with 1 for a match and 0 for a mismatchNeedlemanndashWunschrsquos score was criticised for not reflecting
observations from the nature because purine-purine orpyrimidine-pyrimidine is less prone to be mutated incomparison with mutations of purine-pyrimidine Becausethere is no definite defined score for DNA alignment thechoice of scoring for this study is +5 for a match minus1 for amismatch and minus2 for a gap as proposed in [18] following theabove theory
More formally let
σ(α(i) β(j)) +5 for α(i) β(j) charactermatch
minus1 for α(i) β(j) charactermismatch1113896
(4)
foralli 1 μ1 andforallj 1 μ2Also let the linear gap penalty v(k) gk where g is a
constant of ldquominus2rdquo and k is the sequence length count of i
1 μ1 for the score matrix row and j 1 μ2 for thescore matrix column
4 Results and Discussion
41 Pattern Investigations in the Score Matrix of Needle-manndashWunschAlgorithm )e sequences α and β that will beused are arbitrary sample sequences that were chosen basedon consideration of the following
(1) μ1 μ2 (equal character length of sequences)(2) μ1 ne μ2 (unequal character length of sequences)
411 Equal Character Length (μ1 μ2 μ)
Example 1 Let langαrang CTTGA and langβrang CTAGA Becausethe length of the sequence characters of langαrang and langβrang isμ1 μ2 5 respectively we align α and β in a score matrixfollowing the above formulation
(1) Initialization P(0 0) 0 for score matrix initiali-zation and then for the initial row pivot Pm(i 0)
Pm(i minus 1 0) + gap penalty where forallk i we find thegap penalty as follows
for i 1 Pm(1 0) P(0 0) + gap penalty
⟹ 0 +(minus2) minus2
for i 2 Pm(2 0) Pm(1 0) + gap penalty
⟹ minus 2 +(minus2) minus4
for i 3 Pm(3 0) Pm(2 0) + gap penalty
⟹ minus 4 +(minus2) minus6
for i 4 Pm(4 0) Pm(3 0) + gap penalty
⟹ minus 6 +(minus2) minus8
for i 5 Pm(5 0) Pm(4 0) + gap penalty
⟹ minus 8 +(minus2) minus10
(5)
and for the initial column pivot Pm(0 j) Pm(0 j minus 1) +gap penalty forallk j the gap penalty is obtained as follows
International Journal of Mathematics and Mathematical Sciences 3
for j 1 Pm(0 1) P(0 0) + gap penalty
⟹ 0 +(minus2) minus2
for j 2 Pm(0 2) Pm(0 1) + gap penalty
⟹ minus 2 +(minus2) minus4
for j 3 Pm(0 3) Pm(0 2) + gap penalty
⟹ minus 4 +(minus2) minus6
for j 4 Pm(0 4) Pm(0 3) + gap penalty
⟹ minus 6 +(minus2) minus8
for j 5 Pm(0 5) Pm(0 4) + gap penalty
⟹ minus 8 +(minus2) minus10
(6)
Before following with further calculation some iden-tification is placed on each cell box for easy noting asshown in Table 1
(2) Cell box calculations
Pm(1 1) max
PD(1 1) P(0 0) + σ(C C) 0 + 5 5
PR(1 1) Pm(0 1) + gap minus2 +(minus2) minus4
PL(1 1) Pm(1 0) + gap minus2 +(minus2) minus4
⎧⎪⎪⎨
⎪⎪⎩
(7)
Remark 2 σ(C C) in PD(1 1) was ldquo5rdquo because that is theassigned score for matched characters Pm(1 1) maxPD(1 1) PR(1 1) PL(1 1)1113864 1113865 max 5 minus4 minus41113858 1113859 5
Pm(1 2) max
PD(1 2) Pm(0 1) + σ(C T) minus2 +(minus1) minus3
PR(1 2) Pm(0 2) + gap minus4 +(minus2) minus6
PL(1 2) Pm(1 1) + gap 5 +(minus2) 3
⎧⎪⎪⎨
⎪⎪⎩
(8)
Remark 3 σ(C T) in PD(1 2) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(12)
max PD1113864 (12)PR(12)PL(12) max minus3 minus6 31113858 1113859 3
Pm(1 3) max
PD(1 3) Pm(0 2) + σ(C A) minus4 +(minus1) minus5
PR(1 3) Pm(0 3) + gap minus6 +(minus2) minus8
PL(1 3) Pm(1 2) + gap 3 +(minus2) 1
⎧⎪⎪⎨
⎪⎪⎩
(9)
Remark 4 σ(C A) in PD(1 3) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(1 3)
max PD1113864 (1 3) PR(1 3) PL(1 3) max minus5 minus8 11113858 1113859 1Consequently the recursion follows similarly until all
the cell boxes are filled In the event of a pivot tie in a cell boxone tie value is picked For the tabular display these no-tations are used interchangeably PD(i j) is the same as DVPL(i j) is the same as LV PR(i j) is the same as RV andPm(i j) is the same as the pivot
412 Pattern Results for Equal Character Length Withreference to Tables 1 and 2 and Figure 1
(1) )e filled-in cell boxes for column 1 ie cell boxes(i vi xi xvi and xxi) have values coinciding withthat of row 1 ie cell boxes (i ii iii iv and v)
(2) )e leading value (pivot) of each cell box in both row1 and column 1 remains the same
(3) )e bottom value in column 1 switches to becomethe beside value in row 1 and vice versa
(4) For each 3 pointed arrow intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box
(5) For each cell box the value of the preceding bottomvalue is less than the value of the immediate nextadjacent diagonal value
413 Traceback and Alignment for Equal Character Length)e traceback and alignment stages for equal characterlength of sequences are respectively shown in this section
Figure 2 shows the pivot of each cell box calculation ofthe score matrix table of CTTGA and CTAGA in Figure 1)us Figure 2 is a simpler score matrix table constructedfrom the original score matrix table of CTTGA and CTAGAin Figure 1 )e arrow pointers direct the path from theoptimal value and traceback to the initialization value ofzero Based on the traceback and the diagonal direction ofthe arrow pointers the alignment is written as
CTTGA
CTAGA(10)
To confirm the correctness of the alignment done wecheck using calculations Recall the scoring schemematch+5 mismatch minus1 and gap minus2 We have fouralignment matches C minus C T minus T G minus G and A minus A and onemismatched alignment T minus A Hence
5 + 5 minus 1 + 5 + 5 19 (11)
which is the same as the optimal value from the score matrixtable )e alignment is thus optimal
414 Unequal Character Length (μ1 ne μ2)
Example 2 Suppose langαrang AGCTG and langβrang TCAG then tofill in the score matrix values of the cell boxes the prior-stated recursive formulation is used )is results in Figures 3and 4
Table 1 Identifying cell boxes of CTTGA and CTAGA
C T T G A0 minus2 minus4 minus6 minus8 minus10
C minus2 i ii iii iv vT minus4 vi vii viii ix xA minus6 xi xii xiii xiv xvG minus8 xvi xvii xviii xix xxA minus10 xxi xxii xxiii xxiv xxv
4 International Journal of Mathematics and Mathematical Sciences
415 Pattern Results for Unequal Character Length
(1) )e filled-in cell boxes for column 1 ie(i vi xi and xvi) cell boxes have consistent valueswith that of row 1 ie (i ii iii and iv) up untilwhere they terminate at (v) and at the inconsistent(iii)
(2) Except for where an inconsistency is noted at (v) and(iii) the pivot values for each cell box of row 1 andcolumn 1 remain the same
(3) For each 3 pointed arrows intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box
GTC T A
C
T
A
A
G
0
9
6
ndash2 ndash4
3 1
3
ndash3
ndash4
ndash6
ndash6 ndash8 ndash10
ndash15
ndash1
8 11
ndash8
ndash10
ndash2
10 4
1 7
8
1476 12
ndash3 1254 19
Figure 2 Traceback in the score matrix of CTTGA and CTAGA
Table 2 Filled-in table of CTTGA and CTAGAi DV 5 LV minus4 RV minus4 Pivot 5ii DV minus3 LV 3 RV minus6 Pivot 3iii DV minus5 LV 1 RV minus8 Pivot 1iv DV minus7 LV minus1 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV 3 Pivot 3vii DV 10 LV 1 RV 1 Pivot 10viii DV 8 LV 8 RV minus1 Pivot 8ix DV 0 LV 6 RV minus3 Pivot 6x DV minus2 LV 4 RV minus5 Pivot 4xi DV minus5 LV minus8 RV 1 Pivot 1xii DV 2 LV minus1 RV 8 Pivot 8xiii DV 9 LV 6 RV 6 Pivot 9xiv DV 7 LV 7 RV 4 Pivot 7xv DV 11 LV 5 RV 2 Pivot 11xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 0 LV minus3 RV 6 Pivot 6xviii DV 7 LV 4 RV 7 Pivot 7xix DV 14 LV 5 RV 5 Pivot 14xx DV 6 LV 12 RV 9 Pivot 12xxi DV minus9 LV minus12 RV minus3 Pivot minus3xxii DV minus2 LV minus5 RV 4 Pivot 4xxiii DV 5 LV 2 RV 5 Pivot 5xxiv DV 6 LV 3 RV 12 Pivot 12xxv DV 19 LV 10 RV 10 Pivot 19
GTC T A
C
T
A
A
G
0 ndash2 ndash4
ndash4
ndash6
ndash6 ndash8 ndash10
ndash8
ndash10
ndash25 ndash4 ndash3 ndash6 ndash5 ndash8 ndash7 ndash10 ndash12ndash9
ndash4 ndash3ndash113
3ndash3 10 1 8 ndash1 0 ndash3 ndash2 ndash54
4
6
6
8
98
1
1 2
ndash6
ndash5
ndash8 ndash1ndash1ndash7
ndash10 ndash3ndash3
66 6
7
77
7
5
5
11
912
12 1954
ndash9
ndash12 ndash5ndash2 4 5 5
2 3
6
1010
140
2
Figure 1 Score matrix of the cell boxes of CTTGA and CTAGA
T
C
A
G
A G C T G
0
ndash2
ndash2
ndash4ndash4
ndash4 ndash4ndash4
ndash4
ndash4
ndash4
ndash6
ndash6
ndash6
ndash6
ndash8
ndash8
ndash10
ndash8
ndash8
ndash10
ndash10 ndash9ndash1 ndash12ndash1 ndash3
ndash3ndash3
ndash5
ndash5ndash5
ndash5
ndash5
ndash2 ndash22
2
ndash2
ndash2
ndash2
ndash2ndash2
ndash5
ndash7
ndash7
ndash7 ndash1
ndash1 ndash1
ndash1 ndash1
ndash1
ndash6
6 6
ndash3
ndash3
ndash3
ndash3
ndash3ndash3 ndash3
ndash3
0
0
04
11
Figure 3 Score matrix of AGCTG and TCAG
T
T
C
C
A
A
G
G G
0 ndash10
ndash8
ndash8
ndash6
ndash6
ndash4
ndash4
ndash2
ndash2
i ii iii iv v
vi vii vii ix x
x xii xiii xiv xv
xvi xvii xviii xix xx
Figure 4 Labelling of the cell boxes of AGCTG and TCAG
International Journal of Mathematics and Mathematical Sciences 5
(4) For each cell box the value of the preceding bottomvalue is seen to be always less than the value of theimmediate next adjacent diagonal value
416 Traceback and Alignment for Unequal CharacterLength )e traceback and alignment stages for the exampleon unequal character length of sequences are respectivelyshown below Figure 5 shows the pivot of each cell boxcalculation of the score matrix table of AGCTG and TCAGof Table 3 Based on the traceback and the direction of thearrow pointers the alignment is written as
AGCTG
T minus CAG(12)
We check the correctness of the alignment done usingcalculations Recall the scoring scheme match+5 mis-match minus1 and gap minus2 We have two alignment matchesG minus G and C minus C two alignment mismatches A T and T Aand one gapped alignment G minus Hence
minus1 minus 2 + 5 minus 1 + 5 6 (13)
which is the same as the optimal value from the score matrixtable )e alignment is thus optimal
42 Propositions and Proofs for Equal Sequence Length)e following propositions are a priori results deduced fromthe pattern results of equal character length of sequences
Definition 3 Let the linear gap penalty be v(k) gk whereg is a constant of minus2 and k is the sequence length count ofk 1 μ1 for the score matrix row and k 1 μ2 forthe score matrix column
Definition 4 Define P(0 0) 0 to be the initialization pivotfor the score matrix
Definition 5 Define Pm(i 0) Pm(i minus 1 0) + gap penaltyforalli 1 μ1 to be the initial row pivot for the score matrix
Definition 6 Define Pm(0 j) Pm(0 j minus 1) + gap penaltyforallj 1 μ2 to be the initial column pivot for the scorematrix
Definition 7 Define
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
(14)
foralli 1 μ1 and forallj 1 μ2 where Pm(i j) is the pivotfor each cell box calculation PD(i j) Pm(i minus 1 j minus 1) +
σ(α(i) β(j)) is the diagonal value of a cell box PR(i j)
Pm(i minus 1 j)+ gap penalty is the right value of a cell boxPL(i j) Pm(i j minus 1)+ gap penalty is the left value of a cellbox and σ(α(i) β(j)) is the score for aligning the sequencecharacters of α(i) and β(j)
Definition 8 Define
σ(α(i) β(j)) +5 for α(i) β(j) charactermatch
minus1 for α(i) β(j) charactermismatch1113896
(15)foralli 1 μ1 and forallj 1 μ2
Proposition 2 Filled-in cell boxes for Pm(1 j)forallj
1 μ2 are analogous to filled-in cell boxes for Pm(i 1)foralli
1 μ1 for equal length comparison of sequences
Proof By Definition 7 we have
Pm(1 j) max
PD(1 j) Pm(0 j minus 1) + σ(α(1) β(j))
PR(1 j) Pm(0 j) + gap
PL(1 j) Pm(1 j minus 1) + gap
⎧⎪⎪⎨
⎪⎪⎩
Pm(i 1) max
PD(i 1) Pm(i minus 1 0) + σ(α(i) β(1))
PR(i 1) Pm(i minus 1 1) + gap
PL(i 1) Pm(i 0) + gap
⎧⎪⎪⎨
⎪⎪⎩
(16)
TCA G G
T
C
A
G
0
0
0
ndash2 ndash4
ndash3 ndash5
ndash3
ndash3
ndash4
ndash6
ndash6 ndash8 ndash10
ndash1ndash1
ndash1
ndash1 ndash1
ndash8
ndash2
ndash2 ndash2
1 1
2
246 6
Figure 5 Traceback in the score matrix of AGCTG and TCAG
Table 3 Filled-in table of AGCTG and TCAGi DV minus1 LV minus4 RV minus4 Pivot minus1ii DV minus3 LV minus3 RV minus6 Pivot minus3iii DV minus5 LV minus5 RV minus8 Pivot minus5iv DV minus1 LV minus7 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV minus3 Pivot minus3vii DV minus2 LV minus5 RV minus5 Pivot minus2viii DV 2 LV minus4 RV minus7 Pivot 2ix DV minus6 LV 0 RV minus3 Pivot 0x DV minus2 LV minus2 RV minus5 Pivot minus2xi DV 1 LV minus8 RV minus5 Pivot 1xii DV minus4 LV minus1 RV minus4 Pivot minus1xiii DV minus3 LV minus3 RV 0 Pivot 0xiv DV 1 LV minus2 RV minus2 Pivot minus2xv DV minus1 LV minus1 RV minus4 Pivot minus1xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 6 LV minus3 RV minus3 Pivot 6xviii DV minus2 LV 4 RV minus2 Pivot 4xix DV minus1 LV 2 RV minus1 Pivot 2xx DV 6 LV 0 RV minus3 Pivot 6
6 International Journal of Mathematics and Mathematical Sciences
foralli j 1 μ since μ1 μ2 μ Again by Definitions 5and 6
Pm(i 0) Pm(i minus 1 0) + gap
⟹PL(i 1) Pm(i 0) + gap
Pm(i minus 1 0) + gap( 1113857 + gap
(17)
also
Pm(0 j) Pm(0 j minus 1) + gap
⟹PR(1 j) Pm(0 j) + gap
Pm(0 j minus 1) + gap( 1113857 + gap
(18)
Using Definitions 5 and 6 and applying inductionPm(1 0) P(0 0) + gap and Pm(0 1) P(0 0) + gap thusPm(1 0) Pm(0 1) is true Assume Pm(k 0) Pm(0 k) isalways true then
Pm(k + 1 0) Pm(k 0) + gap
Pm(0 k) + gap
Pm(0 k + 1)
there4Pm(i 0) Pm(0 j)
(19)
Recall that the gap is constant so we writePL(i 1) Pm(i 0) + gap PR(1 j) Pm(0 j) + gap thusPL(i 1) PR(1 j) which completes the first part of ourproof
Again from PL(1 j) Pm(1 j minus 1) + gap PR(i 1)
Pm(i minus 1 1) + gap and by induction let i 1 j then
PL(1 j) Pm(1 0) + gap
P(0 0) + gap + gap
P(0 0) + 2 gaps
PR(i 1) Pm(0 1) + gap
P(0 0) + gap + gap
P(0 0) + 2gaps
(20)
Assume Pm(k 0) Pm(0 k) then
Pm(k + 1 0) Pm(k 0) + gap
Pm(0 k) + gap
Pm(0 k + 1)
there4PR(i 1) PL(1 j) where i j
(21)
which completes the second part of our proofRecall
PL(i 1) Pm(i minus 1 0) + 2 gaps
PR(1 j) Pm(0 j minus 1) + 2 gaps
PL(i 1) PR(1 j)was established
(22)
It follows that Pm(i minus 1 0) Pm(0 j minus 1) is obvious)us PD(1 j) Pm(0 j minus1) +σ(α(1)β(j)) and PD(i1)
Pm(i minus10) +σ(α(i)β(1)) We are left to show thatσ(α(1)β(j)) σ(α(i)β(1))
For i j 1 σ(α(1) β(j)) σ(α(i) β(1)) since α(1)
and β(1) match For i jne 1 α(1) and β(j) mismatch andα(i) and β(1) also mismatch )us
σ(α(1) β(j)) σ(α(i) β(1)) foralli j isin 1 μ (23)
Hence PD(1 j) PD(i 1) since Pm(i minus 1 0)
Pm(0 j minus 1)
Corollary 3 0e right value of filled-in cell boxes forPm(1 j)forallj 1 μ2 becomes the left value of filled-in cellboxes for Pm(i 1)foralli 1 μ1 and vice versa
Proof From Proposition 2 and Definitions 5ndash7 it is clearthat
PL(i 1) PR(1 j)
PR(i 1) PL(1 j)
where i j
(24)
Corollary 4 0e pivot values Pm(1 j)forallj 1 μ2 andPm(i 1)foralli 1 μ1 are the same for comparison of equallength of sequences
Proof By Definition 7Pm(1 j) max PD(1 j) PR(1 j) PL(1 j)1113864 1113865 forallj 1 μ2
Pm(i 1) max PD(i 1) PR(i 1) PL(i 1)1113864 1113865 foralli 1 μ1
(25)
and by Proposition 2 we can write that Pm(1 j) Pm(i 1)Hence it is proved
Proposition 5 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box
Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) andPm(i + 1 j + 1) be any four cell boxes then we are to showthat PL(i j + 1) PR(i + 1 j) By Definition 7
International Journal of Mathematics and Mathematical Sciences 7
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j) max
PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))
PR(i + 1 j) Pm(i j) + gap penalty
PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i j + 1) max
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))
PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j + 1) max
PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))
PR(i + 1 j + 1) Pm(i j + 1) + gap penalty
PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
PR(i + 1 j) Pm(i j) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
(26)
Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write
PR(i j) Pm(i minus 1 j) + gap penalty
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)
Weare left to show that gap penalty lt σ(α(i) β(j+ 1))
Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match
For any k count forallk 1 μ the linear gap penalty gk
decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us
gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score
match diagonal score
foralli j 1 μ
(28)
Hence PR(i j)ltPD(i j + 1)foralli j 1 μ
43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences
Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box
Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1
j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof
Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof
8 International Journal of Mathematics and Mathematical Sciences
5 Conclusion
In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs
Data Availability
No data were used to support this study
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
References
[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005
[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003
[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007
[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005
[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970
[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008
[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005
[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981
[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004
[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002
[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957
[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004
[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018
[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication
company using prometheerdquo Applied Informatics vol 5 no 12018
[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019
[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010
[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003
[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012
[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017
[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009
[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986
International Journal of Mathematics and Mathematical Sciences 9
for j 1 Pm(0 1) P(0 0) + gap penalty
⟹ 0 +(minus2) minus2
for j 2 Pm(0 2) Pm(0 1) + gap penalty
⟹ minus 2 +(minus2) minus4
for j 3 Pm(0 3) Pm(0 2) + gap penalty
⟹ minus 4 +(minus2) minus6
for j 4 Pm(0 4) Pm(0 3) + gap penalty
⟹ minus 6 +(minus2) minus8
for j 5 Pm(0 5) Pm(0 4) + gap penalty
⟹ minus 8 +(minus2) minus10
(6)
Before following with further calculation some iden-tification is placed on each cell box for easy noting asshown in Table 1
(2) Cell box calculations
Pm(1 1) max
PD(1 1) P(0 0) + σ(C C) 0 + 5 5
PR(1 1) Pm(0 1) + gap minus2 +(minus2) minus4
PL(1 1) Pm(1 0) + gap minus2 +(minus2) minus4
⎧⎪⎪⎨
⎪⎪⎩
(7)
Remark 2 σ(C C) in PD(1 1) was ldquo5rdquo because that is theassigned score for matched characters Pm(1 1) maxPD(1 1) PR(1 1) PL(1 1)1113864 1113865 max 5 minus4 minus41113858 1113859 5
Pm(1 2) max
PD(1 2) Pm(0 1) + σ(C T) minus2 +(minus1) minus3
PR(1 2) Pm(0 2) + gap minus4 +(minus2) minus6
PL(1 2) Pm(1 1) + gap 5 +(minus2) 3
⎧⎪⎪⎨
⎪⎪⎩
(8)
Remark 3 σ(C T) in PD(1 2) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(12)
max PD1113864 (12)PR(12)PL(12) max minus3 minus6 31113858 1113859 3
Pm(1 3) max
PD(1 3) Pm(0 2) + σ(C A) minus4 +(minus1) minus5
PR(1 3) Pm(0 3) + gap minus6 +(minus2) minus8
PL(1 3) Pm(1 2) + gap 3 +(minus2) 1
⎧⎪⎪⎨
⎪⎪⎩
(9)
Remark 4 σ(C A) in PD(1 3) was ldquominus1rdquo because that is theassigned score for mismatched characters Pm(1 3)
max PD1113864 (1 3) PR(1 3) PL(1 3) max minus5 minus8 11113858 1113859 1Consequently the recursion follows similarly until all
the cell boxes are filled In the event of a pivot tie in a cell boxone tie value is picked For the tabular display these no-tations are used interchangeably PD(i j) is the same as DVPL(i j) is the same as LV PR(i j) is the same as RV andPm(i j) is the same as the pivot
412 Pattern Results for Equal Character Length Withreference to Tables 1 and 2 and Figure 1
(1) )e filled-in cell boxes for column 1 ie cell boxes(i vi xi xvi and xxi) have values coinciding withthat of row 1 ie cell boxes (i ii iii iv and v)
(2) )e leading value (pivot) of each cell box in both row1 and column 1 remains the same
(3) )e bottom value in column 1 switches to becomethe beside value in row 1 and vice versa
(4) For each 3 pointed arrow intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box
(5) For each cell box the value of the preceding bottomvalue is less than the value of the immediate nextadjacent diagonal value
413 Traceback and Alignment for Equal Character Length)e traceback and alignment stages for equal characterlength of sequences are respectively shown in this section
Figure 2 shows the pivot of each cell box calculation ofthe score matrix table of CTTGA and CTAGA in Figure 1)us Figure 2 is a simpler score matrix table constructedfrom the original score matrix table of CTTGA and CTAGAin Figure 1 )e arrow pointers direct the path from theoptimal value and traceback to the initialization value ofzero Based on the traceback and the diagonal direction ofthe arrow pointers the alignment is written as
CTTGA
CTAGA(10)
To confirm the correctness of the alignment done wecheck using calculations Recall the scoring schemematch+5 mismatch minus1 and gap minus2 We have fouralignment matches C minus C T minus T G minus G and A minus A and onemismatched alignment T minus A Hence
5 + 5 minus 1 + 5 + 5 19 (11)
which is the same as the optimal value from the score matrixtable )e alignment is thus optimal
414 Unequal Character Length (μ1 ne μ2)
Example 2 Suppose langαrang AGCTG and langβrang TCAG then tofill in the score matrix values of the cell boxes the prior-stated recursive formulation is used )is results in Figures 3and 4
Table 1 Identifying cell boxes of CTTGA and CTAGA
C T T G A0 minus2 minus4 minus6 minus8 minus10
C minus2 i ii iii iv vT minus4 vi vii viii ix xA minus6 xi xii xiii xiv xvG minus8 xvi xvii xviii xix xxA minus10 xxi xxii xxiii xxiv xxv
4 International Journal of Mathematics and Mathematical Sciences
415 Pattern Results for Unequal Character Length
(1) )e filled-in cell boxes for column 1 ie(i vi xi and xvi) cell boxes have consistent valueswith that of row 1 ie (i ii iii and iv) up untilwhere they terminate at (v) and at the inconsistent(iii)
(2) Except for where an inconsistency is noted at (v) and(iii) the pivot values for each cell box of row 1 andcolumn 1 remain the same
(3) For each 3 pointed arrows intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box
GTC T A
C
T
A
A
G
0
9
6
ndash2 ndash4
3 1
3
ndash3
ndash4
ndash6
ndash6 ndash8 ndash10
ndash15
ndash1
8 11
ndash8
ndash10
ndash2
10 4
1 7
8
1476 12
ndash3 1254 19
Figure 2 Traceback in the score matrix of CTTGA and CTAGA
Table 2 Filled-in table of CTTGA and CTAGAi DV 5 LV minus4 RV minus4 Pivot 5ii DV minus3 LV 3 RV minus6 Pivot 3iii DV minus5 LV 1 RV minus8 Pivot 1iv DV minus7 LV minus1 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV 3 Pivot 3vii DV 10 LV 1 RV 1 Pivot 10viii DV 8 LV 8 RV minus1 Pivot 8ix DV 0 LV 6 RV minus3 Pivot 6x DV minus2 LV 4 RV minus5 Pivot 4xi DV minus5 LV minus8 RV 1 Pivot 1xii DV 2 LV minus1 RV 8 Pivot 8xiii DV 9 LV 6 RV 6 Pivot 9xiv DV 7 LV 7 RV 4 Pivot 7xv DV 11 LV 5 RV 2 Pivot 11xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 0 LV minus3 RV 6 Pivot 6xviii DV 7 LV 4 RV 7 Pivot 7xix DV 14 LV 5 RV 5 Pivot 14xx DV 6 LV 12 RV 9 Pivot 12xxi DV minus9 LV minus12 RV minus3 Pivot minus3xxii DV minus2 LV minus5 RV 4 Pivot 4xxiii DV 5 LV 2 RV 5 Pivot 5xxiv DV 6 LV 3 RV 12 Pivot 12xxv DV 19 LV 10 RV 10 Pivot 19
GTC T A
C
T
A
A
G
0 ndash2 ndash4
ndash4
ndash6
ndash6 ndash8 ndash10
ndash8
ndash10
ndash25 ndash4 ndash3 ndash6 ndash5 ndash8 ndash7 ndash10 ndash12ndash9
ndash4 ndash3ndash113
3ndash3 10 1 8 ndash1 0 ndash3 ndash2 ndash54
4
6
6
8
98
1
1 2
ndash6
ndash5
ndash8 ndash1ndash1ndash7
ndash10 ndash3ndash3
66 6
7
77
7
5
5
11
912
12 1954
ndash9
ndash12 ndash5ndash2 4 5 5
2 3
6
1010
140
2
Figure 1 Score matrix of the cell boxes of CTTGA and CTAGA
T
C
A
G
A G C T G
0
ndash2
ndash2
ndash4ndash4
ndash4 ndash4ndash4
ndash4
ndash4
ndash4
ndash6
ndash6
ndash6
ndash6
ndash8
ndash8
ndash10
ndash8
ndash8
ndash10
ndash10 ndash9ndash1 ndash12ndash1 ndash3
ndash3ndash3
ndash5
ndash5ndash5
ndash5
ndash5
ndash2 ndash22
2
ndash2
ndash2
ndash2
ndash2ndash2
ndash5
ndash7
ndash7
ndash7 ndash1
ndash1 ndash1
ndash1 ndash1
ndash1
ndash6
6 6
ndash3
ndash3
ndash3
ndash3
ndash3ndash3 ndash3
ndash3
0
0
04
11
Figure 3 Score matrix of AGCTG and TCAG
T
T
C
C
A
A
G
G G
0 ndash10
ndash8
ndash8
ndash6
ndash6
ndash4
ndash4
ndash2
ndash2
i ii iii iv v
vi vii vii ix x
x xii xiii xiv xv
xvi xvii xviii xix xx
Figure 4 Labelling of the cell boxes of AGCTG and TCAG
International Journal of Mathematics and Mathematical Sciences 5
(4) For each cell box the value of the preceding bottomvalue is seen to be always less than the value of theimmediate next adjacent diagonal value
416 Traceback and Alignment for Unequal CharacterLength )e traceback and alignment stages for the exampleon unequal character length of sequences are respectivelyshown below Figure 5 shows the pivot of each cell boxcalculation of the score matrix table of AGCTG and TCAGof Table 3 Based on the traceback and the direction of thearrow pointers the alignment is written as
AGCTG
T minus CAG(12)
We check the correctness of the alignment done usingcalculations Recall the scoring scheme match+5 mis-match minus1 and gap minus2 We have two alignment matchesG minus G and C minus C two alignment mismatches A T and T Aand one gapped alignment G minus Hence
minus1 minus 2 + 5 minus 1 + 5 6 (13)
which is the same as the optimal value from the score matrixtable )e alignment is thus optimal
42 Propositions and Proofs for Equal Sequence Length)e following propositions are a priori results deduced fromthe pattern results of equal character length of sequences
Definition 3 Let the linear gap penalty be v(k) gk whereg is a constant of minus2 and k is the sequence length count ofk 1 μ1 for the score matrix row and k 1 μ2 forthe score matrix column
Definition 4 Define P(0 0) 0 to be the initialization pivotfor the score matrix
Definition 5 Define Pm(i 0) Pm(i minus 1 0) + gap penaltyforalli 1 μ1 to be the initial row pivot for the score matrix
Definition 6 Define Pm(0 j) Pm(0 j minus 1) + gap penaltyforallj 1 μ2 to be the initial column pivot for the scorematrix
Definition 7 Define
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
(14)
foralli 1 μ1 and forallj 1 μ2 where Pm(i j) is the pivotfor each cell box calculation PD(i j) Pm(i minus 1 j minus 1) +
σ(α(i) β(j)) is the diagonal value of a cell box PR(i j)
Pm(i minus 1 j)+ gap penalty is the right value of a cell boxPL(i j) Pm(i j minus 1)+ gap penalty is the left value of a cellbox and σ(α(i) β(j)) is the score for aligning the sequencecharacters of α(i) and β(j)
Definition 8 Define
σ(α(i) β(j)) +5 for α(i) β(j) charactermatch
minus1 for α(i) β(j) charactermismatch1113896
(15)foralli 1 μ1 and forallj 1 μ2
Proposition 2 Filled-in cell boxes for Pm(1 j)forallj
1 μ2 are analogous to filled-in cell boxes for Pm(i 1)foralli
1 μ1 for equal length comparison of sequences
Proof By Definition 7 we have
Pm(1 j) max
PD(1 j) Pm(0 j minus 1) + σ(α(1) β(j))
PR(1 j) Pm(0 j) + gap
PL(1 j) Pm(1 j minus 1) + gap
⎧⎪⎪⎨
⎪⎪⎩
Pm(i 1) max
PD(i 1) Pm(i minus 1 0) + σ(α(i) β(1))
PR(i 1) Pm(i minus 1 1) + gap
PL(i 1) Pm(i 0) + gap
⎧⎪⎪⎨
⎪⎪⎩
(16)
TCA G G
T
C
A
G
0
0
0
ndash2 ndash4
ndash3 ndash5
ndash3
ndash3
ndash4
ndash6
ndash6 ndash8 ndash10
ndash1ndash1
ndash1
ndash1 ndash1
ndash8
ndash2
ndash2 ndash2
1 1
2
246 6
Figure 5 Traceback in the score matrix of AGCTG and TCAG
Table 3 Filled-in table of AGCTG and TCAGi DV minus1 LV minus4 RV minus4 Pivot minus1ii DV minus3 LV minus3 RV minus6 Pivot minus3iii DV minus5 LV minus5 RV minus8 Pivot minus5iv DV minus1 LV minus7 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV minus3 Pivot minus3vii DV minus2 LV minus5 RV minus5 Pivot minus2viii DV 2 LV minus4 RV minus7 Pivot 2ix DV minus6 LV 0 RV minus3 Pivot 0x DV minus2 LV minus2 RV minus5 Pivot minus2xi DV 1 LV minus8 RV minus5 Pivot 1xii DV minus4 LV minus1 RV minus4 Pivot minus1xiii DV minus3 LV minus3 RV 0 Pivot 0xiv DV 1 LV minus2 RV minus2 Pivot minus2xv DV minus1 LV minus1 RV minus4 Pivot minus1xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 6 LV minus3 RV minus3 Pivot 6xviii DV minus2 LV 4 RV minus2 Pivot 4xix DV minus1 LV 2 RV minus1 Pivot 2xx DV 6 LV 0 RV minus3 Pivot 6
6 International Journal of Mathematics and Mathematical Sciences
foralli j 1 μ since μ1 μ2 μ Again by Definitions 5and 6
Pm(i 0) Pm(i minus 1 0) + gap
⟹PL(i 1) Pm(i 0) + gap
Pm(i minus 1 0) + gap( 1113857 + gap
(17)
also
Pm(0 j) Pm(0 j minus 1) + gap
⟹PR(1 j) Pm(0 j) + gap
Pm(0 j minus 1) + gap( 1113857 + gap
(18)
Using Definitions 5 and 6 and applying inductionPm(1 0) P(0 0) + gap and Pm(0 1) P(0 0) + gap thusPm(1 0) Pm(0 1) is true Assume Pm(k 0) Pm(0 k) isalways true then
Pm(k + 1 0) Pm(k 0) + gap
Pm(0 k) + gap
Pm(0 k + 1)
there4Pm(i 0) Pm(0 j)
(19)
Recall that the gap is constant so we writePL(i 1) Pm(i 0) + gap PR(1 j) Pm(0 j) + gap thusPL(i 1) PR(1 j) which completes the first part of ourproof
Again from PL(1 j) Pm(1 j minus 1) + gap PR(i 1)
Pm(i minus 1 1) + gap and by induction let i 1 j then
PL(1 j) Pm(1 0) + gap
P(0 0) + gap + gap
P(0 0) + 2 gaps
PR(i 1) Pm(0 1) + gap
P(0 0) + gap + gap
P(0 0) + 2gaps
(20)
Assume Pm(k 0) Pm(0 k) then
Pm(k + 1 0) Pm(k 0) + gap
Pm(0 k) + gap
Pm(0 k + 1)
there4PR(i 1) PL(1 j) where i j
(21)
which completes the second part of our proofRecall
PL(i 1) Pm(i minus 1 0) + 2 gaps
PR(1 j) Pm(0 j minus 1) + 2 gaps
PL(i 1) PR(1 j)was established
(22)
It follows that Pm(i minus 1 0) Pm(0 j minus 1) is obvious)us PD(1 j) Pm(0 j minus1) +σ(α(1)β(j)) and PD(i1)
Pm(i minus10) +σ(α(i)β(1)) We are left to show thatσ(α(1)β(j)) σ(α(i)β(1))
For i j 1 σ(α(1) β(j)) σ(α(i) β(1)) since α(1)
and β(1) match For i jne 1 α(1) and β(j) mismatch andα(i) and β(1) also mismatch )us
σ(α(1) β(j)) σ(α(i) β(1)) foralli j isin 1 μ (23)
Hence PD(1 j) PD(i 1) since Pm(i minus 1 0)
Pm(0 j minus 1)
Corollary 3 0e right value of filled-in cell boxes forPm(1 j)forallj 1 μ2 becomes the left value of filled-in cellboxes for Pm(i 1)foralli 1 μ1 and vice versa
Proof From Proposition 2 and Definitions 5ndash7 it is clearthat
PL(i 1) PR(1 j)
PR(i 1) PL(1 j)
where i j
(24)
Corollary 4 0e pivot values Pm(1 j)forallj 1 μ2 andPm(i 1)foralli 1 μ1 are the same for comparison of equallength of sequences
Proof By Definition 7Pm(1 j) max PD(1 j) PR(1 j) PL(1 j)1113864 1113865 forallj 1 μ2
Pm(i 1) max PD(i 1) PR(i 1) PL(i 1)1113864 1113865 foralli 1 μ1
(25)
and by Proposition 2 we can write that Pm(1 j) Pm(i 1)Hence it is proved
Proposition 5 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box
Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) andPm(i + 1 j + 1) be any four cell boxes then we are to showthat PL(i j + 1) PR(i + 1 j) By Definition 7
International Journal of Mathematics and Mathematical Sciences 7
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j) max
PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))
PR(i + 1 j) Pm(i j) + gap penalty
PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i j + 1) max
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))
PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j + 1) max
PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))
PR(i + 1 j + 1) Pm(i j + 1) + gap penalty
PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
PR(i + 1 j) Pm(i j) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
(26)
Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write
PR(i j) Pm(i minus 1 j) + gap penalty
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)
Weare left to show that gap penalty lt σ(α(i) β(j+ 1))
Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match
For any k count forallk 1 μ the linear gap penalty gk
decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us
gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score
match diagonal score
foralli j 1 μ
(28)
Hence PR(i j)ltPD(i j + 1)foralli j 1 μ
43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences
Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box
Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1
j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof
Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof
8 International Journal of Mathematics and Mathematical Sciences
5 Conclusion
In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs
Data Availability
No data were used to support this study
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
References
[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005
[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003
[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007
[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005
[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970
[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008
[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005
[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981
[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004
[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002
[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957
[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004
[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018
[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication
company using prometheerdquo Applied Informatics vol 5 no 12018
[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019
[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010
[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003
[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012
[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017
[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009
[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986
International Journal of Mathematics and Mathematical Sciences 9
415 Pattern Results for Unequal Character Length
(1) )e filled-in cell boxes for column 1 ie(i vi xi and xvi) cell boxes have consistent valueswith that of row 1 ie (i ii iii and iv) up untilwhere they terminate at (v) and at the inconsistent(iii)
(2) Except for where an inconsistency is noted at (v) and(iii) the pivot values for each cell box of row 1 andcolumn 1 remain the same
(3) For each 3 pointed arrows intersecting any 4 cellboxes the beside value of a 2nd cell box also coin-cides with the bottom value of a 3rd cell box
GTC T A
C
T
A
A
G
0
9
6
ndash2 ndash4
3 1
3
ndash3
ndash4
ndash6
ndash6 ndash8 ndash10
ndash15
ndash1
8 11
ndash8
ndash10
ndash2
10 4
1 7
8
1476 12
ndash3 1254 19
Figure 2 Traceback in the score matrix of CTTGA and CTAGA
Table 2 Filled-in table of CTTGA and CTAGAi DV 5 LV minus4 RV minus4 Pivot 5ii DV minus3 LV 3 RV minus6 Pivot 3iii DV minus5 LV 1 RV minus8 Pivot 1iv DV minus7 LV minus1 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV 3 Pivot 3vii DV 10 LV 1 RV 1 Pivot 10viii DV 8 LV 8 RV minus1 Pivot 8ix DV 0 LV 6 RV minus3 Pivot 6x DV minus2 LV 4 RV minus5 Pivot 4xi DV minus5 LV minus8 RV 1 Pivot 1xii DV 2 LV minus1 RV 8 Pivot 8xiii DV 9 LV 6 RV 6 Pivot 9xiv DV 7 LV 7 RV 4 Pivot 7xv DV 11 LV 5 RV 2 Pivot 11xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 0 LV minus3 RV 6 Pivot 6xviii DV 7 LV 4 RV 7 Pivot 7xix DV 14 LV 5 RV 5 Pivot 14xx DV 6 LV 12 RV 9 Pivot 12xxi DV minus9 LV minus12 RV minus3 Pivot minus3xxii DV minus2 LV minus5 RV 4 Pivot 4xxiii DV 5 LV 2 RV 5 Pivot 5xxiv DV 6 LV 3 RV 12 Pivot 12xxv DV 19 LV 10 RV 10 Pivot 19
GTC T A
C
T
A
A
G
0 ndash2 ndash4
ndash4
ndash6
ndash6 ndash8 ndash10
ndash8
ndash10
ndash25 ndash4 ndash3 ndash6 ndash5 ndash8 ndash7 ndash10 ndash12ndash9
ndash4 ndash3ndash113
3ndash3 10 1 8 ndash1 0 ndash3 ndash2 ndash54
4
6
6
8
98
1
1 2
ndash6
ndash5
ndash8 ndash1ndash1ndash7
ndash10 ndash3ndash3
66 6
7
77
7
5
5
11
912
12 1954
ndash9
ndash12 ndash5ndash2 4 5 5
2 3
6
1010
140
2
Figure 1 Score matrix of the cell boxes of CTTGA and CTAGA
T
C
A
G
A G C T G
0
ndash2
ndash2
ndash4ndash4
ndash4 ndash4ndash4
ndash4
ndash4
ndash4
ndash6
ndash6
ndash6
ndash6
ndash8
ndash8
ndash10
ndash8
ndash8
ndash10
ndash10 ndash9ndash1 ndash12ndash1 ndash3
ndash3ndash3
ndash5
ndash5ndash5
ndash5
ndash5
ndash2 ndash22
2
ndash2
ndash2
ndash2
ndash2ndash2
ndash5
ndash7
ndash7
ndash7 ndash1
ndash1 ndash1
ndash1 ndash1
ndash1
ndash6
6 6
ndash3
ndash3
ndash3
ndash3
ndash3ndash3 ndash3
ndash3
0
0
04
11
Figure 3 Score matrix of AGCTG and TCAG
T
T
C
C
A
A
G
G G
0 ndash10
ndash8
ndash8
ndash6
ndash6
ndash4
ndash4
ndash2
ndash2
i ii iii iv v
vi vii vii ix x
x xii xiii xiv xv
xvi xvii xviii xix xx
Figure 4 Labelling of the cell boxes of AGCTG and TCAG
International Journal of Mathematics and Mathematical Sciences 5
(4) For each cell box the value of the preceding bottomvalue is seen to be always less than the value of theimmediate next adjacent diagonal value
416 Traceback and Alignment for Unequal CharacterLength )e traceback and alignment stages for the exampleon unequal character length of sequences are respectivelyshown below Figure 5 shows the pivot of each cell boxcalculation of the score matrix table of AGCTG and TCAGof Table 3 Based on the traceback and the direction of thearrow pointers the alignment is written as
AGCTG
T minus CAG(12)
We check the correctness of the alignment done usingcalculations Recall the scoring scheme match+5 mis-match minus1 and gap minus2 We have two alignment matchesG minus G and C minus C two alignment mismatches A T and T Aand one gapped alignment G minus Hence
minus1 minus 2 + 5 minus 1 + 5 6 (13)
which is the same as the optimal value from the score matrixtable )e alignment is thus optimal
42 Propositions and Proofs for Equal Sequence Length)e following propositions are a priori results deduced fromthe pattern results of equal character length of sequences
Definition 3 Let the linear gap penalty be v(k) gk whereg is a constant of minus2 and k is the sequence length count ofk 1 μ1 for the score matrix row and k 1 μ2 forthe score matrix column
Definition 4 Define P(0 0) 0 to be the initialization pivotfor the score matrix
Definition 5 Define Pm(i 0) Pm(i minus 1 0) + gap penaltyforalli 1 μ1 to be the initial row pivot for the score matrix
Definition 6 Define Pm(0 j) Pm(0 j minus 1) + gap penaltyforallj 1 μ2 to be the initial column pivot for the scorematrix
Definition 7 Define
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
(14)
foralli 1 μ1 and forallj 1 μ2 where Pm(i j) is the pivotfor each cell box calculation PD(i j) Pm(i minus 1 j minus 1) +
σ(α(i) β(j)) is the diagonal value of a cell box PR(i j)
Pm(i minus 1 j)+ gap penalty is the right value of a cell boxPL(i j) Pm(i j minus 1)+ gap penalty is the left value of a cellbox and σ(α(i) β(j)) is the score for aligning the sequencecharacters of α(i) and β(j)
Definition 8 Define
σ(α(i) β(j)) +5 for α(i) β(j) charactermatch
minus1 for α(i) β(j) charactermismatch1113896
(15)foralli 1 μ1 and forallj 1 μ2
Proposition 2 Filled-in cell boxes for Pm(1 j)forallj
1 μ2 are analogous to filled-in cell boxes for Pm(i 1)foralli
1 μ1 for equal length comparison of sequences
Proof By Definition 7 we have
Pm(1 j) max
PD(1 j) Pm(0 j minus 1) + σ(α(1) β(j))
PR(1 j) Pm(0 j) + gap
PL(1 j) Pm(1 j minus 1) + gap
⎧⎪⎪⎨
⎪⎪⎩
Pm(i 1) max
PD(i 1) Pm(i minus 1 0) + σ(α(i) β(1))
PR(i 1) Pm(i minus 1 1) + gap
PL(i 1) Pm(i 0) + gap
⎧⎪⎪⎨
⎪⎪⎩
(16)
TCA G G
T
C
A
G
0
0
0
ndash2 ndash4
ndash3 ndash5
ndash3
ndash3
ndash4
ndash6
ndash6 ndash8 ndash10
ndash1ndash1
ndash1
ndash1 ndash1
ndash8
ndash2
ndash2 ndash2
1 1
2
246 6
Figure 5 Traceback in the score matrix of AGCTG and TCAG
Table 3 Filled-in table of AGCTG and TCAGi DV minus1 LV minus4 RV minus4 Pivot minus1ii DV minus3 LV minus3 RV minus6 Pivot minus3iii DV minus5 LV minus5 RV minus8 Pivot minus5iv DV minus1 LV minus7 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV minus3 Pivot minus3vii DV minus2 LV minus5 RV minus5 Pivot minus2viii DV 2 LV minus4 RV minus7 Pivot 2ix DV minus6 LV 0 RV minus3 Pivot 0x DV minus2 LV minus2 RV minus5 Pivot minus2xi DV 1 LV minus8 RV minus5 Pivot 1xii DV minus4 LV minus1 RV minus4 Pivot minus1xiii DV minus3 LV minus3 RV 0 Pivot 0xiv DV 1 LV minus2 RV minus2 Pivot minus2xv DV minus1 LV minus1 RV minus4 Pivot minus1xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 6 LV minus3 RV minus3 Pivot 6xviii DV minus2 LV 4 RV minus2 Pivot 4xix DV minus1 LV 2 RV minus1 Pivot 2xx DV 6 LV 0 RV minus3 Pivot 6
6 International Journal of Mathematics and Mathematical Sciences
foralli j 1 μ since μ1 μ2 μ Again by Definitions 5and 6
Pm(i 0) Pm(i minus 1 0) + gap
⟹PL(i 1) Pm(i 0) + gap
Pm(i minus 1 0) + gap( 1113857 + gap
(17)
also
Pm(0 j) Pm(0 j minus 1) + gap
⟹PR(1 j) Pm(0 j) + gap
Pm(0 j minus 1) + gap( 1113857 + gap
(18)
Using Definitions 5 and 6 and applying inductionPm(1 0) P(0 0) + gap and Pm(0 1) P(0 0) + gap thusPm(1 0) Pm(0 1) is true Assume Pm(k 0) Pm(0 k) isalways true then
Pm(k + 1 0) Pm(k 0) + gap
Pm(0 k) + gap
Pm(0 k + 1)
there4Pm(i 0) Pm(0 j)
(19)
Recall that the gap is constant so we writePL(i 1) Pm(i 0) + gap PR(1 j) Pm(0 j) + gap thusPL(i 1) PR(1 j) which completes the first part of ourproof
Again from PL(1 j) Pm(1 j minus 1) + gap PR(i 1)
Pm(i minus 1 1) + gap and by induction let i 1 j then
PL(1 j) Pm(1 0) + gap
P(0 0) + gap + gap
P(0 0) + 2 gaps
PR(i 1) Pm(0 1) + gap
P(0 0) + gap + gap
P(0 0) + 2gaps
(20)
Assume Pm(k 0) Pm(0 k) then
Pm(k + 1 0) Pm(k 0) + gap
Pm(0 k) + gap
Pm(0 k + 1)
there4PR(i 1) PL(1 j) where i j
(21)
which completes the second part of our proofRecall
PL(i 1) Pm(i minus 1 0) + 2 gaps
PR(1 j) Pm(0 j minus 1) + 2 gaps
PL(i 1) PR(1 j)was established
(22)
It follows that Pm(i minus 1 0) Pm(0 j minus 1) is obvious)us PD(1 j) Pm(0 j minus1) +σ(α(1)β(j)) and PD(i1)
Pm(i minus10) +σ(α(i)β(1)) We are left to show thatσ(α(1)β(j)) σ(α(i)β(1))
For i j 1 σ(α(1) β(j)) σ(α(i) β(1)) since α(1)
and β(1) match For i jne 1 α(1) and β(j) mismatch andα(i) and β(1) also mismatch )us
σ(α(1) β(j)) σ(α(i) β(1)) foralli j isin 1 μ (23)
Hence PD(1 j) PD(i 1) since Pm(i minus 1 0)
Pm(0 j minus 1)
Corollary 3 0e right value of filled-in cell boxes forPm(1 j)forallj 1 μ2 becomes the left value of filled-in cellboxes for Pm(i 1)foralli 1 μ1 and vice versa
Proof From Proposition 2 and Definitions 5ndash7 it is clearthat
PL(i 1) PR(1 j)
PR(i 1) PL(1 j)
where i j
(24)
Corollary 4 0e pivot values Pm(1 j)forallj 1 μ2 andPm(i 1)foralli 1 μ1 are the same for comparison of equallength of sequences
Proof By Definition 7Pm(1 j) max PD(1 j) PR(1 j) PL(1 j)1113864 1113865 forallj 1 μ2
Pm(i 1) max PD(i 1) PR(i 1) PL(i 1)1113864 1113865 foralli 1 μ1
(25)
and by Proposition 2 we can write that Pm(1 j) Pm(i 1)Hence it is proved
Proposition 5 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box
Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) andPm(i + 1 j + 1) be any four cell boxes then we are to showthat PL(i j + 1) PR(i + 1 j) By Definition 7
International Journal of Mathematics and Mathematical Sciences 7
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j) max
PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))
PR(i + 1 j) Pm(i j) + gap penalty
PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i j + 1) max
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))
PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j + 1) max
PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))
PR(i + 1 j + 1) Pm(i j + 1) + gap penalty
PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
PR(i + 1 j) Pm(i j) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
(26)
Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write
PR(i j) Pm(i minus 1 j) + gap penalty
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)
Weare left to show that gap penalty lt σ(α(i) β(j+ 1))
Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match
For any k count forallk 1 μ the linear gap penalty gk
decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us
gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score
match diagonal score
foralli j 1 μ
(28)
Hence PR(i j)ltPD(i j + 1)foralli j 1 μ
43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences
Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box
Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1
j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof
Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof
8 International Journal of Mathematics and Mathematical Sciences
5 Conclusion
In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs
Data Availability
No data were used to support this study
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
References
[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005
[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003
[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007
[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005
[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970
[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008
[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005
[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981
[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004
[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002
[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957
[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004
[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018
[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication
company using prometheerdquo Applied Informatics vol 5 no 12018
[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019
[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010
[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003
[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012
[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017
[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009
[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986
International Journal of Mathematics and Mathematical Sciences 9
(4) For each cell box the value of the preceding bottomvalue is seen to be always less than the value of theimmediate next adjacent diagonal value
416 Traceback and Alignment for Unequal CharacterLength )e traceback and alignment stages for the exampleon unequal character length of sequences are respectivelyshown below Figure 5 shows the pivot of each cell boxcalculation of the score matrix table of AGCTG and TCAGof Table 3 Based on the traceback and the direction of thearrow pointers the alignment is written as
AGCTG
T minus CAG(12)
We check the correctness of the alignment done usingcalculations Recall the scoring scheme match+5 mis-match minus1 and gap minus2 We have two alignment matchesG minus G and C minus C two alignment mismatches A T and T Aand one gapped alignment G minus Hence
minus1 minus 2 + 5 minus 1 + 5 6 (13)
which is the same as the optimal value from the score matrixtable )e alignment is thus optimal
42 Propositions and Proofs for Equal Sequence Length)e following propositions are a priori results deduced fromthe pattern results of equal character length of sequences
Definition 3 Let the linear gap penalty be v(k) gk whereg is a constant of minus2 and k is the sequence length count ofk 1 μ1 for the score matrix row and k 1 μ2 forthe score matrix column
Definition 4 Define P(0 0) 0 to be the initialization pivotfor the score matrix
Definition 5 Define Pm(i 0) Pm(i minus 1 0) + gap penaltyforalli 1 μ1 to be the initial row pivot for the score matrix
Definition 6 Define Pm(0 j) Pm(0 j minus 1) + gap penaltyforallj 1 μ2 to be the initial column pivot for the scorematrix
Definition 7 Define
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
(14)
foralli 1 μ1 and forallj 1 μ2 where Pm(i j) is the pivotfor each cell box calculation PD(i j) Pm(i minus 1 j minus 1) +
σ(α(i) β(j)) is the diagonal value of a cell box PR(i j)
Pm(i minus 1 j)+ gap penalty is the right value of a cell boxPL(i j) Pm(i j minus 1)+ gap penalty is the left value of a cellbox and σ(α(i) β(j)) is the score for aligning the sequencecharacters of α(i) and β(j)
Definition 8 Define
σ(α(i) β(j)) +5 for α(i) β(j) charactermatch
minus1 for α(i) β(j) charactermismatch1113896
(15)foralli 1 μ1 and forallj 1 μ2
Proposition 2 Filled-in cell boxes for Pm(1 j)forallj
1 μ2 are analogous to filled-in cell boxes for Pm(i 1)foralli
1 μ1 for equal length comparison of sequences
Proof By Definition 7 we have
Pm(1 j) max
PD(1 j) Pm(0 j minus 1) + σ(α(1) β(j))
PR(1 j) Pm(0 j) + gap
PL(1 j) Pm(1 j minus 1) + gap
⎧⎪⎪⎨
⎪⎪⎩
Pm(i 1) max
PD(i 1) Pm(i minus 1 0) + σ(α(i) β(1))
PR(i 1) Pm(i minus 1 1) + gap
PL(i 1) Pm(i 0) + gap
⎧⎪⎪⎨
⎪⎪⎩
(16)
TCA G G
T
C
A
G
0
0
0
ndash2 ndash4
ndash3 ndash5
ndash3
ndash3
ndash4
ndash6
ndash6 ndash8 ndash10
ndash1ndash1
ndash1
ndash1 ndash1
ndash8
ndash2
ndash2 ndash2
1 1
2
246 6
Figure 5 Traceback in the score matrix of AGCTG and TCAG
Table 3 Filled-in table of AGCTG and TCAGi DV minus1 LV minus4 RV minus4 Pivot minus1ii DV minus3 LV minus3 RV minus6 Pivot minus3iii DV minus5 LV minus5 RV minus8 Pivot minus5iv DV minus1 LV minus7 RV minus10 Pivot minus1v DV minus9 LV minus3 RV minus12 Pivot minus3vi DV minus3 LV minus6 RV minus3 Pivot minus3vii DV minus2 LV minus5 RV minus5 Pivot minus2viii DV 2 LV minus4 RV minus7 Pivot 2ix DV minus6 LV 0 RV minus3 Pivot 0x DV minus2 LV minus2 RV minus5 Pivot minus2xi DV 1 LV minus8 RV minus5 Pivot 1xii DV minus4 LV minus1 RV minus4 Pivot minus1xiii DV minus3 LV minus3 RV 0 Pivot 0xiv DV 1 LV minus2 RV minus2 Pivot minus2xv DV minus1 LV minus1 RV minus4 Pivot minus1xvi DV minus7 LV minus10 RV minus1 Pivot minus1xvii DV 6 LV minus3 RV minus3 Pivot 6xviii DV minus2 LV 4 RV minus2 Pivot 4xix DV minus1 LV 2 RV minus1 Pivot 2xx DV 6 LV 0 RV minus3 Pivot 6
6 International Journal of Mathematics and Mathematical Sciences
foralli j 1 μ since μ1 μ2 μ Again by Definitions 5and 6
Pm(i 0) Pm(i minus 1 0) + gap
⟹PL(i 1) Pm(i 0) + gap
Pm(i minus 1 0) + gap( 1113857 + gap
(17)
also
Pm(0 j) Pm(0 j minus 1) + gap
⟹PR(1 j) Pm(0 j) + gap
Pm(0 j minus 1) + gap( 1113857 + gap
(18)
Using Definitions 5 and 6 and applying inductionPm(1 0) P(0 0) + gap and Pm(0 1) P(0 0) + gap thusPm(1 0) Pm(0 1) is true Assume Pm(k 0) Pm(0 k) isalways true then
Pm(k + 1 0) Pm(k 0) + gap
Pm(0 k) + gap
Pm(0 k + 1)
there4Pm(i 0) Pm(0 j)
(19)
Recall that the gap is constant so we writePL(i 1) Pm(i 0) + gap PR(1 j) Pm(0 j) + gap thusPL(i 1) PR(1 j) which completes the first part of ourproof
Again from PL(1 j) Pm(1 j minus 1) + gap PR(i 1)
Pm(i minus 1 1) + gap and by induction let i 1 j then
PL(1 j) Pm(1 0) + gap
P(0 0) + gap + gap
P(0 0) + 2 gaps
PR(i 1) Pm(0 1) + gap
P(0 0) + gap + gap
P(0 0) + 2gaps
(20)
Assume Pm(k 0) Pm(0 k) then
Pm(k + 1 0) Pm(k 0) + gap
Pm(0 k) + gap
Pm(0 k + 1)
there4PR(i 1) PL(1 j) where i j
(21)
which completes the second part of our proofRecall
PL(i 1) Pm(i minus 1 0) + 2 gaps
PR(1 j) Pm(0 j minus 1) + 2 gaps
PL(i 1) PR(1 j)was established
(22)
It follows that Pm(i minus 1 0) Pm(0 j minus 1) is obvious)us PD(1 j) Pm(0 j minus1) +σ(α(1)β(j)) and PD(i1)
Pm(i minus10) +σ(α(i)β(1)) We are left to show thatσ(α(1)β(j)) σ(α(i)β(1))
For i j 1 σ(α(1) β(j)) σ(α(i) β(1)) since α(1)
and β(1) match For i jne 1 α(1) and β(j) mismatch andα(i) and β(1) also mismatch )us
σ(α(1) β(j)) σ(α(i) β(1)) foralli j isin 1 μ (23)
Hence PD(1 j) PD(i 1) since Pm(i minus 1 0)
Pm(0 j minus 1)
Corollary 3 0e right value of filled-in cell boxes forPm(1 j)forallj 1 μ2 becomes the left value of filled-in cellboxes for Pm(i 1)foralli 1 μ1 and vice versa
Proof From Proposition 2 and Definitions 5ndash7 it is clearthat
PL(i 1) PR(1 j)
PR(i 1) PL(1 j)
where i j
(24)
Corollary 4 0e pivot values Pm(1 j)forallj 1 μ2 andPm(i 1)foralli 1 μ1 are the same for comparison of equallength of sequences
Proof By Definition 7Pm(1 j) max PD(1 j) PR(1 j) PL(1 j)1113864 1113865 forallj 1 μ2
Pm(i 1) max PD(i 1) PR(i 1) PL(i 1)1113864 1113865 foralli 1 μ1
(25)
and by Proposition 2 we can write that Pm(1 j) Pm(i 1)Hence it is proved
Proposition 5 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box
Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) andPm(i + 1 j + 1) be any four cell boxes then we are to showthat PL(i j + 1) PR(i + 1 j) By Definition 7
International Journal of Mathematics and Mathematical Sciences 7
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j) max
PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))
PR(i + 1 j) Pm(i j) + gap penalty
PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i j + 1) max
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))
PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j + 1) max
PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))
PR(i + 1 j + 1) Pm(i j + 1) + gap penalty
PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
PR(i + 1 j) Pm(i j) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
(26)
Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write
PR(i j) Pm(i minus 1 j) + gap penalty
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)
Weare left to show that gap penalty lt σ(α(i) β(j+ 1))
Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match
For any k count forallk 1 μ the linear gap penalty gk
decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us
gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score
match diagonal score
foralli j 1 μ
(28)
Hence PR(i j)ltPD(i j + 1)foralli j 1 μ
43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences
Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box
Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1
j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof
Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof
8 International Journal of Mathematics and Mathematical Sciences
5 Conclusion
In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs
Data Availability
No data were used to support this study
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
References
[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005
[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003
[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007
[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005
[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970
[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008
[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005
[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981
[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004
[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002
[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957
[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004
[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018
[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication
company using prometheerdquo Applied Informatics vol 5 no 12018
[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019
[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010
[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003
[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012
[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017
[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009
[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986
International Journal of Mathematics and Mathematical Sciences 9
foralli j 1 μ since μ1 μ2 μ Again by Definitions 5and 6
Pm(i 0) Pm(i minus 1 0) + gap
⟹PL(i 1) Pm(i 0) + gap
Pm(i minus 1 0) + gap( 1113857 + gap
(17)
also
Pm(0 j) Pm(0 j minus 1) + gap
⟹PR(1 j) Pm(0 j) + gap
Pm(0 j minus 1) + gap( 1113857 + gap
(18)
Using Definitions 5 and 6 and applying inductionPm(1 0) P(0 0) + gap and Pm(0 1) P(0 0) + gap thusPm(1 0) Pm(0 1) is true Assume Pm(k 0) Pm(0 k) isalways true then
Pm(k + 1 0) Pm(k 0) + gap
Pm(0 k) + gap
Pm(0 k + 1)
there4Pm(i 0) Pm(0 j)
(19)
Recall that the gap is constant so we writePL(i 1) Pm(i 0) + gap PR(1 j) Pm(0 j) + gap thusPL(i 1) PR(1 j) which completes the first part of ourproof
Again from PL(1 j) Pm(1 j minus 1) + gap PR(i 1)
Pm(i minus 1 1) + gap and by induction let i 1 j then
PL(1 j) Pm(1 0) + gap
P(0 0) + gap + gap
P(0 0) + 2 gaps
PR(i 1) Pm(0 1) + gap
P(0 0) + gap + gap
P(0 0) + 2gaps
(20)
Assume Pm(k 0) Pm(0 k) then
Pm(k + 1 0) Pm(k 0) + gap
Pm(0 k) + gap
Pm(0 k + 1)
there4PR(i 1) PL(1 j) where i j
(21)
which completes the second part of our proofRecall
PL(i 1) Pm(i minus 1 0) + 2 gaps
PR(1 j) Pm(0 j minus 1) + 2 gaps
PL(i 1) PR(1 j)was established
(22)
It follows that Pm(i minus 1 0) Pm(0 j minus 1) is obvious)us PD(1 j) Pm(0 j minus1) +σ(α(1)β(j)) and PD(i1)
Pm(i minus10) +σ(α(i)β(1)) We are left to show thatσ(α(1)β(j)) σ(α(i)β(1))
For i j 1 σ(α(1) β(j)) σ(α(i) β(1)) since α(1)
and β(1) match For i jne 1 α(1) and β(j) mismatch andα(i) and β(1) also mismatch )us
σ(α(1) β(j)) σ(α(i) β(1)) foralli j isin 1 μ (23)
Hence PD(1 j) PD(i 1) since Pm(i minus 1 0)
Pm(0 j minus 1)
Corollary 3 0e right value of filled-in cell boxes forPm(1 j)forallj 1 μ2 becomes the left value of filled-in cellboxes for Pm(i 1)foralli 1 μ1 and vice versa
Proof From Proposition 2 and Definitions 5ndash7 it is clearthat
PL(i 1) PR(1 j)
PR(i 1) PL(1 j)
where i j
(24)
Corollary 4 0e pivot values Pm(1 j)forallj 1 μ2 andPm(i 1)foralli 1 μ1 are the same for comparison of equallength of sequences
Proof By Definition 7Pm(1 j) max PD(1 j) PR(1 j) PL(1 j)1113864 1113865 forallj 1 μ2
Pm(i 1) max PD(i 1) PR(i 1) PL(i 1)1113864 1113865 foralli 1 μ1
(25)
and by Proposition 2 we can write that Pm(1 j) Pm(i 1)Hence it is proved
Proposition 5 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box
Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) andPm(i + 1 j + 1) be any four cell boxes then we are to showthat PL(i j + 1) PR(i + 1 j) By Definition 7
International Journal of Mathematics and Mathematical Sciences 7
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j) max
PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))
PR(i + 1 j) Pm(i j) + gap penalty
PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i j + 1) max
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))
PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j + 1) max
PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))
PR(i + 1 j + 1) Pm(i j + 1) + gap penalty
PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
PR(i + 1 j) Pm(i j) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
(26)
Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write
PR(i j) Pm(i minus 1 j) + gap penalty
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)
Weare left to show that gap penalty lt σ(α(i) β(j+ 1))
Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match
For any k count forallk 1 μ the linear gap penalty gk
decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us
gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score
match diagonal score
foralli j 1 μ
(28)
Hence PR(i j)ltPD(i j + 1)foralli j 1 μ
43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences
Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box
Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1
j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof
Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof
8 International Journal of Mathematics and Mathematical Sciences
5 Conclusion
In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs
Data Availability
No data were used to support this study
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
References
[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005
[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003
[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007
[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005
[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970
[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008
[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005
[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981
[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004
[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002
[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957
[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004
[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018
[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication
company using prometheerdquo Applied Informatics vol 5 no 12018
[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019
[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010
[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003
[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012
[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017
[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009
[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986
International Journal of Mathematics and Mathematical Sciences 9
Pm(i j) max
PD(i j) Pm(i minus 1 j minus 1) + σ(α(i) β(j))
PR(i j) Pm(i minus 1 j) + gap penalty
PL(i j) Pm(i j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j) max
PD(i + 1 j) Pm(i j minus 1) + σ(α(i + 1) β(j))
PR(i + 1 j) Pm(i j) + gap penalty
PL(i + 1 j) Pm(i + 1 j minus 1) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i j + 1) max
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))
PR(i j + 1) Pm(i minus 1 j + 1) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
Pm(i + 1 j + 1) max
PD(i + 1 j + 1) Pm(i j) + σ(α(i + 1) β(j + 1))
PR(i + 1 j + 1) Pm(i j + 1) + gap penalty
PL(i + 1 j + 1) Pm(i + 1 j) + gap penalty
⎧⎪⎪⎨
⎪⎪⎩
PR(i + 1 j) Pm(i j) + gap penalty
PL(i j + 1) Pm(i j) + gap penalty
(26)
Since the linear gap penalty is equal in both cases and it isobvious that Pm(i j) is the same in both cases we writePL(i j + 1) PR(i + 1 j) Corollary 6 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Let Pm(i j) and Pm(i j + 1) be any two adjacent cellboxes )en by Proposition 5 we write
PR(i j) Pm(i minus 1 j) + gap penalty
PD(i j + 1) Pm(i minus 1 j) + σ(α(i) β(j + 1))(27)
Weare left to show that gap penalty lt σ(α(i) β(j+ 1))
Remark 5 By Altschulrsquos theory a match and mismatch arechosen to be greater than a gap penaltyWe recall the scoringscheme of the constant g being ldquominus2rdquo in the linear gap penaltygk and the diagonal score being ldquominus1rdquo when there is amismatch and ldquo+5rdquo when there is a match
For any k count forallk 1 μ the linear gap penalty gk
decreases and thus the gap penalty can never be greater thanor equal to any of the diagonal scores allocated for match andmismatch )us
gap penalty σ(α(i) β(j + 1))1113980radicradicradicradicradicradicradic11139791113978radicradicradicradicradicradicradic1113981mismatch diagonal score
match diagonal score
foralli j 1 μ
(28)
Hence PR(i j)ltPD(i j + 1)foralli j 1 μ
43 Proposition and Proof for Unequal Sequence Length)e following proposition holds a priori from the patternresults of unequal character length of sequences We stateand prove the following general results by adhering to thesame definitions stated earlier under the equal characterlength of sequences
Proposition 7 For each three pointed arrows intersectingany four cell boxes the left value of a 2nd cell box correspondsto the right value of a 3rd cell box
Proof Let Pm(i j) Pm(i + 1 j) Pm(i j + 1) and Pm(i + 1
j + 1) be any four cell boxes then we are to show thatPm(i + 1 j) Pm(i j + 1) Refer to the proof of Proposition5 Despite the unequal sequence length of μ1 ne μ2 thesupposed disparity in length has no bearing on theproof
Corollary 8 For any two adjacent row cell boxes the rightvalue of a preceding cell box is less than the diagonal value ofthe next cell box
Proof Refer to the proof of Corollary 6 Again despite theunequal sequence length of μ1 ne μ2 the supposed disparity inlength has no bearing on the proof
8 International Journal of Mathematics and Mathematical Sciences
5 Conclusion
In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs
Data Availability
No data were used to support this study
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
References
[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005
[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003
[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007
[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005
[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970
[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008
[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005
[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981
[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004
[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002
[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957
[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004
[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018
[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication
company using prometheerdquo Applied Informatics vol 5 no 12018
[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019
[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010
[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003
[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012
[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017
[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009
[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986
International Journal of Mathematics and Mathematical Sciences 9
5 Conclusion
In this paper the score matrix of the NeedlemanndashWunschalgorithm was exploited for possible patterns Given any twoarbitrary sequences of equal or unequal length a generalpattern was formulated as new a priori propositions andcorollaries )ese new formulated propositions and corol-laries are justified with their corresponding proofs
Data Availability
No data were used to support this study
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
References
[1] J C Bare 0e Evolution of Sequence Comparison AlgorithmsUniversity of Washington Washington DC USA 2005
[2] S A de Carvalho ldquoSequence alignment algorithmsrdquo Masterrsquosthesis Kings College London UK 2003
[3] S R McAllister R Rajgaria and C A Floudas ldquoGlobalpairwise sequence alignment through mixed-integer linearprogramming a template-free approachrdquo OptimisationMethods and Software vol 22 no 1 pp 127ndash144 2007
[4] S Batzoglou ldquo)e many faces of sequence alignmentrdquoBriefings in Bioinformatics vol 6 no 1 pp 6ndash22 2005
[5] S B Needleman and C D Wunsch ldquoA general methodapplicable to the search for similarities in the amino acidsequence of two proteinsrdquo Journal of Molecular Biologyvol 48 no 3 pp 443ndash453 1970
[6] V Likic ldquo)e needleman-wunsch algorithm for sequencealignmentrdquo Lecture given at the 7th Melbourne BioinformaticsCourse pp 1ndash46 Bi021 Molecular Science and BiotechnologyInstitute University of Melbourne Melbourne Australia2008
[7] T Naveed I S Siddiqui and S Ahmed Parallel Needle-manmdashWunsch Algorithm for Grid Bahria UniversityIslamabad Pakistan 2005
[8] T F Smith M S Waterman and W M Fitch ldquoComparativebiosequence metricsrdquo Journal of Molecular Evolution vol 18no 1 pp 38ndash46 1981
[9] S R Eddy ldquoWhat is dynamic programmingrdquo Nature Bio-technology vol 22 no 7 pp 909-910 2004
[10] S Dreyfus ldquoRichard bellman on the birth of dynamic pro-grammingrdquo Operations Research vol 50 no 1 pp 48ndash512002
[11] R Bellman Dynamic Programming Princeton UniversityPress Long Island NY USA 1957
[12] D W Mount Sequence and Genome Analysis Cold SpringHarbor Laboratory Press Cold Spring Harbor NY USA 2ndedition 2004
[13] A Donkor K Darkwah J Appati P Gakpleazi andW Naninja ldquoOptimal allocation of funds for loans usingkarmarkarrsquos algorithm capital rural bank sunyani-ghanardquoJournal of Economics Management and Trade vol 12pp 1ndash13 2018
[14] B Boatemaa J K Appati and K F Darkwah ldquoMulti-criteriaranking of voice transmission carriers of a telecommunication
company using prometheerdquo Applied Informatics vol 5 no 12018
[15] K Gyimah J K Appati K Darkwah and K Ansah ldquoAnimproved geo-textural based feature extraction vector foroffline signature verificationrdquo Journal of Advances in Math-ematics and Computer Science vol 32 no 2 pp 1ndash14 2019
[16] B Bhowmik ldquoDynamic programming-its principles appli-cations strengths and limitationsrdquo International Journal ofEngineering Science and Technology vol 2 no 7 2010
[17] X Huang and K-M Chao ldquoA generalized global alignmentalgorithmrdquo Bioinformatics vol 19 no 2 pp 228ndash233 2003
[18] Amrita Global Alignment of Two SequencesmdashNeedleman-Wunsch Algorithm 2012
[19] C A Ouzounis and S Baichoo ldquoComputational complexityof algorithms for sequence comparison short-read assemblyand genome alignmentrdquo Biosystems vol 156-157 pp 72ndash852017
[20] R C Edgar ldquoOptimizing substitution matrix choice and gapparameters for sequence alignmentrdquo Bmc Bioinformaticsvol 10 no 1 p 396 2009
[21] S F Altschul and B W Erickson ldquoOptimal sequencealignment using affine gap costsrdquo Bulletin of MathematicalBiology vol 48 no 5-6 pp 603ndash616 1986
International Journal of Mathematics and Mathematical Sciences 9