longest common subsequence (lcs)longest common subsequence (lcs) • a subsequence of a string is...

24
Longest Common Subsequence (LCS) A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: ABC, ABCD, B, ABCBDA… DBC A common subsequence of two strings is a subsequence common to both strings. X = ABCBDAB, Y = BDCABA Common subsequences: AA, BC, BCA … A LCS of two strings is a common subsequence with maximal length. X = ABCBDAB, Y = BDCABA LCS ?

Upload: others

Post on 31-May-2020

35 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

Longest Common Subsequence (LCS)

• A subsequence of a string is the string with zero or more chars left oute.g., X = ABCBDABSubsequences: ABC, ABCD, B, ABCBDA…DBC

• A common subsequence of two strings is a subsequence common to both strings.

X = ABCBDAB, Y = BDCABACommon subsequences: AA, BC, BCA …

• A LCS of two strings is a common subsequence with maximal length.

X = ABCBDAB, Y = BDCABALCS ?

Page 2: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

LCS problem

Given two strings X and Y, of length m and n, find• The length of an LCS of X and Y • An actual LCS

Solution 1: Brute-force algorithm• check all subsequence of X to see if it is also a subsequences of Y• Not efficient

Better solution?

Page 3: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

LCS problem

Solution 2: Dynamic programming (DP)

Key: optimal substructure and overlapping sub-problemsFind the length of LCS first, then identify LCS itself

Page 4: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

Algorithm design of QUBIC2.0

DP Algorithm

http://slideplayer.com/slide/5360281/

Let 𝑐 𝑖, 𝑗 be the length of LCS(𝑥 1. . 𝑖 , 𝑦 1. . 𝑗 )

𝑐 𝑖, 𝑗 = ቊ𝑐 𝑖 − 1, 𝑗 − 1 + 1, 𝑖𝑓 𝑥 𝑖 = 𝑦[𝑗]

max 𝑐 𝑖 − 1, 𝑗 , 𝑐 𝑖, 𝑗 − 1 , 𝑜𝑡ℎ𝑒𝑟 𝑤𝑖𝑠𝑒

Page 5: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

Algorithm design of QUBIC2.0

LCS example

We’ll see how LCS algorithm works on the following example:

• X = ABCB

• Y = BDCAB

What is the LCS of X and Y?

Page 6: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

6

LCS Example

j 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

B

Yj BB ACD

X = ABCB; m = |X| = 4Y = BDCAB; n = |Y| = 5Allocate array c[5,4]

ABCBBDCAB

Page 7: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

7

LCS Example

j 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

B

Yj BB ACD

0

0

00000

0

0

0

for i = 1 to m c[i,0] = 0 for j = 1 to n c[0,j] = 0

ABCBBDCAB

Page 8: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

8

LCS Example

j 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

B

Yj BB ACD

0

0

00000

0

0

0

if ( Xi == Yj )c[i,j] = c[i-1,j-1] + 1

else c[i,j] = max( c[i-1,j], c[i,j-1] )

0

ABCBBDCAB

Page 9: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

9

LCS Example

j 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

B

Yj BB ACD

0

0

00000

0

0

0

if ( Xi == Yj )c[i,j] = c[i-1,j-1] + 1

else c[i,j] = max( c[i-1,j], c[i,j-1] )

0 0 0

ABCBBDCAB

Page 10: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

10

LCS Example

j 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

B

Yj BB ACD

0

0

00000

0

0

0

if ( Xi == Yj )c[i,j] = c[i-1,j-1] + 1

else c[i,j] = max( c[i-1,j], c[i,j-1] )

0 0 0 1

ABCBBDCAB

Page 11: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

11

LCS Example

j 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

B

Yj BB ACD

0

0

00000

0

0

0

if ( Xi == Yj )c[i,j] = c[i-1,j-1] + 1

else c[i,j] = max( c[i-1,j], c[i,j-1] )

000 1 1

ABCBBDCAB

Page 12: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

12

LCS Example

j 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

B

Yj BB ACD

0

0

00000

0

0

0

if ( Xi == Yj )c[i,j] = c[i-1,j-1] + 1

else c[i,j] = max( c[i-1,j], c[i,j-1] )

0 0 10 1

1

ABCBBDCAB

Page 13: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

13

LCS Example

j 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

B

Yj BB ACD

0

0

00000

0

0

0

if ( Xi == Yj )c[i,j] = c[i-1,j-1] + 1

else c[i,j] = max( c[i-1,j], c[i,j-1] )

1000 1

1 1 11

ABCBBDCAB

Page 14: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

14

LCS Examplej 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

B

Yj BB ACD

0

0

00000

0

0

0

if ( Xi == Yj )c[i,j] = c[i-1,j-1] + 1

else c[i,j] = max( c[i-1,j], c[i,j-1] )

1000 1

1 1 1 1 2

ABCBBDCAB

Page 15: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

15

LCS Example

j 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

B

Yj BB ACD

0

0

00000

0

0

0

if ( Xi == Yj )c[i,j] = c[i-1,j-1] + 1

else c[i,j] = max( c[i-1,j], c[i,j-1] )

1000 1

21 1 11

1 1

ABCBBDCAB

Page 16: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

16

LCS Example

j 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

B

Yj BB ACD

0

0

00000

0

0

0

if ( Xi == Yj )c[i,j] = c[i-1,j-1] + 1

else c[i,j] = max( c[i-1,j], c[i,j-1] )

1000 1

1 21 11

1 1 2

ABCBBDCAB

Page 17: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

17

LCS Example

j 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

B

Yj BB ACD

0

0

00000

0

0

0

if ( Xi == Yj )c[i,j] = c[i-1,j-1] + 1

else c[i,j] = max( c[i-1,j], c[i,j-1] )

1000 1

1 21 1

1 1 2

1

22

ABCBBDCAB

Page 18: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

18

LCS Example

j 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

B

Yj BB ACD

0

0

00000

0

0

0

if ( Xi == Yj )c[i,j] = c[i-1,j-1] + 1

else c[i,j] = max( c[i-1,j], c[i,j-1] )

1000 1

1 21 1

1 1 2

1

22

1

ABCBBDCAB

Page 19: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

19

LCS Example

j 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

B

Yj BB ACD

0

0

00000

0

0

0

if ( Xi == Yj )c[i,j] = c[i-1,j-1] + 1

else c[i,j] = max( c[i-1,j], c[i,j-1] )

1000 1

1 21 1

1 1 2

1

22

1 1 2 2

ABCBBDCAB

Page 20: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

20

LCS Example

j 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

B

Yj BB ACD

0

0

00000

0

0

0

if ( Xi == Yj )c[i,j] = c[i-1,j-1] + 1

else c[i,j] = max( c[i-1,j], c[i,j-1] )

1000 1

1 21 1

1 1 2

1

22

1 1 2 2 3

ABCBBDCAB

Page 21: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

Algorithm design of QUBIC2.0

How to find actual LCS

• So far, we have just found the length of LCS, but not LCS itself.

• We want to modify this algorithm to make it output LCS of X and Y

Each 𝑐 𝑖, 𝑗 depends on 𝑐 𝑖 − 1, 𝑗 and 𝑐 𝑖, 𝑗 − 1 or 𝑐 𝑖 − 1, 𝑗 − 1 .

For each c[i,j] we can say how it was acquired:

2

2 3

2 For example, here c[i,j] = c[i-1,j-1] +1 = 2+1=3

Page 22: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

Algorithm design of QUBIC2.0

• Remember that

𝑐 𝑖, 𝑗 = ቊ𝑐 𝑖 − 1, 𝑗 − 1 + 1, 𝑖𝑓 𝑥 𝑖 = 𝑦[𝑗]

max 𝑐 𝑖 − 1, 𝑗 , 𝑐 𝑖, 𝑗 − 1 , 𝑜𝑡ℎ𝑒𝑟 𝑤𝑖𝑠𝑒

So we can start from c[m,n] and go backwards

Whenever 𝑐 𝑖, 𝑗 = 𝑐 𝑖 − 1, 𝑗 − 1 + 1, record 𝑥 𝑖

When i=0 or j=0 (i.e. we reached the beginning), output recorded letters in reverse order

Page 23: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

23

Finding LCS

j 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

Yj BB ACD

0

0

00000

0

0

0

1000 1

1 21 1

1 1 2

1

22

1 1 2 2 3B

To construct LCS, start in the bottom right corner and follow the arrows. indicates a matching character

Page 24: Longest Common Subsequence (LCS)Longest Common Subsequence (LCS) • A subsequence of a string is the string with zero or more chars left out e.g., X = ABCBDAB Subsequences: A, AD,

8/22/2017 24

Finding LCSj 0 1 2 3 4 5

0

1

2

3

4

i

Xi

A

B

C

Yj BB ACD

0

0

00000

0

0

0

1000 1

1 21 1

1 1 2

1

22

1 1 2 2 3B

B C BLCS (reversed order):

LCS (straight order): B C B