theory of computation (fall 2014): cocke-younger-kasami algorithm

Theory of Computation Cocke-Younger-Kasami Algorithm Vladimir Kulyukin

Upload: vladimir-kulyukin

Post on 14-Dec-2014

TRANSCRIPT

Page 1: Theory of Computation (Fall 2014): Cocke-Younger-Kasami Algorithm

Theory of Computation

Cocke-Younger-Kasami Algorithm

Vladimir Kulyukin

Outline

CFL Acceptance Problem
Basic Insight
Dynamic Programming
Implementation

www.vkedco.blogsot.com

CYK Algorithm’s Problem

Problem: Given a CFG G = (V, T, P, S) and a string x in T*, determine whether x is in L(G).

The Cocke-Younger-Kasami (CYK) algorithm takes a CFG in Chomsky Normal Form (CNF) and a string x, and determines whether the start symbol S is one of the variables that derive x.

Substring Notation $x_{sl}$

Let x be a string such that |x| = n ≥ 1. Let $x_{sl}$ be the substring of x of length l that starts at position s, where 1 ≤ s ≤ n, 1 ≤ l ≤ n, and s + l – 1 ≤ n.

For example, if x = aabbabb, then $x_{13}$ = aab = x[1]x[2]x[3] and $x_{24}$ = abba = x[2]x[3]x[4]x[5].

In general, if we use 1-based array indexing and the length of the substring is l, the last position s at which the substring can start is n – l + 1.

For example, if |x| = 4 and l = 2, the possible values for s in $x_{s2}$ are 1, 2, and 3 = 4 – 2 + 1.
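The notation maps directly onto string slicing. Below is a minimal Python sketch, assuming x is a Python string; the helper names substring and valid_starts are hypothetical, and the slides' 1-based positions are shifted to Python's 0-based indexing.

```python
def substring(x, s, l):
    """Return x_{sl}: the substring of x of length l starting at 1-based position s."""
    n = len(x)
    if not (1 <= l <= n and 1 <= s <= n - l + 1):
        raise ValueError(f"no substring of length {l} starts at position {s} when |x| = {n}")
    # Position s (1-based) is index s - 1 in Python (0-based).
    return x[s - 1 : s - 1 + l]


def valid_starts(n, l):
    """All 1-based start positions s for a length-l substring of a length-n string."""
    return list(range(1, n - l + 2))


print(substring("aabbabb", 1, 3))  # aab
print(substring("aabbabb", 2, 4))  # abba
print(valid_starts(4, 2))          # [1, 2, 3]
```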

CYK Algorithm: Basic Insight

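[Figure: a parse tree whose root A has children B and C. B derives $x_{sk}$, covering positions s through s+k–1, and C derives $x_{(s+k)(l-k)}$, covering positions s+k through s+l–1; together they span $x_{sl}$.]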

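For l ≥ 2, $A \Rightarrow^* x_{sl}$ iff, for some k with 1 ≤ k < l: 1) A → BC is a production of G; 2) $B \Rightarrow^* x_{sk}$; 3) $C \Rightarrow^* x_{(s+k)(l-k)}$. (In CNF, a derivation of a string of length 2 or more must begin with a rule of the form A → BC.)

In other words, for $A \Rightarrow^* x_{sl}$ to hold, there must be a rule A → BC and some k, 1 ≤ k < l, for which $B \Rightarrow^* x_{sk}$ and $C \Rightarrow^* x_{(s+k)(l-k)}$.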
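The insight also pins down exactly which smaller cells a given cell depends on. A small Python sketch, using a hypothetical helper splits, that lists the (start, length) indices for B and C for every split point k:

```python
def splits(s, l):
    """For each split point k, 1 <= k < l, return the cells (s, k) and (s + k, l - k):
    B must derive x_{sk} and C must derive x_{(s+k)(l-k)}."""
    return [((s, k), (s + k, l - k)) for k in range(1, l)]


print(splits(1, 3))  # [((1, 1), (2, 2)), ((1, 2), (3, 1))]
print(splits(2, 1))  # []
```

For l = 1 the list is empty, which is why single symbols have to be handled separately, directly from the terminal productions.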

Table D[s, l]

CYK is a dynamic programming algorithm that, given a CNF grammar G = (V, T, P, S) and a string x over T such that |x| = n > 0, incrementally builds an n x n table D (D stands for ‘derives’).

D[s, l] is a set, possibly empty, of variables A in V such that $A \Rightarrow^* x_{sl}$.

In other words, D[s, l] records all variables of G that derive $x_{sl}$.
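One possible representation, assumed here rather than taken from the slides, stores D as a Python dict keyed by (s, l). Only cells with s + l – 1 ≤ n correspond to actual substrings, so just n(n + 1)/2 of the n x n cells are ever used; the hypothetical helper table_cells enumerates them.

```python
def table_cells(n):
    """List the (s, l) cells of D that correspond to real substrings x_{sl} (s + l - 1 <= n)."""
    return [(s, l) for l in range(1, n + 1) for s in range(1, n - l + 2)]


cells = table_cells(5)
print(len(cells))  # 15
D = {cell: set() for cell in cells}  # every cell starts out as the empty set
```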

D[s, l] Initialization

Let G = (V, T, P, S) be a CNF grammar and x be a string such that |x| = n > 0.

Let $x_{sl}$ be the substring of x of length l that starts at position s.

If l = 1, then, for each 1 ≤ s ≤ n, we can check whether $x_{s1}$ can be derived directly from some variable A of G.

How? By checking whether G has a production $A \to x_{s1}$.
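A minimal sketch of this initialization step in Python, assuming the terminal productions are stored in a dict from each terminal a to the set {A | A → a}; the name init_row and the dict layout are illustrative, not from the slides. The sample data uses the terminal productions of the example grammar on the next slide (A → a, C → a, B → b).

```python
def init_row(x, unit_rules):
    """Fill row l = 1 of D: D[s, 1] = {A | A -> x[s] is a production of G}.

    unit_rules maps each terminal a to the set of variables A with A -> a.
    Positions s are 1-based as on the slides, so x[s] is x[s - 1] in Python.
    """
    return {(s, 1): set(unit_rules.get(x[s - 1], set())) for s in range(1, len(x) + 1)}


# Terminal productions of the example grammar: A -> a, C -> a, B -> b.
unit_rules = {"a": {"A", "C"}, "b": {"B"}}
row1 = init_row("baaba", unit_rules)
print(sorted(row1[(1, 1)]), sorted(row1[(2, 1)]))  # ['B'] ['A', 'C']
```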

D[s, l] Initialization

Assume that our CNF grammar is as follows:
1. S → AB | BC
2. A → BA | a
3. B → CC | b
4. C → AB | a

Assume that the input is x = baaba. What does D[s, l] look like?
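To make rule matching a lookup, a CYK implementation can invert the productions, indexing them by their right-hand sides. The sketch below does this for the example grammar; the (head, body) list format and the helper index_grammar are assumptions made for illustration.

```python
from collections import defaultdict

# The example grammar, one (head, body) pair per production; a body is either
# a single terminal or a pair of variables.
PRODUCTIONS = [
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")), ("A", "a"),
    ("B", ("C", "C")), ("B", "b"),
    ("C", ("A", "B")), ("C", "a"),
]


def index_grammar(productions):
    """Invert the productions so heads can be looked up by body:
    unit[a] = {A | A -> a} and binary[(B, C)] = {A | A -> BC}."""
    unit, binary = defaultdict(set), defaultdict(set)
    for head, body in productions:
        if isinstance(body, tuple):
            binary[body].add(head)
        else:
            unit[body].add(head)
    return dict(unit), dict(binary)


unit, binary = index_grammar(PRODUCTIONS)
print(sorted(unit["a"]))           # ['A', 'C']
print(sorted(binary[("A", "B")]))  # ['C', 'S']
```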

5 x 5 D[s, l]

[Table: an empty 5 x 5 grid with columns indexed by s = 1, ..., 5 and rows indexed by l = 1, ..., 5.]

Computing D[1,1]

The input is x = baaba. The 1st symbol of the input is b. Thus, D[1,1] = {A | A → b}, where A is in V. There is only one production that qualifies: B → b. So D[1,1] = {B}.

G’s Productions:
1. S → AB | BC
2. A → BA | a
3. B → CC | b
4. C → AB | a

D[s, l] So Far

l = 1: D[1,1] = {B}

Computing D[2,1]

The input is x = baaba. The 2nd symbol of the input is a. We compute {A | A → a}, where A is in V. There are two such productions: A → a and C → a.

So D[2,1] = {A, C}.

D[s, l] So Far

l = 1: D[1,1] = {B}, D[2,1] = {A, C}

Computing D[3,1]

The input is x = baaba. The 3rd symbol of the input is a. We compute {A | A → a}, where A is in V. There are two such productions: A → a and C → a.

So D[3,1] = {A, C}.

D[s, l] So Far

l = 1: D[1,1] = {B}, D[2,1] = {A, C}, D[3,1] = {A, C}

Computing D[4,1]

The input is x = baaba. The 4th symbol of the input is b. Thus, D[4,1] = {A | A → b}, where A is in V. There is only one production that qualifies: B → b. So D[4,1] = {B}.

D[s, l] So Far

l = 1: D[1,1] = {B}, D[2,1] = {A, C}, D[3,1] = {A, C}, D[4,1] = {B}

Computing D[5,1]

The input is x = baaba. The 5th symbol of the input is a. We compute {A | A → a}, where A is in V. There are two such productions: A → a and C → a. So D[5,1] = {A, C}.

D[s, l] So Far

l = 1: D[1,1] = {B}, D[2,1] = {A, C}, D[3,1] = {A, C}, D[4,1] = {B}, D[5,1] = {A, C}

Computing D[1,2]

Since l = 2, the only split is k = 1 (1 ≤ k < 2), so we look for productions A → BC where B is in D[1,1] and C is in D[2,1].

Since D[1,1] = {B} and D[2,1] = {A, C}, the possibilities for the right-hand sides are {B} × {A, C} = {BA, BC}.

The rules that match these possibilities are S → BC and A → BA.

So D[1,2] = {S, A}.
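The matching step for a single split can be sketched in Python with itertools.product, which enumerates the Cartesian product of the two cells. BINARY is a hypothetical inverse index of the example grammar's two-variable rules (keyed by the right-hand-side pair), and combine is an illustrative name.

```python
from itertools import product

# Hypothetical inverse index of the example grammar's rules A -> BC, keyed by (B, C).
BINARY = {("A", "B"): {"S", "C"}, ("B", "C"): {"S"}, ("B", "A"): {"A"}, ("C", "C"): {"B"}}


def combine(left, right):
    """Return {A | A -> BC is a rule, B in left, C in right} for one split point."""
    heads = set()
    for pair in product(left, right):  # every candidate right-hand side BC
        heads |= BINARY.get(pair, set())
    return heads


print(sorted(combine({"B"}, {"A", "C"})))  # ['A', 'S'], i.e. D[1,2] = {S, A}
```

The same call with {A, C} and {A, C} yields {B}, and with {A, C} and {B} yields {S, C}, matching D[2,2] and D[3,2].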

D[s, l] So Far

l = 1: D[1,1] = {B}, D[2,1] = {A, C}, D[3,1] = {A, C}, D[4,1] = {B}, D[5,1] = {A, C}
l = 2: D[1,2] = {S, A}

Computing D[2,2]

Again k = 1 is the only split (1 ≤ k < 2), so we look for rules A → BC where B is in D[2,1] and C is in D[3,1].

Since D[2,1] = {A, C} and D[3,1] = {A, C}, the right-hand side possibilities are AA, AC, CA, CC.

There is only one rule that qualifies: B → CC.

So D[2,2] = {B}.

D[s, l] So Far

l = 1: D[1,1] = {B}, D[2,1] = {A, C}, D[3,1] = {A, C}, D[4,1] = {B}, D[5,1] = {A, C}
l = 2: D[1,2] = {S, A}, D[2,2] = {B}

Computing D[3,2]

We look for k such that 1 ≤ k < 2 (so k = 1) and rules of the form A → BC, where B is in D[3,1] and C is in D[4,1].

D[3,1] = {A, C} and D[4,1] = {B}, so the right-hand side (RHS) possibilities are AB and CB. The rules whose RHSs match these possibilities are S → AB and C → AB. So D[3,2] = {S, C}.

D[s, l] So Far

l = 1: D[1,1] = {B}, D[2,1] = {A, C}, D[3,1] = {A, C}, D[4,1] = {B}, D[5,1] = {A, C}
l = 2: D[1,2] = {S, A}, D[2,2] = {B}, D[3,2] = {S, C}

Computing D[4,2]

We look for k such that 1 ≤ k < 2 (so k = 1) and rules of the form A → BC, where B is in D[4,1] and C is in D[5,1].

D[4,1] = {B} and D[5,1] = {A, C}, so the RHS possibilities are BA and BC. The rules whose RHSs match these possibilities are S → BC and A → BA. So D[4,2] = {S, A}.

D[s, l] So Far

l = 1: D[1,1] = {B}, D[2,1] = {A, C}, D[3,1] = {A, C}, D[4,1] = {B}, D[5,1] = {A, C}
l = 2: D[1,2] = {S, A}, D[2,2] = {B}, D[3,2] = {S, C}, D[4,2] = {S, A}

Computing D[1,3]

We look for k such that 1 ≤ k < 3 and rules of the form A → BC, where, for k = 1, B is in D[1,1] and C is in D[2,2], or, for k = 2, B is in D[1,2] and C is in D[3,1].

For k = 1, D[1,1] = {B} and D[2,2] = {B}, so there is only one right-hand side possibility: BB. The grammar does not have any production whose right-hand side is BB.

For k = 2, D[1,2] = {S, A} and D[3,1] = {A, C}, so the RHS possibilities are SA, SC, AA, AC. The grammar does not have any production whose RHS is SA, SC, AA, or AC.

So D[1,3] = { }.

D[s, l] So Far

l = 1: D[1,1] = {B}, D[2,1] = {A, C}, D[3,1] = {A, C}, D[4,1] = {B}, D[5,1] = {A, C}
l = 2: D[1,2] = {S, A}, D[2,2] = {B}, D[3,2] = {S, C}, D[4,2] = {S, A}
l = 3: D[1,3] = { }

Computing D[2,3]

We look for k such that 1 ≤ k < 3 and rules of the form A → BC, where, for k = 1, B is in D[2,1] and C is in D[3,2], or, for k = 2, B is in D[2,2] and C is in D[4,1].

For k = 1, D[2,1] = {A, C} and D[3,2] = {S, C}. The RHS possibilities are AS, AC, CS, CC. The only rule that matches is B → CC.

For k = 2, D[2,2] = {B} and D[4,1] = {B}. The only possibility is BB, and no rule matches.

So D[2,3] = {B}.
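With more than one split, D[s, l] is the union of the per-split results. The Python sketch below assumes the row 1 and row 2 cells computed on the earlier slides, a hypothetical inverse index BINARY of the rules A → BC, and a hypothetical helper fill_cell; it recomputes D[1,3] and D[2,3].

```python
from itertools import product

# Hypothetical inverse index of the example grammar's rules A -> BC, keyed by (B, C).
BINARY = {("A", "B"): {"S", "C"}, ("B", "C"): {"S"}, ("B", "A"): {"A"}, ("C", "C"): {"B"}}

# Cells of D already computed on the previous slides (rows l = 1 and l = 2).
D = {
    (1, 1): {"B"}, (2, 1): {"A", "C"}, (3, 1): {"A", "C"}, (4, 1): {"B"}, (5, 1): {"A", "C"},
    (1, 2): {"S", "A"}, (2, 2): {"B"}, (3, 2): {"S", "C"}, (4, 2): {"S", "A"},
}


def fill_cell(D, s, l):
    """D[s, l] as the union, over all splits 1 <= k < l, of the heads of rules A -> BC
    with B in D[s, k] and C in D[s + k, l - k]."""
    result = set()
    for k in range(1, l):
        for B, C in product(D[(s, k)], D[(s + k, l - k)]):
            result |= BINARY.get((B, C), set())
    return result


print(fill_cell(D, 1, 3))  # set()
print(fill_cell(D, 2, 3))  # {'B'}
```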

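D[s, l] So Far

l = 1: D[1,1] = {B}, D[2,1] = {A, C}, D[3,1] = {A, C}, D[4,1] = {B}, D[5,1] = {A, C}
l = 2: D[1,2] = {S, A}, D[2,2] = {B}, D[3,2] = {S, C}, D[4,2] = {S, A}
l = 3: D[1,3] = { }, D[2,3] = {B}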

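Rest of D[s, l]

l = 1: D[1,1] = {B}, D[2,1] = {A, C}, D[3,1] = {A, C}, D[4,1] = {B}, D[5,1] = {A, C}
l = 2: D[1,2] = {S, A}, D[2,2] = {B}, D[3,2] = {S, C}, D[4,2] = {S, A}
l = 3: D[1,3] = { }, D[2,3] = {B}, D[3,3] = {B}
l = 4: D[1,4] = { }, D[2,4] = {S, A, C}
l = 5: D[1,5] = {S, A, C}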

Is x = baaba Accepted?

Yes, because D[1,5] contains S. This means that $S \Rightarrow^* x_{15}$. In other words, the substring of x that starts at position 1 and has length 5, i.e., x itself, is derivable from S.

CYK Algorithm: Pseudocode

// Inputs: a string x such that |x| ≥ 1 and a CNF grammar G with start symbol S and no ε-productions
CYK(String x, CNFGrammar G) {
    n = |x|;
    create an n x n table D;
    for s from 1 upto n {
        D[s, 1] = {A | A → a is in G and a = x[s], i.e., a is the s-th symbol of x};
    }
    for l from 2 upto n {                 // l iterates over all possible substring lengths
        for s from 1 upto n – l + 1 {     // s iterates over all possible substring starts
            D[s, l] = { };
            for k from 1 upto l – 1 {     // k iterates over all possible split positions
                D[s, l] = D[s, l] U {A | A → BC is in G, B is in D[s, k], and C is in D[s+k, l–k]};
            }
        }
    }
    if (S is in D[1, n]) return true; else return false;
}
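A runnable Python sketch of the pseudocode above, under stated assumptions: the grammar is given as two hypothetical lookup tables, UNIT (terminal → variables) and BINARY ((B, C) → variables), and D is a dict keyed by 1-based (s, l). This is one way to implement the slides' loops, not the author's code.

```python
from itertools import product


def cyk_table(x, unit_rules, binary_rules):
    """Fill the CYK table D for a non-empty string x.

    unit_rules:   terminal a -> set of variables A such that A -> a
    binary_rules: pair (B, C) -> set of variables A such that A -> BC
    Returns D as a dict from 1-based (s, l) to the set of variables deriving x_{sl}.
    """
    n = len(x)
    D = {}
    # Row l = 1: variables with a production A -> x[s].
    for s in range(1, n + 1):
        D[(s, 1)] = set(unit_rules.get(x[s - 1], set()))
    # Rows l = 2..n: every split k combines two shorter, already-filled cells.
    for l in range(2, n + 1):
        for s in range(1, n - l + 2):
            cell = set()
            for k in range(1, l):
                for B, C in product(D[(s, k)], D[(s + k, l - k)]):
                    cell |= binary_rules.get((B, C), set())
            D[(s, l)] = cell
    return D


def cyk(x, unit_rules, binary_rules, start="S"):
    """Return True iff the start symbol derives x; x must be non-empty."""
    if not x:
        raise ValueError("this sketch, like the pseudocode, requires |x| >= 1")
    return start in cyk_table(x, unit_rules, binary_rules)[(1, len(x))]


# Example grammar: S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
UNIT = {"a": {"A", "C"}, "b": {"B"}}
BINARY = {("A", "B"): {"S", "C"}, ("B", "C"): {"S"}, ("B", "A"): {"A"}, ("C", "C"): {"B"}}

D = cyk_table("baaba", UNIT, BINARY)
for l in range(1, 6):
    print("l =", l, [sorted(D[(s, l)]) for s in range(1, 7 - l)])
print(cyk("baaba", UNIT, BINARY))  # True
```

The printed rows reproduce the completed table from the "Rest of D[s, l]" slide, with each set sorted alphabetically. Indexing the rules by right-hand side means each candidate pair BC costs one dict lookup rather than a scan over all productions.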

How & Why CYK Works

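For a fixed grammar G, CYK runs in $O(n^3)$ time, where |x| = n > 0: the three nested loops over l, s, and k each run at most n times.

Both k and l – k are strictly less than l, so D[s, k] and D[s+k, l–k] are already filled in when D[s, l] is computed. If we know that each of the two smaller derivations exists (i.e., $B \Rightarrow^* x_{sk}$ and $C \Rightarrow^* x_{(s+k)(l-k)}$) and G has the rule A → BC, we can conclude that $A \Rightarrow^* x_{sl}$.

When we reach l = n, we can determine whether $S \Rightarrow^* x_{1n}$.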
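The cubic bound can be made concrete by counting how often the innermost k-loop body runs. Summing (n – l + 1)(l – 1) over l = 2, ..., n gives exactly $(n^3 - n)/6$; the hypothetical helper below checks this count for a few values of n. Each iteration's work depends only on the grammar, which is why the grammar size is treated as a constant in the $O(n^3)$ bound.

```python
def inner_iterations(n):
    """Count executions of the innermost (k-loop) body of CYK for |x| = n."""
    count = 0
    for l in range(2, n + 1):          # substring lengths
        for s in range(1, n - l + 2):  # start positions for length l
            count += l - 1             # split points k = 1, ..., l - 1
    return count


for n in (5, 10, 20):
    print(n, inner_iterations(n), (n**3 - n) // 6)  # the two counts agree
```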
