CS 461 – Oct. 10


Upload: giacomo-chapman

Post on 30-Dec-2015



TRANSCRIPT

Page 1: CS 461 – Oct. 10

CS 461 – Oct. 10

• Review PL grammar as needed

• How to tell if a word is in a CFL?
– Convert to PDA and run it.
– CYK algorithm
– Modern parsing techniques

Page 2: CS 461 – Oct. 10

Accepting input

• How can we tell if a given source file (input stream of tokens) is a valid program? Language defined by CFG, so …
– Can see if there is some derivation from grammar?
– Can convert CFG to PDA?

• Exponential performance is not acceptable (e.g. running time doubling every time we add a token).

• Two improvements:
– CYK algorithm, runs in O(n³)
– Bottom-up parsing, generally linear, but with restrictions on the grammar.
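A rough count of where a cubic bound can come from, assuming the algorithm examines every substring of the input and every way of splitting that substring into two pieces, and treating the grammar size as a constant (these assumptions are a simplification, not a full analysis):

```latex
\sum_{\ell=2}^{n}
  \underbrace{(n-\ell+1)}_{\text{substrings of length } \ell}
  \,\underbrace{(\ell-1)}_{\text{split points}}
  \;=\; \frac{(n-1)\,n\,(n+1)}{6}
  \;=\; O(n^3)
```

For n = 3 this gives 4 split checks, and for n = 4 it gives 10.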

Page 3: CS 461 – Oct. 10

CYK algorithm

• In 1965-67, discovered independently by Cocke, Younger, Kasami.

• Given any CFG and any string, can tell if grammar generates string.

• The grammar needs to be in CNF first.
– This ensures that the rules are simple. Rules are of the form X → a or X → YZ.

• Consider all substrings of length 1 first and see which variables generate them. Next try all substrings of length 2, length 3, … up to length n.
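The procedure described on this slide can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: the function name `cyk_accepts` and the two-dictionary encoding of the grammar are assumptions made for the example, and the grammar used is the CNF grammar for 3⁺7⁺ from the worked example later in these slides.

```python
from itertools import product

def cyk_accepts(word, terminal_rules, pair_rules, start="S"):
    """Return True if the CNF grammar generates `word` (sketch only)."""
    n = len(word)
    if n == 0:
        return False  # assumption: this sketch ignores a possible S -> epsilon rule
    # table[(i, j)] = set of variables that generate positions i..j (1-based, inclusive)
    table = {}
    for i, symbol in enumerate(word, start=1):
        # Length 1: variables with a rule X -> symbol
        table[(i, i)] = set(terminal_rules.get(symbol, ()))
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):  # split into i..k and k+1..j
                for left, right in product(table[(i, k)], table[(k + 1, j)]):
                    # Rules X -> YZ whose body matches (left, right)
                    cell |= pair_rules.get((left, right), set())
            table[(i, j)] = cell
    return start in table[(1, n)]

# CNF grammar for 3+7+: S -> AB, A -> 3 | AC, B -> 7 | BD, C -> 3, D -> 7
TERMINAL_RULES = {"3": {"A", "C"}, "7": {"B", "D"}}
PAIR_RULES = {("A", "B"): {"S"}, ("A", "C"): {"A"}, ("B", "D"): {"B"}}

print(cyk_accepts("337", TERMINAL_RULES, PAIR_RULES))
print(cyk_accepts("373", TERMINAL_RULES, PAIR_RULES))
```

Indexing rules by their right-hand side keeps the inner loop to a dictionary lookup per pair of variables; other encodings of the grammar would work just as well.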

Page 4: CS 461 – Oct. 10

continued

• Maintain results in an N×N table. Top right portion not used.
– The example table below is for testing a word of length 3.

• Start at bottom; work your way up.

• For length 1, just look for “unit rules” in grammar, e.g. X → a.

1..3 | X    | X
1..2 | 2..3 | X
1..1 | 2..2 | 3..3

Page 5: CS 461 – Oct. 10

continued

• For general case i..j
– Think of all possible ways this string can be broken into 2 pieces.
– Ex. 1..3 = 1..2 + 3..3 or 1..1 + 2..3
– We want to know if both pieces can be generated (the first by some variable B, the second by some variable C). This handles rules of form A → BC.

• Let’s try an example from 3⁺7⁺ (grammar in CNF).

1..3 | X    | X
1..2 | 2..3 | X
1..1 | 2..2 | 3..3
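The ways of breaking a range i..j into two pieces can be enumerated mechanically. The helper below is a hypothetical illustration (the name `splits` is not from the slides); positions are 1-based and inclusive, matching the table labels.

```python
def splits(i, j):
    """All ways to break positions i..j into two non-empty pieces i..k and k+1..j."""
    return [((i, k), (k + 1, j)) for k in range(i, j)]

for left, right in splits(1, 3):
    print(f"1..3 = {left[0]}..{left[1]} + {right[0]}..{right[1]}")
```

A range of length m has m − 1 such splits, which is why the length-3 cell has 2 cases to check and a length-4 cell has 3.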

Page 6: CS 461 – Oct. 10

337 ∈ 3⁺7⁺ ?

S → AB
A → 3 | AC
B → 7 | BD
C → 3
D → 7

For each len 1 string, which variables generate it?

1..1 is 3. Rules A and C.

2..2 is 3. Rules A and C.

3..3 is 7. Rules B and D.

1..3       | X          | X
1..2       | 2..3       | X
1..1: A, C | 2..2: A, C | 3..3: B, D
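The length-1 step for this example can be checked with a few lines of Python. The dictionary encoding of the grammar and the helper name `generators` are assumptions made for this sketch.

```python
# Grammar for 3+7+ in CNF, as on this slide; each rule body is written as a string.
RULES = {
    "S": ["AB"],
    "A": ["3", "AC"],
    "B": ["7", "BD"],
    "C": ["3"],
    "D": ["7"],
}

def generators(symbol, rules):
    """Variables that have a rule producing exactly this one terminal symbol."""
    return sorted(head for head, bodies in rules.items() if symbol in bodies)

word = "337"
bottom_row = {f"{i}..{i}": generators(ch, RULES) for i, ch in enumerate(word, start=1)}
print(bottom_row)
```

The resulting entries should agree with the bottom row of the table: A, C for each 3 and B, D for the 7.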

Page 7: CS 461 – Oct. 10

337 ∈ 3⁺7⁺ ?

S → AB
A → 3 | AC
B → 7 | BD
C → 3
D → 7

Length 2:

1..2 = 1..1 + 2..2 = (A or C)(A or C) = rule A (via A → AC)

2..3 = 2..2 + 3..3 = (A or C)(B or D) = rule S (via S → AB)

1..3       | X          | X
1..2: A    | 2..3: S    | X
1..1: A, C | 2..2: A, C | 3..3: B, D

Page 8: CS 461 – Oct. 10

337 ∈ 3⁺7⁺ ?

S → AB
A → 3 | AC
B → 7 | BD
C → 3
D → 7

Length 3: 2 cases for 1..3:

1..2 + 3..3: (A)(B or D) = S

1..1 + 2..3: (A or C)(S) no!

We only need one case to work.

1..3: S    | X          | X
1..2: A    | 2..3: S    | X
1..1: A, C | 2..2: A, C | 3..3: B, D

Page 9: CS 461 – Oct. 10

CYK example #2

Let’s test the word baab

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

Length 1:
‘a’ generated by A, C
‘b’ generated by B

1..4    | X          | X          | X
1..3    | 2..4       | X          | X
1..2    | 2..3       | 3..4       | X
1..1: B | 2..2: A, C | 3..3: A, C | 4..4: B

Page 10: CS 461 – Oct. 10

baab

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

Length 2:

1..2 = 1..1 + 2..2 = (B)(A, C) = S,A

2..3 = 2..2 + 3..3 = (A,C)(A,C) = B

3..4 = 3..3 + 4..4 = (A,C)(B) = S,C

1..4       | X          | X          | X
1..3       | 2..4       | X          | X
1..2: S, A | 2..3: B    | 3..4: S, C | X
1..1: B    | 2..2: A, C | 3..3: A, C | 4..4: B

Page 11: CS 461 – Oct. 10

baab

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

Length 3: [ each has 2 chances! ]

1..3 = 1..2 + 3..3 = (S,A)(A,C) = Ø

1..3 = 1..1 + 2..3 = (B)(B) = Ø

2..4 = 2..3 + 4..4 = (B)(B) = Ø

2..4 = 2..2 + 3..4 = (A,C)(S,C) = B

1..4       | X          | X          | X
1..3: Ø    | 2..4: B    | X          | X
1..2: S, A | 2..3: B    | 3..4: S, C | X
1..1: B    | 2..2: A, C | 3..3: A, C | 4..4: B

Page 12: CS 461 – Oct. 10

Finally…

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

Length 4: [ has 3 chances! ]

1..4 = 1..3 + 4..4 = (Ø)(B) = Ø

1..4 = 1..2 + 3..4 = (S,A)(S,C) = Ø

1..4 = 1..1 + 2..4 = (B)(B) = Ø

Ø means we lose! baab ∉ L.

However, in general don’t give up if you encounter Ø in the middle of the process.

1..4: Ø    | X          | X          | X
1..3: Ø    | 2..4: B    | X          | X
1..2: S, A | 2..3: B    | 3..4: S, C | X
1..1: B    | 2..2: A, C | 3..3: A, C | 4..4: B
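To double-check the hand-filled table for baab, the whole table can be rebuilt programmatically. This is a sketch under assumptions: the function name `cyk_table` and the tuple encoding of rule bodies are illustrative choices, not something taken from the slides.

```python
# Grammar from example #2; each rule body is a tuple of symbols.
RULES = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}

def cyk_table(word, rules):
    """Map (i, j) -> set of variables generating positions i..j (1-based, inclusive)."""
    n = len(word)
    table = {}
    for i, ch in enumerate(word, start=1):
        # Length 1: variables with a rule X -> ch
        table[(i, i)] = {head for head, bodies in rules.items() if (ch,) in bodies}
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):  # try every split i..k + k+1..j
                left, right = table[(i, k)], table[(k + 1, j)]
                for head, bodies in rules.items():
                    for body in bodies:
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            cell.add(head)
            table[(i, j)] = cell
    return table

word = "baab"
table = cyk_table(word, RULES)
n = len(word)
# Print the longest substring first, so the layout mirrors the slide's table.
for length in range(n, 0, -1):
    cells = []
    for i in range(1, n - length + 2):
        j = i + length - 1
        contents = ", ".join(sorted(table[(i, j)])) or "Ø"
        cells.append(f"{i}..{j}: {contents}")
    print(" | ".join(cells))
print("accepted:", "S" in table[(1, n)])
```

Note that the cell 2..4 is non-empty even though 1..3 is empty, which illustrates the slide's point about not giving up when Ø appears partway through.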