
CS 275 Automata and Formal Language Theory: Course Notes

Part II: The Recognition Problem (II)
Chapter II.5.: Properties of Context Free Grammars (14)

Anton Setzer
(Based on a book draft by J. V. Tucker and K. Stephenson)

Dept. of Computer Science, Swansea University

http://www.cs.swan.ac.uk/~csetzer/lectures/automataFormalLanguage/current/index.html

April 29, 2016

CS 275 Chapter II.5. 1/ 84

II.5.1. Derivation Trees for Context-Free Grammars (14.1)

II.5.2. Uniqueness of Derivation Trees (14.1)

II.5.4. The Pumping Lemma for CFG (14.4)

II.5.5. Floyd’s Theorem


II.5.1. Derivation Trees for Context-Free Grammars (14.1)

Derivation Trees or Parse Trees

- Context free grammars (abbreviated as CFG in the following) allow a production to be applied to a non-terminal at any position, without regard to its context.
- Therefore we can expand the non-terminals independently of each other.
- This allows us to define derivation trees (also called parse trees).

Example

Consider the grammar

grammar G

terminals a, b

nonterminals S

start symbol S

productions S −→ aSb
            S −→ ab

Example Derivation

We derive aaabbb in it:

S ⇒ aSb ⇒ aaSbb ⇒ aaabbb

Derivation Tree

S ⇒ aSb ⇒ aaSbb ⇒ aaabbb

[Figure: derivation tree with root S. The root and the S below it each have children a, S, b; the lowest S has children a, b.]

Form of the Derivation Tree

- Nodes are labelled with elements of N ∪ T ∪ {ε}.
- A node with label A has children labelled X1, X2, . . . , Xn (from left to right) only if A is a non-terminal and there is a production

A −→ X1 X2 · · · Xn

where Xi ∈ T ∪ N.
- All leaves of the tree together, read from left to right, form the string derived, namely aaabbb. This is called the frontier of the derivation tree.
- We will also consider derivation trees not ending in a string of terminals, so the frontier is an element of (T ∪ N)∗.

Definition Derivation Tree

Definition

Let G = (T, N, S, P) be a CFG. A derivation tree or parse tree for G is a finite tree with

- nodes labelled by elements of N ∪ T ∪ {ε},
- s.t. a node labelled A has children with labels X1, . . . , Xn (from left to right) only if A ∈ N and there is a production A −→ X1 X2 · · · Xn.
- If one of the children of a node is labelled ε, then it is the only child of that node.

The frontier of the tree is the sequence of labels of its leaves, read from left to right, which is an element of (T ∪ N)∗.
The root of the tree is the node at the top of the derivation tree.
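The definition can be made concrete with a small sketch. The class `Node` and the functions `frontier` and `is_derivation_tree` are illustrative names, not part of the notes; productions are assumed to be given as pairs (left-hand side, tuple of right-hand-side symbols), and an ε-production A −→ ε is assumed to be written as `("A", ("ε",))`.

```python
from dataclasses import dataclass, field

EPSILON = "ε"

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def frontier(node):
    """Labels of the leaves from left to right (ε-leaves contribute nothing)."""
    if not node.children:
        return [] if node.label == EPSILON else [node.label]
    result = []
    for child in node.children:
        result.extend(frontier(child))
    return result

def is_derivation_tree(node, nonterminals, productions):
    """Check that every inner node A with children X1..Xn uses a production A -> X1...Xn."""
    if not node.children:
        return True  # a leaf may carry a terminal, a non-terminal or ε
    rhs = tuple(child.label for child in node.children)
    if node.label not in nonterminals or (node.label, rhs) not in productions:
        return False
    return all(is_derivation_tree(c, nonterminals, productions) for c in node.children)

# Grammar from the example: S -> aSb | ab
NONTERMINALS = {"S"}
PRODUCTIONS = {("S", ("a", "S", "b")), ("S", ("a", "b"))}

tree = Node("S", [Node("a"),
                  Node("S", [Node("a"),
                             Node("S", [Node("a"), Node("b")]),
                             Node("b")]),
                  Node("b")])

print("".join(frontier(tree)))                              # aaabbb
print(is_derivation_tree(tree, NONTERMINALS, PRODUCTIONS))  # True
```

The check mirrors the "only if" condition of the definition: it inspects one node at a time and never needs to look at the context of that node.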


Left-Most and Right-Most Derivations

From a derivation tree we can obtain a derivation in various orders. Consider the grammar

grammar G

terminals a, b

nonterminals S, A, B

start symbol S

productions S −→ AB
            A −→ aAa, A −→ a
            B −→ bBb, B −→ b

Example Derivation Tree

[Figure: derivation tree for aaabbb. The root S has children A and B. A has children a, A, a, where the inner A has the single child a. B has children b, B, b, where the inner B has the single child b.]

Different Derivations of aaabbb

We can derive aaabbb in different ways:

S ⇒ AB ⇒ aAaB ⇒ aaaB ⇒ aaabBb ⇒ aaabbb (a left-most derivation)

S ⇒ AB ⇒ AbBb ⇒ Abbb ⇒ aAabbb ⇒ aaabbb (a right-most derivation)

S ⇒ AB ⇒ aAaB ⇒ aAabBb ⇒ aaabBb ⇒ aaabbb

S ⇒ AB ⇒ aAaB ⇒ aAabBb ⇒ aAabbb ⇒ aaabbb

S ⇒ AB ⇒ AbBb ⇒ aAabBb ⇒ aaabBb ⇒ aaabbb

S ⇒ AB ⇒ AbBb ⇒ aAabBb ⇒ aAabbb ⇒ aaabbb

Left-Most Derivation

Definition

Let G = (T, N, S, P) be a CFG. A single-step derivation w ⇒ w′ is left-most if a rule was applied to the left-most non-terminal in w, i.e.

- w = sAt for some A ∈ N, s ∈ T∗ (consisting only of terminals), t ∈ (N ∪ T)∗,
- there exists a production A −→ v
- s.t. w′ = svt.

Right-Most Derivation

Definition

Let G = (T, N, S, P) be a CFG. A single-step derivation w ⇒ w′ is right-most if a rule was applied to the right-most non-terminal in w, i.e.

- w = sAt for some A ∈ N, s ∈ (N ∪ T)∗, t ∈ T∗ (consisting only of terminals),
- there exists a production A −→ v
- s.t. w′ = svt.

Left/Right-Most Derivation Sequence

Definition

Let G = (T, N, S, P) be a CFG.

1. A derivation sequence w0 ⇒ w1 ⇒ w2 ⇒ · · · ⇒ wn is left-most, if each derivation step wi ⇒ wi+1 is left-most.

2. Right-most derivation sequences are defined analogously.
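As a sketch of how a left-most or right-most derivation sequence can be read off a derivation tree, the following assumes a tree is either a string (a leaf) or a pair (label, list of children). The names `label_of`, `derivation` and `TREE` are illustrative only.

```python
def label_of(tree):
    """Label of a tree: a leaf is a plain string, an inner node is (label, children)."""
    return tree if isinstance(tree, str) else tree[0]

def derivation(tree, leftmost=True):
    """Sentential forms of the left-most (or right-most) derivation given by `tree`."""
    form = [tree]                      # current sentential form as a list of subtrees
    steps = [label_of(tree)]
    while True:
        # positions of inner nodes that still have to be expanded
        pending = [i for i, t in enumerate(form) if not isinstance(t, str)]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        children = form[i][1]
        form[i:i + 1] = children       # apply the production used at this node
        steps.append("".join(label_of(t) for t in form))

# Derivation tree for aaabbb in the grammar S -> AB, A -> aAa | a, B -> bBb | b
TREE = ("S", [("A", ["a", ("A", ["a"]), "a"]),
              ("B", ["b", ("B", ["b"]), "b"])])

print(" ⇒ ".join(derivation(TREE, leftmost=True)))
print(" ⇒ ".join(derivation(TREE, leftmost=False)))
```

Both calls use the same tree; only the choice of which pending non-terminal to expand next differs.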


Theorem II.5.1.1. (Derivation Trees and Language Generation)

Theorem

Let G = (T, N, S, P) be a CFG, A ∈ N, w′ ∈ (T ∪ N)∗. Then the following are equivalent:

(1) There exists a derivation tree with root labelled by A and frontier w′.

(2) A ⇒∗ w′.

In case w′ ∈ T∗, the derivation sequence A ⇒∗ w′ can be chosen both as a left-most and as a right-most derivation sequence.

Proof of Theorem II.5.1.1.

- A proof of this theorem can be found in the additional material.
- We illustrate this theorem by an example.
  - We will first present a left-most derivation.
  - Then we will present a right-most derivation.

Example Left-Most Derivation (Steps 1 to 6)

[Figures: the derivation tree at each step.]

Step 1: S

Step 2: S ⇒ AB

Step 3: S ⇒ AB ⇒ aAaB (the derivation tree for the terminal a is trivial)

Step 4: S ⇒ AB ⇒ aAaB ⇒ aaaB

Step 5: S ⇒ AB ⇒ aAaB ⇒ aaaB ⇒ aaabBb

Step 6: S ⇒ AB ⇒ aAaB ⇒ aaaB ⇒ aaabBb ⇒ aaabbb (final derivation)

Example Right-Most Derivation (Steps 1 to 6)

[Figures: the derivation tree at each step.]

Step 1: S

Step 2: S ⇒ AB

Step 3: S ⇒ AB ⇒ AbBb (the derivation tree for the terminal b is trivial)

Step 4: S ⇒ AB ⇒ AbBb ⇒ Abbb

Step 5: S ⇒ AB ⇒ AbBb ⇒ Abbb ⇒ aAabbb

Step 6: S ⇒ AB ⇒ AbBb ⇒ Abbb ⇒ aAabbb ⇒ aaabbb (final derivation)

II.5.2. Uniqueness of Derivation Trees (14.1)


Theorem II.5.2.4 Uniqueness of Derivation (Trees)

Theorem (II.5.2.4)

Let G = (T, N, S, P) be a CFG. The following are equivalent:

(1) For every w ∈ T∗ there exists at most one derivation tree with root labelled S and frontier w.

(2) For every w ∈ T∗ there exists at most one left-most derivation sequence S ⇒∗ w.

(3) For every w ∈ T∗ there exists at most one right-most derivation sequence S ⇒∗ w.

Proof: See Additional Material.

Ambiguous Grammars

Definition

A CFG G = (T, N, S, P) is ambiguous, if there is a string w ∈ L(G) having more than one derivation tree (or, equivalently, having more than one left-most or more than one right-most derivation).

Example 1

grammar G

terminals a, b

nonterminals S

start symbol S

productions S −→ aS
            S −→ b
            S −→ ab

There are two left-most derivations of ab:

S ⇒ aS ⇒ ab and S ⇒ ab

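And two derivation trees:

[Figure: first tree: the root S has children a and S, and the inner S has the single child b. Second tree: the root S has children a and b.]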
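The two left-most derivations can also be found mechanically. The following is a minimal sketch, assuming single-character symbols with upper case letters as non-terminals, and a grammar without ε-productions and without productions of the form A −→ B (otherwise the search might not terminate). The name `leftmost_derivations` is illustrative.

```python
def leftmost_derivations(productions, start, target):
    """All left-most derivations of `target`, each as a list of sentential forms."""
    found = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        form = path[-1]
        # index of the left-most non-terminal, if any
        pos = next((i for i, c in enumerate(form) if c.isupper()), None)
        if pos is None:
            if form == target:
                found.append(path)
            continue
        # prune: the terminal prefix is final, and sentential forms cannot shrink
        if len(form) > len(target) or not target.startswith(form[:pos]):
            continue
        for lhs, rhs in productions:
            if lhs == form[pos]:
                stack.append(path + [form[:pos] + rhs + form[pos + 1:]])
    return found

EXAMPLE_1 = [("S", "aS"), ("S", "b"), ("S", "ab")]

for d in leftmost_derivations(EXAMPLE_1, "S", "ab"):
    print(" ⇒ ".join(d))
```

Finding more than one left-most derivation for a single string is enough to show that a grammar is ambiguous; finding exactly one for some test strings does not prove that it is unambiguous.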


Example 2: Dangling Else

Consider the following grammar, which is a cut-down version of the grammar G while introduced in I.2.4, with if then else fi replaced by if then and if then else:

grammar G Dangling else

import G Identifier, G Arithmetic Expression, G Boolean Expression

terminals if, then, else, :=

nonterminals Program

start symbol Program

productions Program −→ Id := AExp
            Program −→ if BExp then Program else Program
            Program −→ if BExp then Program

“import” means that we add all the ingredients of the grammars mentioned, including the terminals. The grammars G Identifier, G Arithmetic Expression, G Boolean Expression have start symbols Id, AExp, BExp, respectively.

Assume strings b1, b2 deriving from BExp and strings s1, s2 deriving from Program. The string

if b1 then if b2 then s1 else s2

has two derivation trees (we omit the derivation trees for bi, si).

First Derivation Tree

[Figure: derivation tree Program −→ if BExp then Program, with BExp deriving b1 and the inner Program expanded by Program −→ if BExp then Program else Program, deriving if b2 then s1 else s2.]

Second Derivation Tree

[Figure: derivation tree Program −→ if BExp then Program else Program, with BExp deriving b1, the first inner Program expanded by Program −→ if BExp then Program, deriving if b2 then s1, and the second inner Program deriving s2.]

Different Interpretations of the Program

The two different derivation trees of the program

if b1 then if b2 then s1 else s2

correspond to two different ways of executing the program:


Execution following the Derivation Tree 1

- In the first derivation tree, the else case belongs to the second if. It is executed if b1 is true and b2 is false. Using suggestive indentation, the program can be written as follows:

if b1 then
    if b2 then
        s1
    else
        s2

Execution following the Derivation Tree 2

- In the second derivation tree, the else case belongs to the first if. It is executed if b1 is false. Using suggestive indentation, the program can be written as follows:

if b1 then
    if b2 then
        s1
else
    s2

2 Solutions for Solving the Problem

There are 2 solutions for solving this problem. The first solution is to add to if then and if then else a symbol fi (or some other keyword such as endif) marking the end of the statement.

grammar G Unambiguous if

import G Identifier, G Arithmetic Expression, G Boolean Expression

terminals if, then, else, fi, :=

nonterminals Program

start symbol Program

productions Program −→ Id := AExp
            Program −→ if BExp then Program else Program fi
            Program −→ if BExp then Program fi

Solution 1

Now the two interpretations of the original string are written as two different strings:

- “Else” belonging to the second “if” is written as

if b1 then if b2 then s1 else s2 fi fi

- “Else” belonging to the first “if” is written as

if b1 then if b2 then s1 fi else s2 fi

- This solution has been taken for instance in Algol 68, in the bash shell (Linux), and in Ada (where fi is replaced by “end if”).

- A similar solution was taken in some other languages (for instance Go, Rust and Perl): they require that the subprograms of an if then or if then else are enclosed in braces {· · ·}.

Solution 2

- The 2nd solution is to modify the grammar so that a derivation tree is possible for only one of the two choices.
- For this we modify the grammar so that the statement s1 in

if b1 then s1 else s2

is not matched by

if b′1 then s′1

but only by

if b′1 then s′1 else s′2

- This solution has been taken in most other programming languages.

- For this we split Programs into two categories:
  - Those derived from MatchedIf. In a program deriving from MatchedIf, each if is matched by an else clause.
  - Those derived from UnmatchedIf. These have at least one if with no matching else clause.

- The grammar will make sure that an else is always associated with the nearest if to its left which has no matching else yet.
- So the expression if b1 then if b2 then s1 else s2 will be parsed as in the first derivation tree.

Here is the grammar:

grammar G Dangling Else

import G Identifier, G Arithmetic Expression, G Boolean Expression

terminals if, then, else, :=

nonterminals Program, MatchedIf, UnmatchedIf

start symbol Program

productions Program −→ UnmatchedIf
            Program −→ MatchedIf
            MatchedIf −→ Id := AExp
            MatchedIf −→ if BExp then MatchedIf else MatchedIf
            UnmatchedIf −→ if BExp then Program
            UnmatchedIf −→ if BExp then MatchedIf else UnmatchedIf
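To compare the two grammars, one can count the derivation trees of the dangling-else string in each of them. The sketch below is an illustration under simplifying assumptions: the token b stands for any string derived from BExp, the token s for any assignment Id := AExp, and the counting function `count_trees` (an illustrative name) only works for grammars without ε-productions and without cycles of unit productions.

```python
from functools import lru_cache

def count_trees(grammar, start, tokens):
    """Number of derivation trees with root `start` and frontier `tokens`.

    `grammar` maps each non-terminal to a list of right-hand sides (tuples).
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        if symbol not in grammar:                      # terminal symbol
            return int(j == i + 1 and tokens[i] == symbol)
        return sum(count_seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        # the first symbol covers tokens[i:k]; every later symbol needs >= 1 token
        return sum(count(rhs[0], i, k) * count_seq(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return count(start, 0, len(tokens))

AMBIGUOUS = {
    "Program": [("s",),
                ("if", "b", "then", "Program", "else", "Program"),
                ("if", "b", "then", "Program")],
}

UNAMBIGUOUS = {
    "Program": [("UnmatchedIf",), ("MatchedIf",)],
    "MatchedIf": [("s",),
                  ("if", "b", "then", "MatchedIf", "else", "MatchedIf")],
    "UnmatchedIf": [("if", "b", "then", "Program"),
                    ("if", "b", "then", "MatchedIf", "else", "UnmatchedIf")],
}

tokens = "if b then if b then s else s".split()
print(count_trees(AMBIGUOUS, "Program", tokens))    # 2
print(count_trees(UNAMBIGUOUS, "Program", tokens))  # 1
```

A count of 1 for this one string is only a spot check; it does not prove that the second grammar is unambiguous for all strings.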


Unique Derivation Tree 2nd Solution

[Figure: derivation tree Program −→ UnmatchedIf −→ if BExp then Program, with BExp deriving b1 and the inner Program −→ MatchedIf −→ if BExp then MatchedIf else MatchedIf, where BExp derives b2 and the two MatchedIf derive s1 and s2.]

Failure of Starting Derivation Tree with MatchedIf

Trying to construct a derivation tree which matches the else to the first if fails:

[Figure: attempted derivation tree Program −→ MatchedIf −→ if BExp then MatchedIf else MatchedIf, with BExp deriving b1 and the last MatchedIf deriving s2. The middle MatchedIf would have to derive if b2 then s1, but MatchedIf −→ if BExp then MatchedIf else MatchedIf would require a 2nd else, so the construction fails.]

Example: Grammar for Arithmetic Expressions

Recall the grammar for arithmetic expressions (using elements of BNF notation):

grammar G Arithmetic Expression

import G Identifier, G Number

terminals +, −, ∗, /, (, )

nonterminals AExp, AOp

start symbol AExp

productions AExp −→ Id | Number
            AExp −→ ( AExp )
            AExp −→ AExp AOp AExp
            AOp −→ + | − | ∗ | /

First Parse tree for 2 + 3 ∗ 4

[Figure: parse tree AExp −→ AExp AOp AExp, where the left AExp is expanded by AExp −→ AExp AOp AExp and derives 2 + 3, the outer AOp derives ∗, and the right AExp derives the Number 4.]

Second Parse tree for 2 + 3 ∗ 4

[Figure: parse tree AExp −→ AExp AOp AExp, where the left AExp derives the Number 2, the outer AOp derives +, and the right AExp is expanded by AExp −→ AExp AOp AExp and derives 3 ∗ 4.]

Difference in Evaluation

- The first parse tree corresponds to parsing it as if it were (2 + 3) ∗ 4. Evaluation will return 20.

- The second parse tree corresponds to parsing it as if it were 2 + (3 ∗ 4). Evaluation will return 14.
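The difference can be checked by evaluating both trees. In this sketch a parse tree is simplified to either a number or a triple (left, operator, right); the names `OPS`, `evaluate`, `first_tree` and `second_tree` are illustrative.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(tree):
    """Evaluate a parse tree given as a number or a triple (left, op, right)."""
    if isinstance(tree, (int, float)):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))

first_tree = ((2, "+", 3), "*", 4)    # parsed as (2 + 3) * 4
second_tree = (2, "+", (3, "*", 4))   # parsed as 2 + (3 * 4)

print(evaluate(first_tree))    # 20
print(evaluate(second_tree))   # 14
```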


Unambiguous Version

grammar G Arithmetic Expression unambiguous

import G Identifier, G Number

terminals +, −, ∗, /, (, )

nonterminals AExp, Term, Factor

start symbol AExp

productions AExp −→ AExp + Term | AExp − Term | Term
            Term −→ Term ∗ Factor | Term / Factor | Factor
            Factor −→ Id | Number | ( AExp )
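A recursive-descent evaluator can follow this grammar closely. The sketch below is one possible reading, not the method of the notes: the left-recursive productions are handled by loops (AExp is read as Term followed by any number of + Term or − Term), identifiers are left out, and only unsigned integers are accepted as Number. All function names are illustrative.

```python
import re

def tokenize(text):
    """Split the input into numbers, operators and parentheses."""
    return re.findall(r"\d+|[-+*/()]", text)

def parse_aexp(tokens, pos=0):
    """AExp -> AExp + Term | AExp - Term | Term, read as Term ((+|-) Term)*."""
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        value = value + right if op == "+" else value - right
    return value, pos

def parse_term(tokens, pos):
    """Term -> Term * Factor | Term / Factor | Factor, read as Factor ((*|/) Factor)*."""
    value, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        value = value * right if op == "*" else value / right
    return value, pos

def parse_factor(tokens, pos):
    """Factor -> Number | ( AExp )   (identifiers are omitted in this sketch)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        value, pos = parse_aexp(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return value, pos + 1
    if tok.isdigit():
        return int(tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")

def calc(text):
    tokens = tokenize(text)
    value, pos = parse_aexp(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return value

print(calc("2 + 3 * 4"))     # 14
print(calc("(2 + 3) * 4"))   # 20
```

Because Term sits below AExp in the grammar, ∗ and / bind more tightly than + and −, and the loops make all four operators associate to the left.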


Unique Parse Tree for 2 + 3 ∗ 4 in G Arithmetic Expression unambiguous

[Figure: parse tree AExp −→ AExp + Term. The left AExp derives 2 via AExp −→ Term −→ Factor −→ Number. The right Term is expanded by Term −→ Term ∗ Factor, where Term −→ Factor −→ Number derives 3 and Factor −→ Number derives 4.]

Failure to Parse 2 + 3 ∗ 4 with + Binding Stronger Than ∗

Trying to construct a derivation tree which parses 2 + 3 ∗ 4 as (2 + 3) ∗ 4 fails:

[Figure: attempted derivation tree AExp −→ Term −→ Term ∗ Factor, with Factor −→ Number deriving 4. The construction fails, since Term doesn't allow a + without brackets, so the left Term cannot derive 2 + 3.]

Making Context Free Grammars Unambiguous

The following is known about context free grammars:

- There are languages defined by context free grammars which cannot be defined by an unambiguous grammar.
- Context free languages for which there exists no unambiguous grammar are called inherently ambiguous. See Hopcroft/Motwani/Ullman, 5.4.4, p. 213.
- It is undecidable whether a grammar is ambiguous. (Same book, 7.4.5, p. 307.)
- It is undecidable whether a context free language is inherently ambiguous. (Same book, 7.4.5 and 9.5.2, p. 413.)

II.5.3. Normal Forms for Context-Free Grammars (14.2)

This section has been moved to “Additional Material”.


II.5.4. The Pumping Lemma for CFG (14.4)


Idea of the Pumping Lemma for CFG

- The Pumping Lemma for Regular Languages is based on the fact that if a string accepted by a finite state automaton is sufficiently big, we pass through at least one state twice.
- We could equivalently have used the fact that in a left- or right-linear grammar, if a string derived is sufficiently big, we pass through one non-terminal at least twice.
- For CFG, we need that if a string derived is sufficiently big, we pass through one non-terminal at least twice.
- However, if the two occurrences are in two disjoint subtrees of the derivation tree, there is no relation between them.
- What we need is that there is a path in the derivation tree which passes through one non-terminal at least twice.

Picture

[Figure: derivation tree D with root S and frontier u v w x y. A path from the root passes through two occurrences of the non-terminal A; the upper A derives vwx and the lower A derives w.]

- If in a derivation tree of a CFG with l non-terminals there is no repetition of a non-terminal on any path, then the tree can have height at most l.
- Here the height of a tree consisting of the root only is defined as 0.
- Since at any node of a derivation tree there are only finitely many rules that can be applied, one can easily see that there are only finitely many derivation trees in a CFG with height at most l + 1 and an arbitrary non-terminal as root.
- Let k′ be the maximum length of any string derived from any non-terminal by a derivation tree of height at most l + 1. Let k := k′ + 1.
- Assume a derivation of a string z with |z| ≥ k.
- We can omit from the derivation any subderivation A ⇒∗ A which takes more than one step.
- The derivation tree must have height ≥ l + 1, and therefore contains a subderivation of height exactly l + 1 of a string from some non-terminal.

- In this subderivation there is a path from the root to a terminal on which at least one non-terminal A occurs twice.
- And we can arrange that the subtree starting at the upper occurrence of A has height ≤ l + 1.
- Therefore the string derived from that A has length at most k′ < k.
- Let
  - w be the string derived from the lower A,
  - vwx be the string derived from the upper A, with v and x derived to the left and right of the lower A,
  - z = uvwxy, with u, y derived to the left and right of the upper A.
- Then |vwx| ≤ k′ < k.

So for any CFG G we can find a constant k s.t.

- in any derivation tree of a word z ∈ L(G) s.t. |z| ≥ k,
- we can
  - decompose z = uvwxy,
  - find a non-terminal A,
  - and derivations
    - S ⇒∗ uAy (written blue on the next slide),
    - A ⇒∗ vAx (written green on the next slide),
    - A ⇒∗ w (written red on the next slide).
- The subderivation A ⇒∗ vAx plays the role of the loop we had in the pumping lemma for regular languages.
- Furthermore the middle part vwx can be chosen to be of length ≤ k.
- vx ≠ ε, since we omitted subderivations A ⇒∗ A taking more than one step.
- The following pictures don’t come out well on the black and white handouts. Please look at them in the online version.

Picture

[Figure: the derivation tree D with frontier u v w x y, showing the derivations S ⇒∗ uAy, A ⇒∗ vAx and A ⇒∗ w in different colours.]

- Now we can repeat the subderivation A ⇒∗ vAx several times and obtain

S ⇒∗ uAy ⇒∗ uvAxy ⇒∗ uvvAxxy ⇒∗ · · · ⇒∗ u v^i A x^i y ⇒∗ u v^i w x^i y

- And therefore we obtain that for all i ≥ 0 we have u v^i w x^i y ∈ L(G).

Pumping it up to u v^3 w x^3 y

[Figure: derivation tree in which the subderivation A ⇒∗ vAx is repeated three times, so the frontier is u v v v w x x x y.]

Pumping it down to u v^0 w x^0 y = uwy

[Figure: derivation tree in which the subderivation A ⇒∗ vAx is omitted, so the frontier is u w y.]

Pumping Lemma for CFG

Theorem

Let L be a context free language. Then there exists a constant k s.t. for all strings z of L s.t. |z| ≥ k there exist u, v, w, x, y s.t.

- z = uvwxy,
- |vwx| ≤ k, i.e. the middle portion is not too long,
- |vx| ≥ 1, i.e. v and x are not both ε,
- ∀i ≥ 0. u v^i w x^i y ∈ L.
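As an illustration of the statement, consider the context free language {a^n b^n | n ≥ 0}, which is not an example from these notes. For every z = a^n b^n with n ≥ 1 the decomposition u = a^(n−1), v = a, w = ε, x = b, y = b^(n−1) satisfies the conditions of the theorem with k = 2. The sketch below checks this for a few values of i; the names `in_anbn` and `pump` are illustrative.

```python
def in_anbn(s):
    """Membership test for {a^n b^n | n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

def pump(u, v, w, x, y, i):
    """The pumped string u v^i w x^i y."""
    return u + v * i + w + x * i + y

n = 4
z = "a" * n + "b" * n
u, v, w, x, y = "a" * (n - 1), "a", "", "b", "b" * (n - 1)

assert u + v + w + x + y == z        # z = uvwxy
assert len(v + w + x) <= 2           # |vwx| <= k with k = 2
assert len(v + x) >= 1               # |vx| >= 1

print(all(in_anbn(pump(u, v, w, x, y, i)) for i in range(6)))   # True
```

Here v and x are pumped together, which keeps the numbers of a's and b's equal; this is exactly what a single loop in a finite automaton cannot do.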


Proof of the Pumping Lemma for CFG

A formal proof can be found in the Additional Material.

Example 1

Lemma

The language L = {a^i b^i c^i | i ≥ 0} is not context free.

Proof (Example 1)

- Assume L is context free.
- Let k be the constant from the pumping lemma.
- Let z := a^k b^k c^k ∈ L.
- By the pumping lemma, z = uvwxy s.t. |vwx| ≤ k, |vx| ≥ 1 and ∀i ≥ 0. u v^i w x^i y ∈ L.
- If v contains a's and b's, or b's and c's, then u v^2 w x^2 y is not an element of a∗b∗c∗ (i.e. the language defined by this regular expression), since there is an a after a b or a b after a c.
- Therefore v is part of a^k, b^k or c^k, and similarly for x.
- But now u v^2 w x^2 y = a^(k+i) b^(k+j) c^(k+l), where at most 2 of (i, j, l) can be ≠ 0, and at least one is ≠ 0. But then a^(k+i) b^(k+j) c^(k+l) ∉ L, a contradiction.
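For small values of k the argument can be confirmed by brute force: the sketch below enumerates every decomposition z = uvwxy of z = a^k b^k c^k with |vwx| ≤ k and |vx| ≥ 1 and checks whether u v^i w x^i y stays in L for i = 0, 1, 2. This is only a finite sanity check under the stated bounds, not a replacement for the proof; `in_L` and `pumpable_decompositions` are illustrative names.

```python
def in_L(s):
    """Membership test for {a^i b^i c^i | i >= 0}."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

def pumpable_decompositions(z, k, max_i=2):
    """All (u, v, w, x, y) with z = uvwxy, |vwx| <= k, |vx| >= 1 that survive pumping."""
    result = []
    n = len(z)
    for start in range(n + 1):                        # vwx = z[start:end]
        for end in range(start + 1, min(start + k, n) + 1):
            for p in range(start, end + 1):           # v = z[start:p]
                for q in range(p, end + 1):           # w = z[p:q], x = z[q:end]
                    u, v, w = z[:start], z[start:p], z[p:q]
                    x, y = z[q:end], z[end:]
                    if not v and not x:
                        continue
                    if all(in_L(u + v * i + w + x * i + y)
                           for i in range(max_i + 1)):
                        result.append((u, v, w, x, y))
    return result

for k in range(1, 5):
    z = "a" * k + "b" * k + "c" * k
    print(k, pumpable_decompositions(z, k))   # the list is empty for each k
```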


Example 2

Lemma

The language L = {a^n b^m a^n b^m | n, m ≥ 0} is not context free.

Proof (Example 2)

- Assume L is context free.
- Let k be the constant from the pumping lemma.
- Let z := a^k b^k a^k b^k ∈ L.
- By the pumping lemma, z = uvwxy s.t. |vwx| ≤ k, |vx| ≥ 1 and ∀i ≥ 0. u v^i w x^i y ∈ L.
- If v contains both a's and b's, then u v^2 w x^2 y is not an element of a∗b∗a∗b∗, since there are more than 3 switches between a's and b's.
- Therefore v is part of one of the subwords a^k, b^k, and similarly for x.
- But now u v^2 w x^2 y = a^(k+i) b^(k+j) a^(k+l) b^(k+m) where
  - at most 2 of (i, j, l, m) can be ≠ 0,
  - if there are two, they are consecutive (since |vwx| ≤ k),
  - at least one is ≠ 0.

But then a^(k+i) b^(k+j) a^(k+l) b^(k+m) ∉ L, a contradiction.

Example 3

Lemma

The language L = {ww | w ∈ {a, b}∗} is not context free.


Proof (Example 3)

- We use the fact that the intersection of a context free language and a regular language is context free.
  - This fact is not shown in this module.
  - It can be shown using the equivalence of context free languages and languages definable by Push Down Automata.
- If L were context free, then so would be L′ := L ∩ (a∗b∗a∗b∗).
- But L′ is just the language of Example 2, which is not context free, a contradiction.

Intersection of CFG

- In Example 3 we used the fact that the intersection of a context free and a regular language is context free.
- The intersection of two context free languages is in general not context free. Consider

L1 := {a^n b^n c^m | n, m ∈ N}
L2 := {a^n b^m c^m | n, m ∈ N}

Both L1, L2 are context free. However

L1 ∩ L2 = {a^n b^n c^n | n ∈ N}

which is the language of Example 1, which is not context free.
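The equation for L1 ∩ L2 can be spot-checked on all short strings. The sketch below compares membership in L1 and L2 with membership in {a^n b^n c^n} for every string over {a, b, c} up to length 6; the function names and the length bound are arbitrary choices for this illustration.

```python
import re
from itertools import product

def blocks(s):
    """Lengths of the a-, b- and c-block if s is in a*b*c*, else None."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return tuple(len(g) for g in m.groups()) if m else None

def in_L1(s):                      # a^n b^n c^m
    b = blocks(s)
    return b is not None and b[0] == b[1]

def in_L2(s):                      # a^n b^m c^m
    b = blocks(s)
    return b is not None and b[1] == b[2]

def in_anbncn(s):                  # a^n b^n c^n
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

words = ["".join(p) for n in range(7) for p in product("abc", repeat=n)]
print(all((in_L1(w) and in_L2(w)) == in_anbncn(w) for w in words))   # True
```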


II.5.5. Floyd’s Theorem


Repetition of words is not context free

- We have seen in Example 3 of II.5.4 (using the Pumping Lemma for Context Free Grammars) that

L := {ww | w ∈ {a, b}∗}

is not context free.
- Note that

{w w^R | w ∈ {a, b}∗}

is context free.
- A programming language which requires that a variable be declared before it is used contains L as a sublanguage.
- More precisely, if we had a context free grammar for such a language, we could derive from it a context free grammar for L.
- This can be generalised to Floyd’s theorem.

II.5.5. Floyd’s Theorem

Theorem

Under weak assumptions, a programming language which requires that variables are declared before they are used cannot be defined by a context free grammar.

A precise formulation and proof of Floyd’s theorem can be found in “Additional Material”.

- Therefore most programming languages cannot be defined by a context free grammar.
- However, in most cases one can define a context free grammar defining the basic syntax of a language.
  - The grammar allows us to define a parse tree.
  - The programs in the language defined by this grammar are those which can be parsed in such a way.
- Then one adds a program which afterwards checks semantic properties of the program,
  - e.g. that a variable is declared before being used,
  - or even more complicated features such as type correctness.
- Full details can be found in the “Additional Material”.
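The two-stage approach can be sketched as follows. The example assumes that parsing has already produced a flat list of statements of the form ("declare", name) or ("use", name); this representation and the name `undeclared_uses` are hypothetical simplifications, not the format used in the notes.

```python
def undeclared_uses(statements):
    """Return the variables that are used before they are declared."""
    declared = set()
    errors = []
    for kind, name in statements:
        if kind == "declare":
            declared.add(name)
        elif kind == "use" and name not in declared:
            errors.append(name)
    return errors

program = [("declare", "x"), ("use", "x"), ("use", "y"), ("declare", "y")]
print(undeclared_uses(program))   # ['y']
```

The check keeps a set of declared names as it walks through the program, which is exactly the kind of unbounded bookkeeping a context free grammar cannot express.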


Chapter II.6.: Push Down Automata

This Chapter will not be taught this year.
