Context-free Grammars[Section 2.1]
- more powerful than regular languages
- originally developed by linguists
- important for compilation of programming languages
Context-free Grammars[Section 2.1]
Example: A -> 0A1
A -> B
B -> #
Terminology:
- substitution rules (productions)
- variables (including the start variable) – typically upper-case
- terminals – typically lower-case, other symbols
- derivation, parse tree
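A minimal sketch of a derivation in the example grammar, assuming A is the start variable (the slide does not say so explicitly). The function name `derive` is made up for illustration; it applies A -> 0A1 a chosen number of times, then A -> B and B -> #.

```python
# Rules of the example grammar; A is assumed to be the start variable.
RULES = {"A": ["0A1", "B"], "B": ["#"]}

def derive(n):
    """Return the sentential forms of a derivation of 0^n # 1^n."""
    current = "A"
    steps = [current]
    for _ in range(n):
        current = current.replace("A", RULES["A"][0], 1)  # apply A -> 0A1
        steps.append(current)
    current = current.replace("A", RULES["A"][1], 1)      # apply A -> B
    steps.append(current)
    current = current.replace("B", RULES["B"][0], 1)      # apply B -> #
    steps.append(current)
    return steps

print(" => ".join(derive(3)))
# A => 0A1 => 00A11 => 000A111 => 000B111 => 000#111
```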
Context-free Grammars[Section 2.1]
Def 2.2: A context-free grammar is a 4-tuple (V,Σ,R,S), where
- V is a finite set of variables
- Σ is a finite set of terminals, V ∩ Σ = ∅
- R is a finite set of rules, each rule is of the form A -> w where A ∈ V and w ∈ (V ∪ Σ)*
- S ∈ V is the start variable
If A -> w ∈ R, then we write uAv => uwv (read “uAv yields uwv”), and we write u =>* v (read “u derives v”) if u=v or if there exists a sequence u1,u2,…,uk (k ≥ 0) such that
u => u1 => u2 => … => uk => v
The language of the grammar is { w ∈ Σ* | S =>* w }.
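The definition can be made concrete with a small sketch, assuming single-character symbols and the earlier example grammar (A -> 0A1, A -> B, B -> #) with A as start variable. The names `yields` and `language` are illustrative only: `yields` computes one step of the "yields" relation, and `language` collects the terminal strings reachable from the start variable up to a length bound.

```python
from collections import deque

# The 4-tuple (V, Sigma, R, S) for the earlier example grammar (assumed start: A).
V = {"A", "B"}
SIGMA = {"0", "1", "#"}
R = {"A": ["0A1", "B"], "B": ["#"]}
S = "A"

def yields(u):
    """All v with u => v: one rule applied to one variable occurrence."""
    result = []
    for i, symbol in enumerate(u):
        if symbol in V:
            for w in R[symbol]:
                result.append(u[:i] + w + u[i + 1:])
    return result

def language(max_len):
    """Terminal strings w with S =>* w and len(w) <= max_len."""
    seen, found = {S}, set()
    queue = deque([S])
    while queue:
        u = queue.popleft()
        if all(c in SIGMA for c in u):
            found.add(u)
            continue
        for v in yields(u):
            # Pruning by length is safe only because no rule of this
            # grammar shortens a string (there are no epsilon rules).
            if len(v) <= max_len and v not in seen:
                seen.add(v)
                queue.append(v)
    return sorted(found, key=len)

print(language(5))  # ['#', '0#1', '00#11']
```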
Context-free Grammars[Section 2.1]
Examples: give context-free grammars for the following languages:
- { a^i b^j c^k | i=k, i,j,k ≥ 0 }
- { a^i b^j c^k | i=j, i,j,k ≥ 0 }
- strings over { ( , ) } that are well-parenthesized
- strings over { 0,1 } that contain an equal number of 0’s and 1’s
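One way to test a candidate answer is to compare the strings it generates against a brute-force description of the target language. The sketch below does this for the well-parenthesized strings, using the candidate grammar S -> (S)S | ε. This grammar is an assumption for illustration, not taken from the slides, and the helper names are made up.

```python
from itertools import product

# Hypothetical candidate grammar: S -> (S)S | epsilon ("" stands for epsilon).
RULES = {"S": ["(S)S", ""]}

def generated(max_len):
    """Terminal strings of length <= max_len derivable from S."""
    seen, found, stack = {"S"}, set(), ["S"]
    while stack:
        u = stack.pop()
        i = u.find("S")                  # leftmost variable, if any
        if i < 0:
            found.add(u)
            continue
        for w in RULES["S"]:
            v = u[:i] + w + u[i + 1:]
            # The number of terminals never decreases, so forms that
            # already exceed the bound can be dropped.
            if len(v) - v.count("S") <= max_len and v not in seen:
                seen.add(v)
                stack.append(v)
    return found

def balanced(s):
    """Direct check that s is well-parenthesized."""
    depth = 0
    for c in s:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0

def brute_force(max_len):
    """All well-parenthesized strings over { (, ) } up to max_len."""
    return {"".join(p) for n in range(max_len + 1)
            for p in product("()", repeat=n) if balanced("".join(p))}

print(generated(6) == brute_force(6))
```

Agreement up to a length bound is evidence, not a proof, that the candidate grammar is correct.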
Ambiguity[Section 2.1]
Example:
[EXPR] -> [EXPR] + [EXPR] | [EXPR] x [EXPR] | ( [EXPR] ) | a
Give a derivation (and parse trees) for the string a+axa.
Notice: for every parse tree there is a unique left-most derivation.
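Since each parse tree corresponds to exactly one left-most derivation, counting parse trees shows whether a string has more than one. The sketch below counts parse trees for the [EXPR] grammar above. It is hard-wired to that grammar's four rules, and the function name `count_parse_trees` is made up for illustration.

```python
from functools import lru_cache

def count_parse_trees(s):
    """Number of parse trees for s under
    [EXPR] -> [EXPR] + [EXPR] | [EXPR] x [EXPR] | ( [EXPR] ) | a."""
    @lru_cache(maxsize=None)
    def trees(i, j):
        # Number of parse trees deriving the substring s[i:j] from [EXPR].
        if j - i == 1:
            return 1 if s[i] == "a" else 0       # [EXPR] -> a
        total = 0
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            total += trees(i + 1, j - 1)         # [EXPR] -> ( [EXPR] )
        for k in range(i + 1, j - 1):
            if s[k] in "+x":                     # top-level rule splits at s[k]
                total += trees(i, k) * trees(k + 1, j)
        return total
    return trees(0, len(s))

print(count_parse_trees("a+axa"))  # 2
```

The two trees for a+axa correspond to grouping it as (a+a)xa and as a+(axa).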
Ambiguity[Section 2.1]
Def 2.7: A context-free grammar is called ambiguous if there exists a string that can be generated by two different left-most derivations.
Note: Some context-free languages do not have unambiguous grammars (e.g. { a^i b^j c^k | i=j or j=k }). These are called inherently ambiguous.
Example: give an unambiguous CFG for the language of arithmetic expressions over { +, x, a }
For every x ∈ Σ let y… L = { a^i b^j c^k | i,j,k ≥ 0 }.
Chomsky Normal Form[Section 2.1]
Def 2.8: A CFG is in Chomsky normal form if every rule is of the form A -> BC or A -> a, where B,C ∈ V-{S}, and a ∈ Σ.
Thm 2.9: Any CFL can be generated by a CFG in Chomsky normal form.
Note: what about a CFL that contains ε?
Why a normal form?
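Def 2.8 can be checked mechanically. The sketch below assumes a grammar is stored as a dict from each variable to a list of rule bodies, with each body a tuple of symbols and the empty tuple standing for ε. The representation and the name `is_cnf` are illustrative choices. The check follows the definition exactly as stated on this slide, so any ε-rule is rejected.

```python
def is_cnf(variables, terminals, rules, start):
    """True if every rule has the form A -> BC or A -> a, with B and C
    variables other than the start variable and a a terminal."""
    for head, bodies in rules.items():
        if head not in variables:
            return False
        for body in bodies:
            if len(body) == 1 and body[0] in terminals:
                continue                          # A -> a
            if len(body) == 2 and all(
                sym in variables and sym != start for sym in body
            ):
                continue                          # A -> BC
            return False
    return True

# A small grammar in Chomsky normal form (made up for illustration).
cnf_rules = {"S": [("A", "B")], "A": [("a",)], "B": [("b",)]}
print(is_cnf({"S", "A", "B"}, {"a", "b"}, cnf_rules, "S"))   # True

# S -> ASA | aB, A -> B | S, B -> b | epsilon is not in normal form.
other_rules = {
    "S": [("A", "S", "A"), ("a", "B")],
    "A": [("B",), ("S",)],
    "B": [("b",), ()],
}
print(is_cnf({"S", "A", "B"}, {"a", "b"}, other_rules, "S"))  # False
```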
Chomsky Normal Form[Section 2.1]
Thm 2.9: Any CFL can be generated by a CFG in Chomsky normal form.
“Proof” by example:
S -> ASA | aB
A -> B | S
B -> b | ε
Things to fix:
1.
2.
3.
4.