cse443 compilers - university at buffalo

CSE443 Compilers Dr. Carl Alphonce [email protected] 343 Davis Hall

Post on 01-May-2022


Page 1: CSE443 Compilers - University at Buffalo

CSE443 Compilers

Dr. Carl Alphonce

[email protected] 343 Davis Hall

Page 2: CSE443 Compilers - University at Buffalo

Phases of a compiler

Figure 1.6, page 5 of text

Syntactic structure

Page 3: CSE443 Compilers - University at Buffalo

Example

L = { 0, 1, 00, 11, 000, 111, 0000, 1111, … }

G = ( {0,1},
      {S, ZeroList, OneList},
      { S -> ZeroList | OneList,
        ZeroList -> 0 | 0 ZeroList,
        OneList -> 1 | 1 OneList },
      S )

Page 4: CSE443 Compilers - University at Buffalo

Derivations from G

Derivation of 0 0 0 0:
S => ZeroList => 0 ZeroList => 0 0 ZeroList => 0 0 0 ZeroList => 0 0 0 0

Derivation of 1 1 1:
S => OneList => 1 OneList => 1 1 OneList => 1 1 1
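The derivations from G can be reproduced mechanically. The following is a minimal Python sketch, not part of the original slides: the name leftmost_derivation and the dictionary encoding of G are illustrative assumptions, and the breadth-first search relies on the grammar having no empty right-hand sides.

```python
from collections import deque

# Grammar G from the example slide: nonterminal -> list of right-hand sides.
G = {
    "S": [["ZeroList"], ["OneList"]],
    "ZeroList": [["0"], ["0", "ZeroList"]],
    "OneList": [["1"], ["1", "OneList"]],
}

def leftmost_derivation(grammar, start, target):
    """Search breadth-first for a leftmost derivation of `target` (a list of terminals).

    Returns the list of sentential forms, or None if no derivation exists.
    ASSUMPTION: no production has an empty right-hand side, so forms never shrink.
    """
    queue = deque([[[start]]])          # each entry: sentential forms derived so far
    seen = {(start,)}
    while queue:
        steps = queue.popleft()
        form = steps[-1]
        # Index of the leftmost nonterminal, if any.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:                 # only terminals left: this is a sentence
            if form == target:
                return steps
            continue
        if form[:idx] != target[:idx]:  # terminal prefix must agree with the target
            continue
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= len(target) and tuple(new_form) not in seen:
                seen.add(tuple(new_form))
                queue.append(steps + [new_form])
    return None

steps = leftmost_derivation(G, "S", list("0000"))
print(" => ".join(" ".join(form) for form in steps))
# S => ZeroList => 0 ZeroList => 0 0 ZeroList => 0 0 0 ZeroList => 0 0 0 0
```

A mixed string such as 0 1 is not in L, so leftmost_derivation(G, "S", list("01")) returns None.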

Page 5: CSE443 Compilers - University at Buffalo

Observations

Every string of symbols in a derivation is a sentential form.

A sentence is a sentential form that has only terminal symbols.

A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded.

A derivation can be leftmost, rightmost, or neither.

Page 6: CSE443 Compilers - University at Buffalo

Programming Language Grammar Fragment

<program> -> <stmt-list>
<stmt-list> -> <stmt> | <stmt> ; <stmt-list>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term> | <term> - <term>
<term> -> <var> | const

Notes:
<var> is defined in the grammar.
const is not defined in the grammar.

Page 7: CSE443 Compilers - University at Buffalo

A leftmost derivation of a = b + const

<program> => <stmt-list>
          => <stmt>
          => <var> = <expr>
          => a = <expr>
          => a = <term> + <term>
          => a = <var> + <term>
          => a = b + <term>
          => a = b + const

Page 8: CSE443 Compilers - University at Buffalo

Parse tree

<program>
  <stmt-list>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const

Page 9: CSE443 Compilers - University at Buffalo

Parse trees and compilation

A compiler builds a parse tree for a program (or for different parts of a program).

If the compiler cannot build a well-formed parse tree from a given input, it reports a compilation error.

The parse tree serves as the basis for semantic interpretation/translation of the program.
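To make this concrete, here is a small Python sketch of a recursive-descent parser for the grammar fragment from the earlier slide. It is an illustration only: the function name parse_program, the nested-tuple tree encoding, and the use of SyntaxError for the compilation error are all assumptions, not something the slides prescribe.

```python
def parse_program(tokens):
    """Build a parse tree (nested tuples) for the slide's grammar fragment.

    Raises SyntaxError when no well-formed parse tree exists for `tokens`.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}, found {peek()!r}")
        pos += 1
        return tok

    def var():                      # <var> -> a | b | c | d
        if peek() not in ("a", "b", "c", "d"):
            raise SyntaxError(f"expected a variable at position {pos}, found {peek()!r}")
        return ("<var>", expect(peek()))

    def term():                     # <term> -> <var> | const
        if peek() == "const":
            return ("<term>", expect("const"))
        return ("<term>", var())

    def expr():                     # <expr> -> <term> + <term> | <term> - <term>
        left = term()
        if peek() not in ("+", "-"):
            raise SyntaxError(f"expected '+' or '-' at position {pos}, found {peek()!r}")
        op = expect(peek())
        return ("<expr>", left, op, term())

    def stmt():                     # <stmt> -> <var> = <expr>
        target = var()
        expect("=")
        return ("<stmt>", target, "=", expr())

    def stmt_list():                # <stmt-list> -> <stmt> | <stmt> ; <stmt-list>
        first = stmt()
        if peek() == ";":
            expect(";")
            return ("<stmt-list>", first, ";", stmt_list())
        return ("<stmt-list>", first)

    tree = ("<program>", stmt_list())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree

tree = parse_program("a = b + const".split())
```

For a = b + const the result mirrors the parse tree shown earlier. An input such as a = b has no parse tree under this grammar, because <expr> requires an operator, so the sketch raises SyntaxError.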

Page 10: CSE443 Compilers - University at Buffalo

Extended BNF

• Optional parts are placed in brackets [ ]
  <proc_call> -> ident [(<expr_list>)]

• Alternative parts of RHSs are placed inside parentheses and separated via vertical bars
  <term> -> <term> (+|-) const

• Repetitions (0 or more) are placed inside braces { }
  <ident> -> letter {letter|digit}

Page 11: CSE443 Compilers - University at Buffalo

Comparison of BNF and EBNF

• sample grammar fragment expressed in BNF

<expr> -> <expr> + <term>
        | <expr> - <term>
        | <term>

<term> -> <term> * <factor>
        | <term> / <factor>
        | <factor>

• same grammar fragment expressed in EBNF

<expr> -> <term> {(+ | -) <term>}
<term> -> <factor> {(* | /) <factor>}
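One practical reading of the EBNF form is that each { } repetition becomes a loop in a hand-written parser. The Python sketch below illustrates that idea under stated assumptions: the slides do not define <factor>, so here a factor is taken to be any single non-operator token, and the names parse_expr and OPERATORS are hypothetical. The loop nests each new operator to the left, so the tree keeps the left associativity that the left-recursive BNF rules express.

```python
OPERATORS = {"+", "-", "*", "/"}

def parse_expr(tokens):
    """Parse tokens with the EBNF rules; return a left-nested tuple tree."""
    pos = 0

    def factor():
        nonlocal pos
        # ASSUMPTION: a <factor> is any single token that is not an operator.
        if pos >= len(tokens) or tokens[pos] in OPERATORS:
            raise SyntaxError(f"expected a factor at position {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    def term():                     # <term> -> <factor> {(* | /) <factor>}
        nonlocal pos
        node = factor()
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
            node = (node, op, factor())   # nest to the left
        return node

    def expr():                     # <expr> -> <term> {(+ | -) <term>}
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            node = (node, op, term())     # nest to the left
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return tree

print(parse_expr("a - b - c".split()))   # (('a', '-', 'b'), '-', 'c')
print(parse_expr("a + b * c".split()))   # ('a', '+', ('b', '*', 'c'))
```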

Page 12: CSE443 Compilers - University at Buffalo

Ambiguity in grammars

A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees. Operator precedence and operator associativity are two examples of ways in which a grammar can provide unambiguous interpretation.

Page 13: CSE443 Compilers - University at Buffalo

Operator precedence ambiguity

The following grammar is ambiguous:

<expr> -> <expr> <op> <expr> | const
<op> -> - | /

The grammar treats the two operators, '-' and '/', equivalently

Page 14: CSE443 Compilers - University at Buffalo

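An ambiguous grammar for arithmetic expressions

<expr> -> <expr> <op> <expr> | const
<op> -> / | -

[Figure: two distinct parse trees for const - const / const. In each tree the root <expr> expands to <expr> <op> <expr>. In one tree the left <expr> is expanded again, grouping (const - const) / const; in the other the right <expr> is expanded again, grouping const - (const / const).]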
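The ambiguity can be checked by brute force. The Python sketch below enumerates every parse tree the grammar <expr> -> <expr> <op> <expr> | const allows for a token list. The function name parse_trees and the tuple encoding (left, op, right) are illustrative assumptions.

```python
def parse_trees(tokens):
    """Return all parse trees for `tokens` under
    <expr> -> <expr> <op> <expr> | const,  <op> -> - | /
    Each tree is either the string "const" or a tuple (left, op, right).
    """
    if tokens == ["const"]:
        return ["const"]
    trees = []
    # Try every operator as the root <op>, splitting the rest into two <expr>s.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in ("-", "/"):
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees

for tree in parse_trees("const - const / const".split()):
    print(tree)
# ('const', '-', ('const', '/', 'const'))
# (('const', '-', 'const'), '/', 'const')
```

Two trees for one sentence is exactly what makes the grammar ambiguous. With numbers in place of const the two groupings can give different values, for example 8 - (4 / 2) = 6 but (8 - 4) / 2 = 2.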

Page 15: CSE443 Compilers - University at Buffalo

Disambiguating the grammar

This grammar (fragment) is unambiguous:

<expr> -> <expr> - <term> | <term>
<term> -> <term> / const | const

The grammar treats the two operators, '-' and '/', differently.

In this grammar, '/' has higher precedence than '-'.

Page 16: CSE443 Compilers - University at Buffalo

Disambiguating the grammar

• If we use the parse tree to indicate precedence levels of the operators, we can remove the ambiguity.

• The following rules give / a higher precedence than -

<expr> -> <expr> - <term> | <term>
<term> -> <term> / const | const

Parse tree for const - const / const:

<expr>
  <expr>
    <term>
      const
  -
  <term>
    <term>
      const
    /
    const

Page 18: CSE443 Compilers - University at Buffalo

Derivation of 2+5*3 using C grammar

Each node in this chain is the only child of the node above it:

<expression>
<assignment-expression>
<conditional-expression>
<logical-OR-expression>
<logical-AND-expression>
<inclusive-OR-expression>
<exclusive-OR-expression>
<AND-expression>
<equality-expression>
<relational-expression>
<shift-expression>
<additive-expression>

The last <additive-expression> branches:

<additive-expression>
  <additive-expression>
    <multiplicative-expression>
      <cast-expression>
        <unary-expression>
          <postfix-expression>
            <primary-expression>
              <constant>
                2
  +
  <multiplicative-expression>
    <multiplicative-expression>
      <cast-expression>
        <unary-expression>
          <postfix-expression>
            <primary-expression>
              <constant>
                5
    *
    <cast-expression>
      <unary-expression>
        <postfix-expression>
          <primary-expression>
            <constant>
              3

Page 19: CSE443 Compilers - University at Buffalo

Recursion and parentheses

• To generate 2+3*4 or 3*4+2, the parse tree is built so that + is higher in the tree than *.

• To force an addition to be done prior to a multiplication we must use parentheses, as in (2+3)*4.

• Grammar captures this in the recursive case of an expression, as in the following grammar fragment:

<expr> -> <expr> + <term> | <term>
<term> -> <term> * <factor> | <factor>
<factor> -> <variable> | <constant> | “(” <expr> “)”
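The recursive case can be seen at work in a small evaluator. The Python sketch below follows the grammar fragment above, with the left-recursive rules written as loops. It is an illustration under assumptions: constants are taken to be decimal integers, variables are left out, and the name evaluate is hypothetical.

```python
def evaluate(tokens):
    """Evaluate a token list using the <expr>/<term>/<factor> fragment."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():                   # <factor> -> <constant> | "(" <expr> ")"
        if peek() == "(":
            advance()
            value = expr()          # the recursive case: a whole <expr> inside parentheses
            if peek() != ")":
                raise SyntaxError(f"expected ')' at position {pos}")
            advance()
            return value
        # ASSUMPTION: constants are decimal integers; <variable> is omitted here.
        if peek() is None or not peek().isdigit():
            raise SyntaxError(f"expected a constant or '(' at position {pos}")
        return int(advance())

    def term():                     # <term> -> <term> * <factor> | <factor>
        value = factor()
        while peek() == "*":
            advance()
            value *= factor()
        return value

    def expr():                     # <expr> -> <expr> + <term> | <term>
        value = term()
        while peek() == "+":
            advance()
            value += term()
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return result

print(evaluate("2 + 3 * 4".split()))       # 14
print(evaluate("( 2 + 3 ) * 4".split()))   # 20
```

Because term is called from expr and factor from term, * binds tighter than +. Only the parenthesized branch of factor re-enters expr, which is how (2+3)*4 forces the addition first.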

Page 20: CSE443 Compilers - University at Buffalo

Lecture discussion

There are many reasons to study the syntax of programming languages.

When learning a new language you need to be able to read a syntax description to be able to write well-formed programs in the language.

Understanding at least a little of what a compiler does in translating a program from high-level to low-level forms deepens your understanding of why programming languages are designed the way they are, and equips you to better diagnose subtle bugs in programs.

The next slide shows the “evaluation order” remark in the C++ language reference, which alludes to the order being left unspecified to allow a compiler to optimize the code during translation.

Page 21: CSE443 Compilers - University at Buffalo

Shown on Visualizer


C++ Programming Language, 3rd edition. Bjarne Stroustrup. (c) 1997. Page 122.

Page 22: CSE443 Compilers - University at Buffalo

A compiler translates high-level language statements into a much larger number of low-level statements, and then applies optimizations. The entire translation process, including optimizations, must preserve the semantics of the original high-level program.

The next slide shows that different phases of compilation can apply different types of optimizations (some target-independent, some target-dependent).

By not specifying the order in which subexpressions are evaluated (left-to-right or right-to-left) a C++ compiler can potentially re-order the resulting low-level instructions to give a “better” result.

Page 23: CSE443 Compilers - University at Buffalo

Compilers: Principles, Techniques, and Tools (Aho et al.) (c) 2007, page 5.

Page 24: CSE443 Compilers - University at Buffalo

RL ⊆ CFL

Given a regular language L we can always construct a context free grammar G such that L = 𝓛(G).

For every regular language L there is an NFA M = (S,∑,𝛅,F,s0) such that L = 𝓛(M).

Build G = (N,T,P,S0) as follows:

N = { N_s | s ∈ S }

T = { t | t ∈ ∑ }

If 𝛅(i,a)=j, then add N_i → a N_j to P

If i ∈ F, then add N_i → 𝜀 to P

S0 = N_s0

Page 25: CSE443 Compilers - University at Buffalo

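(a|b)*abb

G = ( {A0, A1, A2, A3}, {a, b}, {A0 → a A0, A0 → b A0, A0 → a A1, A1 → b A2, A2 → b A3, A3 → 𝜀}, A0 )

[Figure: NFA with states 0, 1, 2, 3, start state 0 and accepting state 3. State 0 loops to itself on a and on b; the remaining transitions are 0 → 1 on a, 1 → 2 on b, and 2 → 3 on b.]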
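The NFA-to-grammar construction is mechanical enough to sketch in Python. The code below is an illustration, not the lecture's own implementation: the function name nfa_to_cfg is hypothetical, the transition function is assumed to be a dictionary from (state, symbol) to a set of next states, and nonterminals are named N0, N1, ... where the slide above writes A0, A1, ....

```python
def nfa_to_cfg(states, alphabet, delta, accepting, start):
    """Build G = (N, T, P, S0) from an NFA without epsilon transitions.

    ASSUMPTION: `delta` maps (state, symbol) to the set of possible next states.
    Productions are (head, body) pairs; an empty body stands for epsilon.
    """
    def nonterminal(state):
        return f"N{state}"

    nonterminals = {nonterminal(s) for s in states}
    terminals = set(alphabet)
    productions = []
    for (i, a) in sorted(delta):
        for j in sorted(delta[(i, a)]):
            # A transition from i to j on a gives N_i -> a N_j.
            productions.append((nonterminal(i), (a, nonterminal(j))))
    for i in sorted(accepting):
        # An accepting state i gives N_i -> epsilon.
        productions.append((nonterminal(i), ()))
    return nonterminals, terminals, productions, nonterminal(start)

# NFA for (a|b)*abb.
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
N, T, P, S0 = nfa_to_cfg({0, 1, 2, 3}, {"a", "b"}, DELTA, {3}, 0)
for head, body in P:
    print(head, "->", " ".join(body) or "epsilon")
```

Up to renaming N to A, the six productions printed are the ones listed in G for (a|b)*abb.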

Page 26: CSE443 Compilers - University at Buffalo

RL ⊊ CFL

Show that not all CF languages are regular.

To do this we only need to demonstrate that there exists a CFL that is not regular.

Consider L = { a^n b^n | n ≥ 1 }

Claim: L ∈ CFL, L ∉ RL

Page 27: CSE443 Compilers - University at Buffalo

RL ⊊ CFL

Proof (sketch):

L ∈ CFL: S → aSb | ab

L ∉ RL (by contradiction):

Assume L is regular. In this case there exists a DFA D=(S,∑,𝛅,F,s0) such that 𝓛(D) = L.

Let k = |S|. Consider a^i b^i, where i>k.

Suppose 𝛅(s0, a^i) = s_r. Since i>k, not all of the states between s0 and s_r are distinct. Hence, there are v and w, 0 ≤ v < w ≤ k such that s_v = s_w. In other words, there is a loop.

This DFA can certainly recognize a^i b^i but it can also recognize a^j b^i, where i ≠ j, by following the loop.
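The loop argument can be tried out on any concrete DFA. The Python sketch below uses a made-up 3-state DFA (the table DELTA is purely hypothetical and does not come from the slides) and the illustrative helpers run and find_loop. It finds v < w with s_v = s_w while reading a's, then checks that going around the loop once more leaves the DFA in the same final state, so it cannot tell a^i b^i from a^j b^i.

```python
def run(delta, start, word):
    """Return the state a DFA reaches from `start` after reading `word`."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state

def find_loop(delta, start, symbol, length):
    """Read `symbol` up to `length` times; return (v, w), v < w, such that the
    state after v symbols equals the state after w symbols, or None."""
    first_seen = {start: 0}
    state = start
    for n in range(1, length + 1):
        state = delta[(state, symbol)]
        if state in first_seen:
            return first_seen[state], n
        first_seen[state] = n
    return None

# HYPOTHETICAL DFA with k = 3 states over {a, b}; any total DFA would do.
DELTA = {
    (0, "a"): 1, (1, "a"): 2, (2, "a"): 1,
    (0, "b"): 0, (1, "b"): 0, (2, "b"): 2,
}
k, i = 3, 4                         # i > k, as in the proof
v, w = find_loop(DELTA, 0, "a", i)  # a repeat must occur within the first k symbols
j = i + (w - v)                     # go around the loop one extra time

same = run(DELTA, 0, "a" * i + "b" * i) == run(DELTA, 0, "a" * j + "b" * i)
print((v, w), j, same)              # (1, 3) 6 True
```

Whatever set of accepting states is chosen, a^4 b^4 and a^6 b^4 get the same verdict from this DFA, so it cannot accept exactly L.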

"REGULAR GRAMMARS CANNOT COUNT"

Page 28: CSE443 Compilers - University at Buffalo

Relevance? Nested '{' and '}'

public class Foo {
    public static void main(String[] args) {
        for (int i=0; i<args.length; i++) {
            if (args[i].length() < 3) { … }
            else { … }
        }
    }
}
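Checking that braces nest properly is the same counting problem as a^n b^n. A minimal Python sketch follows, with the hypothetical name braces_balanced. The integer depth can grow without bound, which is exactly the kind of memory a finite automaton does not have.

```python
def braces_balanced(text):
    """Return True if every '{' in `text` is closed by a later matching '}'."""
    depth = 0                       # number of currently open braces
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:           # a '}' with no matching '{'
                return False
    return depth == 0               # every '{' must have been closed

print(braces_balanced("class Foo { void main() { for (;;) { } } }"))  # True
print(braces_balanced("class Foo { void main() { }"))                 # False
```

A context-free grammar handles the same check with a recursive rule, in the same way that S → aSb | ab generates a^n b^n.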