PART I SYSTEM UTILITIES
Lecture 6
Compilers
Ştefan Stăncescu
COMPILERS
A "high-level language" (HLL)
has complex grammar rules and is
closer to human language
The HLL is the means of linking human and computer
human language <=> binary language
HLL => binary language
COMPILER: automatic translation machine
Source code => written in the HLL
Object code => written in binary language (machine code)
COMPILATION follows the HLL grammar rules:
• lexical rules
type and structure of the language elements
• syntactic rules
composition rules for the language elements
• "semantic" rules (translation programs)
the object-code counterpart of each syntactic rule: "semantic programs" for the machine
Compiling = checking + translating the HLL source text
• lexical rules
scanner
• syntactic rules
parser
• "semantic" rules
object code generator
(for a virtual machine (VM): intermediate code, "bytecode")
The SCANNER identifies tokens
• language elements:
one or more adjacent characters, delimited by separator characters (space, LF, FF, etc.)
• words: START, STOP, LABEL01
• operators: + * / -
• special signs: ( ) { } // . ,
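The token identification described above can be sketched in Python. This is a minimal illustration, not the scanner of any particular compiler: the three token classes follow the list above, space, tab, LF and FF are assumed to be the separators, "/" is treated only as an operator, and the names TOKEN_SPEC and tokenize are hypothetical.

```python
import re

# Token classes taken from the slide; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("WORD", r"[A-Za-z][A-Za-z0-9]*"),   # START, STOP, LABEL01
    ("OPERATOR", r"[+*/\-]"),            # + * / -
    ("SPECIAL", r"[(){}.,]"),            # ( ) { } . ,
    ("SKIP", r"[ \t\n\f]+"),             # separators: space, tab, LF, FF
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (class, lexeme) pairs; separators only mark token boundaries."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("START LABEL01 + (STOP)")
```

Here tokens holds one pair per token, for example ("WORD", "START") and ("OPERATOR", "+"); a character outside all classes raises ValueError.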
SCANNER step 1
scan the HLL source text
determine the token list from the token boundaries
identify the tokens defined by the HLL
identify the tokens invented by the programmer
create a look-up table with
numerical symbols for the tokens
SCANNER step 2
create an intermediate source file
in which the tokens are replaced by the numerical symbols from the
look-up table created in step 1
BNF: Backus-Naur Form
REPRESENTATION of syntactic rules
A rule in BNF format describes a valid construction in the HLL:
a formal template for
a rule applied to one line of the source file
(and a rule applied to the lines of a line list)
Syntactic rule = valid construction in the HLL
A template carries the name of
the newly built and checked element,
which can be part of another construction
(including one with the same pattern)
The name of the new element is a "nonterminal" symbol
BNF rule form:
<nonterminal symbol> ::= building template
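As an illustration of the rule form, the Python sketch below splits one BNF rule into its nonterminal name and its alternative templates, and checks whether the rule refers to itself (a construction "with the same pattern"). The helper names parse_rule, is_nonterminal and is_recursive are hypothetical, and the sketch assumes that symbols are separated by spaces and alternatives by "|".

```python
def parse_rule(line):
    """Split '<name> ::= alt1 | alt2' into the name and a list of symbol lists."""
    lhs, rhs = line.split("::=", 1)
    alternatives = [alternative.split() for alternative in rhs.split("|")]
    return lhs.strip(), alternatives


def is_nonterminal(symbol):
    """Nonterminals are written in angle brackets; everything else is a token."""
    return symbol.startswith("<") and symbol.endswith(">")


def is_recursive(name, alternatives):
    """A rule is recursive when its own name appears in one of its templates."""
    return any(name in alternative for alternative in alternatives)


name, alternatives = parse_rule("<id_list> ::= id | <id_list> , id")
recursive = is_recursive(name, alternatives)
```

For this rule the first alternative is the single terminal id, and the second alternative reuses <id_list>, so the rule is recursive.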
Parsing = discovering, in the HLL source file,
successive valid BNF rules (templates) until
there are no more undiscovered rules
(no more "nonterminal" symbols)
Parsing ends only on tokens ("terminal" symbols)
Chaining BNF rules (templates) => syntax tree
The purpose of parsing => discovering
the syntax tree of the source file
Line in the source file: S = A + B
(A, B, S are integer variables, i.e. tokens)
The code generator must explain
to the machine the templates that were found
The scanner identifies the tokens
"S" "=" "A" "+" "B"
tokens "A", "B", "S" as variables
token "+" as an operator, token "=" as the assignment
The parser also verifies that the variables have consistent types
(if A, B, S are all integers: OK)
if one differs, the templates for "+" and "=" need a conversion to a consistent type
Ex: if S is real and A, B are integers
"+" rule OK, the result is an integer
"=" (assignment rule) adds
the format conversion integer => real (float)
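The type check for S = A + B can be sketched as a small planning function in Python. This is only an illustration under stated assumptions: the operation names FLOAT, ADD and ASSIGN, the temporary names t_left, t_right, t_sum and t_conv, and the decision to reject a real-to-integer assignment are all hypothetical choices, not something the lecture prescribes.

```python
def plan_assignment(target, left, right, types):
    """Return the ordered operations for `target = left + right`."""
    steps = []
    # Mixed operands: promote the integer operand to real before adding.
    if types[left] != types[right]:
        if types[left] == "integer":
            steps.append(("FLOAT", left, "t_left"))
            left = "t_left"
        else:
            steps.append(("FLOAT", right, "t_right"))
            right = "t_right"
        sum_type = "real"
    else:
        sum_type = types[left]
    steps.append(("ADD", left, right, "t_sum"))
    result = "t_sum"
    # The assignment rule adds the conversion integer => real when needed.
    if types[target] != sum_type:
        if types[target] != "real":
            raise TypeError("assigning a real sum to an integer is rejected in this sketch")
        steps.append(("FLOAT", "t_sum", "t_conv"))
        result = "t_conv"
    steps.append(("ASSIGN", result, target))
    return steps


steps = plan_assignment("S", "A", "B", {"S": "real", "A": "integer", "B": "integer"})
```

With S real and A, B integer, the plan is an integer addition, then a conversion of the sum to real, then the assignment, which matches the example above.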
1st parser operation: type consistency of the structures
(conversion, if needed)
2nd parser operation: A+B
(result in temporary memory)
3rd parser operation: assigning the result to S
(S=A+B)
Applicable BNF rules:
conversion, addition, assignment, in that order
EXAMPLE II (bottom-up parsing)
S=A+B*C-D
scan the line and discover the operations to be performed first
each result becomes a "nonterminal" symbol <N>
=> operator precedence: ( + <. * ) | ( * .> - )
Assuming the rules of algebraic expressions
Syntactic rule of multiplication
<product> ::= <operand> * <operand>
Syntactic rule of addition
<sum> ::= (<operand> + <operand>) | (<operand> - <operand>)
EXAMPLE II (bottom-up parsing)
<N1> ::= B*C
<N2> ::= A+<N1>
<N3> ::= <N2>-D
Syntax tree of the expression A+B*C-D
EXAMPLE II (bottom-up parsing)
S=A+(B*C-D)
S=ASSIGN(N3)
N3=SUM(A,N2)
N2=DIFF(N1,D)
N1=PROD(B,C)
Syntax tree of the expression A+(B*C-D)
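The reductions of Example II can be reproduced with a simplified operator-precedence (shift-reduce) sketch in Python. It is an illustration only: it assumes already separated tokens, left-associative binary operators with the usual algebraic precedence, and parentheses; the names PRECEDENCE and reduce_expression are hypothetical.

```python
# Higher number = performed first (assumed algebraic precedence).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def reduce_expression(tokens):
    """Return the reductions (name, operator, left, right) in the order they are performed."""
    operands, operators, reductions = [], [], []

    def reduce_top():
        # Replace 'left op right' on the stacks by a new nonterminal N1, N2, ...
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        name = f"N{len(reductions) + 1}"
        reductions.append((name, op, left, right))
        operands.append(name)

    for token in tokens:
        if token == "(":
            operators.append(token)
        elif token == ")":
            while operators[-1] != "(":
                reduce_top()
            operators.pop()  # discard the matching "("
        elif token in PRECEDENCE:
            # Reduce while the operator on the stack takes precedence over the new one.
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]):
                reduce_top()
            operators.append(token)
        else:
            operands.append(token)
    while operators:
        reduce_top()
    return reductions


first = reduce_expression(["A", "+", "B", "*", "C", "-", "D"])
second = reduce_expression(["A", "+", "(", "B", "*", "C", "-", "D", ")"])
```

For A+B*C-D the reductions are N1 = B*C, N2 = A+N1, N3 = N2-D; for A+(B*C-D) they are N1 = B*C, N2 = N1-D, N3 = A+N2. Read from the last reduction back to the first, each list describes the corresponding syntax tree.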
SAMPLE PROGRAM IN A SIMPLIFIED PASCAL LANGUAGE
1 PROGRAM MEDIA_ANALYSIS
2 VAR
3 NRCRT, I: INTEGER;
3 SARITM, SARMON, DIF: REAL
4 BEGIN
5 SARITM := 0;
6 SARMON := 0;
7 FOR I := 1 TO 100 DO
8 BEGIN
9 READ (NRCRT);
10 SARITM := SARITM + NRCRT;
11 SARMON := SARMON + 1 DIV NRCRT;
12 END;
13 DIF := SARITM DIV 100 - 100 DIV SARMON;
14 WRITE (DIF);
15 END.
GRAMMAR (BNF) OF THE SIMPLIFIED PASCAL LANGUAGE
1. <prog> ::= PROGRAM <prog_name> VAR <dec_list> BEGIN <stmt_list> END.
2. <prog_name> ::= id
3. <dec_list> ::= <dec> | <dec_list> ; <dec>
4. <dec> ::= <id_list> : <type>
5. <type> ::= INTEGER | REAL
6. <id_list> ::= id | <id_list> , id
7. <stmt_list> ::= <stmt> | <stmt_list> ; <stmt>
8. <stmt> ::= <assign> | <read> | <write> | <for>
9. <assign> ::= id := <exp>
10. <exp> ::= <term> | <exp> + <term> | <exp> - <term>
11. <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
12. <factor> ::= id | int | ( <exp> )
13. <read> ::= READ ( <id_list> )
14. <write> ::= WRITE ( <id_list> )
15. <for> ::= FOR <index_exp> DO <body> ;
16. <index_exp> ::= id := <exp> TO <exp>
17. <body> ::= <stmt> | BEGIN <stmt_list> END
Token name   Code
PROGRAM      1
VAR          2
BEGIN        3
END.         4
END          5
INTEGER      6
REAL         7
READ         8
WRITE        9
FOR          10
TO           11
DO           12
;            13
:            14
,            15
:=           16
+            17
-            18
DIV          19
(            20
)            21
ID           22
INT          23
File produced by the scanner
Line   Token   Specifier
1      1
       22      ^STATUS
...
7      10
       22      ^I
       16
       23      #1
       11
       23      #100
       12
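The coding of a source line with the token table can be sketched in Python as follows. This is a minimal illustration: the codes come from the token table, "^name" marks a pointer to an identifier as in the scanner file, and the "#value" notation for integer constants is an assumption borrowed from the intermediate code shown later in the lecture. The names TOKEN_CODES, TOKEN_RE and scan_line are hypothetical.

```python
import re

# Codes copied from the token table of the simplified Pascal language.
TOKEN_CODES = {
    "PROGRAM": 1, "VAR": 2, "BEGIN": 3, "END.": 4, "END": 5, "INTEGER": 6,
    "REAL": 7, "READ": 8, "WRITE": 9, "FOR": 10, "TO": 11, "DO": 12,
    ";": 13, ":": 14, ",": 15, ":=": 16, "+": 17, "-": 18, "DIV": 19,
    "(": 20, ")": 21,
}
ID, INT = 22, 23

# 'END.' and ':=' are listed before their shorter prefixes so they match first.
# Characters that match no alternative are silently skipped in this sketch.
TOKEN_RE = re.compile(r"END\.|[A-Za-z][A-Za-z0-9]*|\d+|:=|[;:,+\-()]")


def scan_line(text):
    """Return (token code, specifier) pairs for one source line."""
    coded = []
    for lexeme in TOKEN_RE.findall(text):
        if lexeme in TOKEN_CODES:
            coded.append((TOKEN_CODES[lexeme], None))   # language token
        elif lexeme.isdigit():
            coded.append((INT, f"#{lexeme}"))           # integer constant
        else:
            coded.append((ID, f"^{lexeme}"))            # programmer-defined name
    return coded


line7 = scan_line("FOR I := 1 TO 100 DO")
```

For line 7 of the sample program this yields the codes 10, 22, 16, 23, 11, 23, 12, with the specifiers ^I, #1 and #100 attached to the identifier and the two constants.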
SAMPLE
Line 9: READ (NRCRT);
BNF:
13. <read> ::= READ ( <id_list> )
6. <id_list> ::= id | <id_list> , id
SAMPLE
Line 13: DIF := SARITM DIV 100 - 100 DIV SARMON;
BNF (only the alternatives used):
9. <assign> ::= id := <exp>
10. <exp> ::= <term> | <exp> - <term>
11. <term> ::= <factor> | <term> DIV <factor>
12. <factor> ::= id | int | ( <exp> )
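How rules 9 to 12 recognise this statement can be sketched with a small recursive-descent parser in Python. Note that recursive descent is a top-down technique, used here only because it is compact; it is not the bottom-up method of the lecture. The left-recursive alternatives of rules 10 and 11 are replaced by loops, the tokens are assumed to be already separated, and the names RESERVED and parse_assign are hypothetical.

```python
RESERVED = {"DIV"}  # keywords that must not be accepted as identifiers


def parse_assign(tokens):
    """Parse `id := <exp>` (rule 9) and return (':=', id, expression_tree)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of statement")
        pos += 1
        return tokens[pos - 1]

    def factor():  # rule 12: id | int | ( <exp> )
        token = advance()
        if token == "(":
            node = exp()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token.isdigit():
            return int(token)
        if token.isidentifier() and token not in RESERVED:
            return token
        raise SyntaxError(f"unexpected token {token!r}")

    def term():  # rule 11: factors joined by * or DIV
        node = factor()
        while peek() in ("*", "DIV"):
            op = advance()
            node = (op, node, factor())
        return node

    def exp():  # rule 10: terms joined by + or -
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    target = advance()
    if not target.isidentifier() or advance() != ":=":
        raise SyntaxError("expected 'id := <exp>'")
    tree = exp()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return (":=", target, tree)


tree = parse_assign("DIF := SARITM DIV 100 - 100 DIV SARMON".split())
```

The result is a nested tuple in which the subtraction is the root of the expression and the two DIV operations are its children, so both divisions are performed before the subtraction.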
Precedence relations between tokens (examples):
PROGRAM .=. VAR
BEGIN <. FOR
; .> END.
Empty pairs (no relation defined) => grammatical errors
Precedence relations: at most one per pair
(consistent grammar)
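A precedence table of this kind can be sketched in Python as a dictionary keyed by token pairs. Only the three pairs shown above are included; a real table covers every pair allowed by the grammar. The names PRECEDENCE_TABLE and relation are hypothetical.

```python
# Only the three pairs from the slide; all other pairs are left empty.
PRECEDENCE_TABLE = {
    ("PROGRAM", "VAR"): "=",   # PROGRAM .=. VAR
    ("BEGIN", "FOR"): "<",     # BEGIN <. FOR
    (";", "END."): ">",        # ; .> END.
}


def relation(left, right):
    """Return the precedence relation for a token pair; an empty pair is a syntax error."""
    try:
        return PRECEDENCE_TABLE[(left, right)]
    except KeyError:
        raise SyntaxError(f"no precedence relation between {left!r} and {right!r}") from None


begin_for = relation("BEGIN", "FOR")
```

Because a dictionary holds exactly one value per key, this representation cannot store two relations for the same pair, which mirrors the consistency requirement stated above.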
Generating semantic programs
DIF := SARITM DIV 100 - 100 DIV SARMON
id1 := id2 DIV int - int DIV id3
id1 := exp1 - exp2
id1 := exp3
DIV SARITM #100 i1
DIV #100 SARMON i2
- i1 i2 i3
:= i3 , DIF
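The generation of these quadruples can be sketched in Python as a post-order walk over the expression tree. This is an illustration under assumptions: the tree is given as nested tuples (operator, left, right), integer constants are written "#value", temporaries are numbered i1, i2, ... in the order they are created, and the name generate_quads is hypothetical.

```python
from itertools import count


def generate_quads(target, tree):
    """Return quadruples (operator, operand1, operand2, result) for `target := tree`."""
    quads = []
    temporaries = count(1)

    def emit(node):
        if isinstance(node, int):
            return f"#{node}"          # immediate constant
        if isinstance(node, str):
            return node                # variable name
        op, left, right = node
        left_name = emit(left)         # operands first (post-order) ...
        right_name = emit(right)
        result = f"i{next(temporaries)}"
        quads.append((op, left_name, right_name, result))  # ... then the operation
        return result

    value = emit(tree)
    quads.append((":=", value, "", target))
    return quads


quads = generate_quads("DIF", ("-", ("DIV", "SARITM", 100), ("DIV", 100, "SARMON")))
```

The four quadruples produced are the two divisions into i1 and i2, the subtraction into i3, and the assignment of i3 to DIF.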
(1)   :=     #0      ,       SARITM   {SARITM := 0}
(2)   :=     #0      ,       SARMON   {SARMON := 0}
(3)   :=     #1      ,       I        {FOR I := 1 TO 100}
(4)   JGT    I       #100    (15)
(5)   CALL   XREAD                    {READ(NRCRT)}
(6)   PARAM  NRCRT
(7)   +      SARITM  NRCRT   i1       {SARITM := SARITM + NRCRT}
(8)   :=     i1      ,       SARITM
(9)   DIV    #1      NRCRT   i2       {SARMON := SARMON + 1 DIV NRCRT}
(10)  +      SARMON  i2      i3
(11)  :=     i3      ,       SARMON
(12)  +      I       #1      i4       {end of FOR}
(13)  :=     i4      ,       I
(14)  J      (4)
(15)  DIV    SARITM  #100    i6       {DIF := SARITM DIV 100 - 100 DIV SARMON}
(16)  DIV    #100    SARMON  i7
(17)  -      i6      i7      i8
(18)  :=     i8      ,       DIF
(19)  CALL   XWRITE                   {WRITE(DIF)}
(20)  PARAM  DIF
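To see what this intermediate code computes, the listing can be run by a small interpreter, sketched here in Python. Several points are assumptions made for the sketch: DIV is executed as real division, the read and write routines are written XREAD and XWRITE, a CALL is always followed by exactly one PARAM, and all values are stored as floating-point numbers. The names QUADS and run are hypothetical.

```python
QUADS = [
    (":=", "#0", "", "SARITM"),        # (1)
    (":=", "#0", "", "SARMON"),        # (2)
    (":=", "#1", "", "I"),             # (3)
    ("JGT", "I", "#100", 15),          # (4)
    ("CALL", "XREAD", "", ""),         # (5)
    ("PARAM", "NRCRT", "", ""),        # (6)
    ("+", "SARITM", "NRCRT", "i1"),    # (7)
    (":=", "i1", "", "SARITM"),        # (8)
    ("DIV", "#1", "NRCRT", "i2"),      # (9)
    ("+", "SARMON", "i2", "i3"),       # (10)
    (":=", "i3", "", "SARMON"),        # (11)
    ("+", "I", "#1", "i4"),            # (12)
    (":=", "i4", "", "I"),             # (13)
    ("J", "", "", 4),                  # (14)
    ("DIV", "SARITM", "#100", "i6"),   # (15)
    ("DIV", "#100", "SARMON", "i7"),   # (16)
    ("-", "i6", "i7", "i8"),           # (17)
    (":=", "i8", "", "DIF"),           # (18)
    ("CALL", "XWRITE", "", ""),        # (19)
    ("PARAM", "DIF", "", ""),          # (20)
]


def run(quads, inputs):
    """Execute the quadruples and return the list of written values."""
    memory, written, source = {}, [], iter(inputs)

    def value(operand):
        # '#n' is an immediate constant; anything else names a memory cell.
        return float(operand[1:]) if operand.startswith("#") else memory[operand]

    pc = 1  # quadruples are numbered from 1, as in the listing
    while pc <= len(quads):
        op, a, b, result = quads[pc - 1]
        if op == ":=":
            memory[result] = value(a)
        elif op == "+":
            memory[result] = value(a) + value(b)
        elif op == "-":
            memory[result] = value(a) - value(b)
        elif op == "DIV":
            memory[result] = value(a) / value(b)  # assumption: real division
        elif op == "JGT" and value(a) > value(b):
            pc = result
            continue
        elif op == "J":
            pc = result
            continue
        elif op == "CALL":
            name = quads[pc][1]  # the PARAM quadruple follows the CALL
            if a == "XREAD":
                memory[name] = float(next(source))
            else:
                written.append(memory[name])
            pc += 2
            continue
        pc += 1
    return written


result = run(QUADS, [2] * 100)
```

With one hundred inputs equal to 2, the arithmetic mean (SARITM DIV 100) and the harmonic mean (100 DIV SARMON) are both 2, so the value written for DIF is 0.0.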