unit 2
DESCRIPTION
Macro processor, Compilers and InterpretersTRANSCRIPT
Macro processor, Compliers and Interpreters
� Macro definition and call, macro expansion, Machine Independent
macro processor features,
� Nested macro calls, advanced macro facilities, Design of macro
preprocessor.preprocessor.
� Basic compliers function, Phases of compilation, memory allocation,
compilation of expression,
� compilation of expressions, compilation of control structures, code of
optimization, interpreter.
� Macro are used to provide a program generation facility through
macro expansion.
� Definition :- A macro is a unit of specification for program
generation through expansion.
� A macro consists of a name, set of formal parameter and body of � A macro consists of a name, set of formal parameter and body of
code.
� The use of a macro name with a set of actual parameters is replaced
by some code generated from its body. This is called macro
expansion
� Examples of Macro
� If we have to Re-writing a program then with the help of macros we can
write((.
Start of definitionStart of definitionStart of definitionStart of definition MacroMacroMacroMacro
Macro name My macro
Macro body ADD AREG,XADD BREG, X
End of the macro definition
MEND
write((.
� Original program program with macro
ADD AREG,XADD BREG, X
ADD AREG,XADD BREG, X
ADD AREG,XADD BREG, X
My macro
My macro
My macro
� A macro processor takes a source with macro definition and macro
calls and replace each with its body.
Target codeTarget codeTarget codeTarget code
ADD AREG,XADD BREG, X
Source programSource programSource programSource program
MACROMymacro
ADD AREG,X
ADD AREG,XADD BREG, X
ADD AREG,XADD BREG, X
Macro Macro Macro Macro ProcessorProcessorProcessorProcessor
ADD AREG,XADD BREG, X
MEND
mymacro
mymacro
mymacro
Input
output
� Two kinds of expansion
1. Lexical Expansion
2. Semantic Expansion
Lexical Expansion :- Lexical Expansion implies replacement of
character string by another character string during program character string by another character string during program
generation .Lexical expansion employed to replace occurrences of
formal parameters by corresponding actual parameters.
� Semantic expansion : Semantic expansion implies generation of
instructions tailored to the requirements of specific usage. For
Examples Generation of type specific instructions for manipulation
of byte and words operands.
� A macro is a set of tasks combined together so that you can run or
replay the entire task together with a single command. Macros are a
powerful productivity tool.
� With macros you can perform long or boring tasks just by a single
click.click.
�Macro allows a sequence of source language code to be defined once
and then referred to by name each time it is to be referred..
� A macro definition is enclosed between a macro header statement
and macro end statement. Macro definition are typically located at
the start of a program. A macro definition consists of
� A macro prototype statement
� One or more model statements� One or more model statements
� Macro preprocessor statements
� A macro prototype statement:- The Macro protoype statement
declares the name of macro and kinds of its parameters.
� One or more model statements: A model statement is a statement
from which an assembly language statement may be generated
during macro expansion. during macro expansion.
� Macro preprocessor statements: A preprocessor statement is used to
perform auxiliary functions during macro expansion.
� Macros are typically defined at a start of a program. A macro
definition
consists of
(1) MACRO pseudo opcode.
(2) MACRO name.
(3) Sequence of statements to be abbreviated.
(4) MEND pseudo opcode terminating macro definition.
The structure of a macro is:The structure of a macro is:
%MACRO macro_name;
<macro_text>
%MEND <macro_name>;
Example:
MACRO
INCR &ARG
ADD AREG ,& ARG
ADD BREG ,& ARG
ADD CREG ,& ARG
MEND
� A macro is called by writing macro name with actual parameters in
an assembly program.
The macro call has following syntax:
< macro name > [ < list of parameters > ]
For example,For example,
INCR X
Will call the macro INCR with the actual parameter X.
� A macro call leads to macro expansion .
� During macro expansion, the macro call statement is replaced by a
sequence of assembly statements.
� To differentiate between the original statements of program and the To differentiate between the original statements of program and the
statement resulting from macro expansion ,each expanded statement
is marked with a ‘+’ preceding its label filed.
� Two key notions concerning macro expansion are:
� 1.Expansion time control flow
� 2.Lexical substitution
� 1.Expansion time control flow:- This determine the order in which model
statement are visited during macro expansion
� The flow of control during macro expansion is implemented using a macro
expansion counter(MEC).
� Algorithm: (outline of macro expansion)
1. MEC: statement number of first statement following the prototype statement
2. While statement pointed by MEC is not a MEND statement 2. While statement pointed by MEC is not a MEND statement
(a) If a model statement then
(i) Expand the statement
(ii) MEC:= MEC+1;
(b) Else (i.e a preprocessors statement )
(i) MEC:= new value specified in the statement ;
3. Exit from macro expansion .
� 2.Lexical substitution :- Lexical substitution is used to generate an
assembly statement from a model statement.
� A model statement consists of 3 types of string
(1) An ordinary string, which stands for itself
(2) The name of Formal parameter which is preceded (2) The name of Formal parameter which is preceded
by the character ‘&’.
(3) The name of a preprocessor variable , which is
also preceded by the character ‘&’
�
� A macro may be defined with:
(1) Positional parameters.
(2) Keyword parameters.
Positional Parameters: A positional parameter is written as & parameter_name.
For example , INCR &VARIABLE.
Keyword Parameters: During call to a micro , a keyword parameter is specified Keyword Parameters: During call to a micro , a keyword parameter is specified
by its name.
It takes following form:
< parameter name > = < parameter value >
DEFINATION:-
A model statement in a macro may constitute a call on
another macro i.e. macro call within a macro. Such calls are known
as “Nested Macro Calls”.
� Macro containing the nested call is known as “Outer Macro". A
called macro is known as “Inner macro”.
� Expansion of nested macro calls follows “last-in- first-out(LIFO)
rules". Thus inner structure of nested macro calls ,the expansion of
latest macro call is completed first.latest macro call is completed first.
� `
� Example:-Consider program segment :-
� MACRO
� COMPUTE &ARG
� MOVER AREG ,&ARG
� ADD AREG ,=‘1’
� MOVEN AREG,&ARG
� MEND
� MACRO
� COMPUTE &ARG1,ARG2,ARG3� COMPUTE &ARG1,ARG2,ARG3
� COMPUTE &ARG1
� COMPUTE &ARG2
� COMPUTE &ARG3
� MEND
� The definition of macros “COMPUTE1” contains three separate calls to a define macro
“ COMPUTE”.
� Such macros are expanded on multiple levels.
� Expansion of “COMPUTE1 x,y,z” as follows:-
Source line expanded source expanded source (level 2)
(level 1)
� COMPUTE1 X,Y,Z COMPUTE X MOVER AREG,X
� COMPUTE Y ADD AREG,=‘1’
� COMPUTE Z MOVEM AREG ,X
�
� MOVER AREG,Y� MOVER AREG,Y
� ADD AREG,=‘1’
� MOVEM AREG ,Y
�
� MOVER AREG,Z
� ADD AREG,=‘1’
� MOVEM AREG ,Z
� Advanced macro facilities are aimed at supporting semantic expansion. These facilities can
be grouped into :
1) Facilities for alteration of flow of control during expansion.
2) Expansion time variables.
3) Attributes of parameters.
Two features are provided to facilitate alteration of flow of control
during expansion:-
1)Expansion time sequencing symbols.
2)Expansion time statements AIF,AGO and ANOP.
� AIF is similar to IF statement,the label used for branching is known � AIF is similar to IF statement,the label used for branching is known
as a sequencing symbol.
� AGO is similar to GOTO statement.
Sequencing symbol has the syntax:
<ordinary string>
� An AIF statement has the syntax:
� AIF(<expression>)<sequencing symbol>
� where <expression> is relational expression involving ordinary
string formal parameters etc.
� AIF is conditional branch Pseudo-opcode.AIF is conditional branch Pseudo-opcode.
� An AGO statement has the syntax:
� AGO<sequencing symbol>
� AGO is unconditional branch Pseudo-opcode
� where transfer of expansion time control to the statement containing
<sequencing symbol> in its label field takes place unconditionally.
�
� An ANOP statement has the syntax
<sequencing symbol> ANOP
It simply has the effect of defining the sequencing symbol.
� Expansion of time variables(EV) are variables which can only be use
during the expansion of macro calls.
� A local EV is created for use only during the a particular macro call .
� Its syntax is:-
LCL<&variable name>[,< variable name>(.]
� A global EV exist across all macro calls situated in a program & can be � A global EV exist across all macro calls situated in a program & can be
used in any macro which has a declaration for it.
� Its syntax is:
GBL<&variable name>[,< variable name>(.]
� Expansion of time variables(EV) can be manipulated through the statement
SET .
Syntax:-
< Expansion time variables>SET< expression>
� Example:-
� MACRO
� CONSTANTS
� LCL &A
� &A SET 1
� DB &A
� &A SET &A+1&A SET &A+1
� DB &A
� MEND
� A call on macro CONSTANTS is explained as follows:
� The local EV A is created.The first SET statement assigns the value ‘1’ to it .The first DB
statement thus declares a byte constant ‘1’. The second SET statement assigns the value
� ‘2’ to A and the second DB statement declares a constant ‘2’.
� An attribute is written using the syntax:-
� <attribute name>’<formal parameter spec>
and represents information about the value of the formal parameter i.e.
about the corresponding actual parameter. The type,length and size
attributes have the names T,L and S.attributes have the names T,L and S.
� Example:-
� MACRO
� DCL_CONST &A
� AIF (L’ &A EQ 1) .NEXT
� _ _
� .NEXT _ _� .NEXT _ _
� _ _
� MEND
� Here expansion time control is transferred to the statement having.NEXT
in its label field only if the actual parameter corresponding to the formal
parameter corresponding to the formal parameters A has the length of ‘1’.
� The macro preprocessor accepts an assembly program containing
definitions and calls and translate it into an assembly program which
does not contain any macro definitions or calls.
� The program from output by macro preprocessor can now be handed
over to an assembler to obtain the target language from of the over to an assembler to obtain the target language from of the
program.
� Thus the macro preprocessor segregates macro expansion from the
process of program assembly.
� It is economical because
� We begin the design by listing all tasks involved in macro
expansion.
1. Identify macro calls in the program.
2. Determine the values of formal parameters.
3. Maintain the values of expansion time variables declared in macros.3. Maintain the values of expansion time variables declared in macros.
4. Organize expansion time control flow.
5. Determine the values of sequencing symbols.
6. Perform expansion of model statement.
1. Identify the information necessary to perform a task.
2. Design a suitable data structure to record the information.
3. Determine the processing necessary to obtain the information.
4. Determine the processing necessary to perform the task. 4. Determine the processing necessary to perform the task.
� A compiler is a program takes a program written in a source language and
translates it into an equivalent program in a target language.
Source program COMPILER target program
error messages
33
( Normally a program written in
a high-level programming language)
( Normally the equivalent program in
machine code – relocatable object file)
� In addition to the development of a compiler, the techniques used in compiler design can be
applicable to many problems in computer science.
◦ Techniques used in a lexical analyzer can be used in text editors, information retrieval
system, and pattern recognition programs.
◦ Techniques used in a parser can be used in a query processing system such as SQL.
◦ Many software having a complex front-end may need techniques used in compiler design.Many software having a complex front-end may need techniques used in compiler design.
� A symbolic equation solver which takes an equation as input. That program should
parse the given input equation.
◦ Most of the techniques used in compiler design can be used in Natural Language
Processing (NLP) systems.
34
� There are two major parts of a compiler: Analysis and Synthesis
� In analysis phase, an intermediate representation is created from the
given source program.
◦ Lexical Analyzer, Syntax Analyzer and Semantic Analyzer are the parts of this phase.◦ Lexical Analyzer, Syntax Analyzer and Semantic Analyzer are the parts of this phase.
� In synthesis phase, the equivalent target program is created from this
intermediate representation.
◦ Intermediate Code Generator, Code Generator, and Code Optimizer are the parts of this phase.
35
Lexical Analyzer
Semantic Analyzer
Syntax Analyzer
IntermediateCode Generator
Code Optimizer
CodeGenerator
TargetProgram
SourceProgram
36
• Each phase transforms the source program from one representation
into another representation.
• They communicate with error handlers.
• They communicate with the symbol table.
Lexical analyzer
Syntax analyzer
Semantic analyzer
E
R
R
O
R
S
Y
M
B
O
L
Semantic analyzer
Intermediate code generator
Code optimizer
Code generator
R
H
N
D
L
E
R
T
B
L
M
N
G
R
� Lexical Analyzer reads the source program character by character and
returns the tokens of the source program.
� A token describes a pattern of characters having same meaning in the
source program. (such as identifiers, operators, keywords, numbers,
delimeters and so on)
Ex: newval := oldval + 12 => tokens: newval identifierEx: newval := oldval + 12 => tokens:
:= assignment operator
oldval identifier
+ add operator
12 a number
� Puts information about identifiers into the symbol table.
� Regular expressions are used to describe tokens (lexical constructs).
� A (Deterministic) Finite State Automaton can be used in the
implementation of a lexical analyzer.
38
� A Syntax Analyzer creates the syntactic structure (generally a parse tree)
of the given program.
� A syntax analyzer is also called as a parser.
� A parse tree describes a syntactic structure.
assgstmt
identifier := expression
newval expression + expression
identifier number
oldval 12
39
• In a parse tree, all terminals are at leaves.
• All inner nodes are non-terminals in a context free grammar.
� The syntax of a language is specified by a context free grammar
(CFG).
� The rules in a CFG are mostly recursive.
� A syntax analyzer checks whether a given program satisfies the rules
implied by a CFG or not.implied by a CFG or not.
◦ If it satisfies, the syntax analyzer creates a parse tree for the given program.
� Ex: We use BNF (Backus Naur Form) to specify a CFGassgstmt -> identifier := expression
expression -> identifier
expression -> number
expression -> expression + expression
40
� Which constructs of a program should be recognized by the lexical analyzer,
and which ones by the syntax analyzer?
◦ Both of them do similar things; But the lexical analyzer deals with simple
non-recursive constructs of the language.
◦ The syntax analyzer deals with recursive constructs of the language.
◦
The syntax analyzer deals with recursive constructs of the language.
◦ The lexical analyzer simplifies the job of the syntax analyzer.
◦ The lexical analyzer recognizes the smallest meaningful units (tokens) in a
source program.
◦ The syntax analyzer works on the smallest meaningful units (tokens) in a
source program to recognize meaningful structures in our programming
language.
41
� Depending on how the parse tree is created, there are different parsing techniques.
� These parsing techniques are categorized into two groups:
◦ Top-Down Parsing,
◦ Bottom-Up Parsing
� Top-Down Parsing:
◦ Construction of the parse tree starts at the root, and proceeds towards the leaves.
◦ Efficient top-down parsers can be easily constructed by hand.
◦ Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).◦ Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).
� Bottom-Up Parsing:
◦ Construction of the parse tree starts at the leaves, and proceeds towards the root.
◦ Normally efficient bottom-up parsers are created with the help of some software tools.
◦ Bottom-up parsing is also known as shift-reduce parsing.
◦ Operator-Precedence Parsing – simple, restrictive, easy to implement
◦ LR Parsing – much general form of shift-reduce parsing, LR, SLR, LALR
42
� A semantic analyzer checks the source program for semantic errors and
collects the type information for the code generation.
� Type-checking is an important part of semantic analyzer.
� Normally semantic information cannot be represented by a context-free
language used in syntax analyzers.
� Context-free grammars used in the syntax analysis are integrated with � Context-free grammars used in the syntax analysis are integrated with
attributes (semantic rules)
◦ the result is a syntax-directed translation,
◦ Attribute grammars
� Ex:
newval := oldval + 12
� The type of the identifier newval must match with type of the expression (oldval+12)
43
� A compiler may produce an explicit intermediate codes representing the
source program.
� These intermediate codes are generally machine (architecture
independent). But the level of intermediate codes is close to the level of
machine codes.
� Ex:� Ex:newval := oldval * fact + 1
id1 := id2 * id3 + 1
MULT id2,id3,temp1 Intermediates Codes (Quadraples)ADD temp1,#1,temp2
MOV temp2,,id1
44
� The code optimizer optimizes the code produced by the intermediate
code generator in the terms of time and space.
� Ex:
MULT id2,id3,temp1
ADD temp1,#1,id1
CS416 Compiler Design 45
� Produces the target language in a specific architecture.
� The target program is normally is a reloadable object file containing the
machine codes.
� Ex:
( assume that we have an architecture with instructions whose at least one of
its operands is
a machine register)
MOVE id2,R1
MULT id3,R1
ADD #1,R1
MOVE R1,id1
CS416 Compiler Design 46
� A compiler is a computer program that
translates a program in a source language
into an equivalent program in a target
language.
� A source program/code is a program/code
written in the source language, which is
usually a high-level language.
compiler
Source program
Target programusually a high-level language.
� A target program/code is a program/code
written in the target language, which often
is a machine language or an intermediate
code. Error message
scanner
parser
Semantic analyzer
Stream of charactersStream of charactersStream of charactersStream of characters
Stream of tokensStream of tokensStream of tokensStream of tokens
Parse/syntax treeParse/syntax treeParse/syntax treeParse/syntax tree
Annotated treeAnnotated treeAnnotated treeAnnotated tree
Intermediate code generator
Code optimization
Code generator
Code optimization
Intermediate codeIntermediate codeIntermediate codeIntermediate code
Intermediate codeIntermediate codeIntermediate codeIntermediate code
Target codeTarget codeTarget codeTarget code
Target codeTarget codeTarget codeTarget code
Fig. Operation of a typical multi-language, multi-target compiler.
• Lexical analysis involves scanning of the source programme from
left to right and
• Separating them into tokens.
• A token is a sequence of characters having a collective meaning.
• Tokens are seperated by blanks,operators and special symbols.
• A lexical analysis on the statement. • A lexical analysis on the statement.
• X=Y+Z*30;
• Will generate the following tokens
X = ;30*Y + Z
X is an identifier.
= is a terminal symbol.
Y is an identifier.
+ is a terminal symbol.
Z is an identifier.
* Is a terminal symbol.
30 is literal.30 is literal.
;is a terminal symbol.
• Blank s seperating the token are eliminated.
• Lexical phase discards comments since they have no effect on the processing Of the program.
• Identifiers are stored in the symbol table.
• Literals are stored in the literal table.
• Syntax analysis deals with the syntax error detection and correction.
• In the program if the spelling is wrong,then in syntax analysis the spelling is
Corrected according to the syntax table.
• This syntax table is the table in which for a wrong keyword in code,the
different Corrections are available.
• Example:- for iffff & thennn in the code, thee syntax table will be- then
than
if
If then
� It gets the parse tree from the parser together with information
about some syntactic elements
� It determines if the semantics or meaning of the program is
correct.
� This part deals with static semantic.
◦
This part deals with static semantic.
◦ semantic of programs that can be checked by reading off from
the program only.
◦ syntax of the language which cannot be described in context-
free grammar.
� Mostly, a semantic analyzer does type checking.
� It modifies the parse tree in order to get that (static) semantically
correct code.
� An intermediate code generator
◦ takes a parse tree from the semantic analyzer
◦ generates a program in the intermediate language.
� In some compilers, a source program is translated into an
intermediate code first and then the intermediate code is translated intermediate code first and then the intermediate code is translated
into the target language.
� In other compilers, a source program is translated directly into the
target language.
54
� Using intermediate code is beneficial when compilers which
translates a single source language to many target languages are
required.
◦ The front-end of a compiler – scanner to intermediate code generator – can be ◦ The front-end of a compiler – scanner to intermediate code generator – can be
used for every compilers.
◦ Different back-ends – code optimizer and code generator– is required for each
target language.
� One of the popular intermediate code is three-address code. A
three-address code instruction is in the form of x = y op z.
� Replacing an inefficient sequence of instructions with a better
sequence of instructions.
� Sometimes called code improvement.
� Code optimization can be done:
◦ after semantic analyzing
� performed on a parse tree� performed on a parse tree
◦ After intermediate code generation
� performed on a intermediate code
◦ after code generation
� performed on a target code
� A code generator
◦ takes either an intermediate code or a parse tree
◦ produces a target program.
� Static memory allocation refers to the process of allocating memory
at compile-time before the associated program is executed.
� Dynamic memory allocation is automatic memory allocation
where memory is allocated as required at run-time.
• An application of this technique involves a program module (e.g. function or
subroutine) declaring static data locally, such that these data are inaccessible in
other modules unless references to it are passed as parameters or returned.
• A single copy of static data is retained and accessible through many calls to the
function in which it is declared.
• Static memory allocation therefore has the advantage of modularising data within a
program design in the situation where these data must be retained through the
runtime of the program.runtime of the program.
• The use of static variables within a class in object oriented programming enables a
single copy of such data to be shared between all the objects of that class.
• Object constants known at compile-time, like string literals, are usually allocated
statically. In object-oriented programming, the virtual method tables of classes are
usually allocated statically.
• A statically defined value can also be global in its scope ensuring the same
immutable value is used throughout a run for consistency.
Code A
Data A
Code B Code A
Code A
Code B
Data B
Code C
Code B
Code C
Data A
Code C
Data A
Data BData C
a)Staticallocation
b) Dynamic allocationThe main program A Is active.
c)Dynamic allocationThe main program A calls The function B
Advantages of static memory allocation:-
• It is simple to implement.
• Binding is performed during compilation.
• Execution is faster as there is no binding during the run time and addresses of
• Variables is directly encoded in machine instructions.
Disadvantages of static memory allocation:-
• It is almost impossible to handle recursion.
• Memory requirement is higher.Variables remain allocated even when the
Function is not active.
Advantages of Dynamic memory allocation:-
In dynamic memory allocation , memory is allocated and deallocated during execution of a program.
• Dynamic memory allocation has optimal utilization of memory.
• Recursion can be implemented using dynamic memory allocation.• Recursion can be implemented using dynamic memory allocation.
Compilation Of Control Structure:-
Some of the control structures are:
There are language features which govern the sequencing of control
through a Program.
Some of the control structures are:
1)If statement.
2)While loop statement.
If Statement:-We can translate an If statement into three address code. A program containing
If statement can be mapped into an equivalent program containing explicit goto’s.
A sample if statement is compiled into three-address code in following fig.
If (E) 100:if(E) then goto 102
{ 101 : goto 104
s1 ; 102 : s1 ;
} => 103 : goto 105
Else 104 : s2 ;
{ 105 : s3 ; s2;
}S3;
It should be clear that if E is true then the two statements s1 and s3 will be executed.
If E is false then the two statements s2 and s3 will be executed.
While statement:-
•We can translate while-statement into three address code. A program containing while •Statement can be mapped into an equivalent program containing explicit goto’s.
•A sample while statement is compiled into three address codes.
While (E) 100 : if (E) then goto 102
{ 101 : goto 105
s1 ; 102: s1
S2; => 103 : s2
} 104 : goto 100
S3; 105 : s3
• It should be clear that if E is true then control enters the body of the loop andexecutes S1 and s2.subsequently,it goes back to the beginning of the loop.executes S1 and s2.subsequently,it goes back to the beginning of the loop.
• If E is false then control is transferred to the statement s3.
� An interpreter accepts the source program and performs the actions
associated with the instructions
� It creates virtual execution environment
� An interpreter reads source code one instruction or a line at a time,
converts this into machine code or some intermediate form and converts this into machine code or some intermediate form and
executes it.
COMPILERCOMPILERCOMPILERCOMPILER INTERPRETERINTERPRETERINTERPRETERINTERPRETER
1. Compiler translates the entire program into machine code
2. If there is a problem in the code, compiled programs make the programmer wait for the execution of entire program.
1. Interpreter translates code one line at a time executing each line as it is translated.
2. An interpreter lets programmer know immediately when and where problems exist in program.
3. Compilers produce better optimized codes that generally runs fast.
4. It spends lot of time analyzing and generating machine code. It is not suited during program development.
where problems exist in the code.
3. Program execution is relatively slow due to interpreter overhead.
4. Relatively little time is spent in analyzing as it executes line by line.