unit 2

Macro processor, Compliers and Interpreters

� Macro definition and call, macro expansion, Machine Independent

macro processor features,

� Nested macro calls, advanced macro facilities, Design of macro

preprocessor.preprocessor.

� Basic compliers function, Phases of compilation, memory allocation,

compilation of expression,

� compilation of expressions, compilation of control structures, code of

optimization, interpreter.

� Macro are used to provide a program generation facility through

macro expansion.

� Definition :- A macro is a unit of specification for program

generation through expansion.

� A macro consists of a name, set of formal parameter and body of � A macro consists of a name, set of formal parameter and body of

code.

� The use of a macro name with a set of actual parameters is replaced

by some code generated from its body. This is called macro

expansion

� Examples of Macro

� If we have to Re-writing a program then with the help of macros we can

write((.

Start of definitionStart of definitionStart of definitionStart of definition MacroMacroMacroMacro

Macro name My macro

Macro body ADD AREG,XADD BREG, X

End of the macro definition

MEND

write((.

� Original program program with macro

ADD AREG,XADD BREG, X



My macro

My macro

My macro

� A macro processor takes a source with macro definition and macro

calls and replace each with its body.

Target codeTarget codeTarget codeTarget code


Source programSource programSource programSource program

MACROMymacro

ADD AREG,X



Macro Macro Macro Macro ProcessorProcessorProcessorProcessor


MEND

mymacro

mymacro

mymacro

Input

output

� Two kinds of expansion

1. Lexical Expansion

2. Semantic Expansion

Lexical Expansion :- Lexical Expansion implies replacement of

character string by another character string during program character string by another character string during program

generation .Lexical expansion employed to replace occurrences of

formal parameters by corresponding actual parameters.

� Semantic expansion : Semantic expansion implies generation of

instructions tailored to the requirements of specific usage. For

Examples Generation of type specific instructions for manipulation

of byte and words operands.

� A macro is a set of tasks combined together so that you can run or

replay the entire task together with a single command. Macros are a

powerful productivity tool.

� With macros you can perform long or boring tasks just by a single

click.click.

�Macro allows a sequence of source language code to be defined once

and then referred to by name each time it is to be referred..

� A macro definition is enclosed between a macro header statement

and macro end statement. Macro definition are typically located at

the start of a program. A macro definition consists of

� A macro prototype statement

� One or more model statements� One or more model statements

� Macro preprocessor statements

� A macro prototype statement:- The Macro protoype statement

declares the name of macro and kinds of its parameters.

� One or more model statements: A model statement is a statement

from which an assembly language statement may be generated

during macro expansion. during macro expansion.

� Macro preprocessor statements: A preprocessor statement is used to

perform auxiliary functions during macro expansion.

� Macros are typically defined at a start of a program. A macro

definition

consists of

(1) MACRO pseudo opcode.

(2) MACRO name.

(3) Sequence of statements to be abbreviated.

(4) MEND pseudo opcode terminating macro definition.

The structure of a macro is:The structure of a macro is:

%MACRO macro_name;

<macro_text>

%MEND <macro_name>;

Example:

MACRO

INCR &ARG

ADD AREG ,& ARG

ADD BREG ,& ARG

ADD CREG ,& ARG

MEND

� A macro is called by writing macro name with actual parameters in

an assembly program.

The macro call has following syntax:

< macro name > [ < list of parameters > ]

For example,For example,

INCR X

Will call the macro INCR with the actual parameter X.

� A macro call leads to macro expansion .

� During macro expansion, the macro call statement is replaced by a

sequence of assembly statements.

� To differentiate between the original statements of program and the To differentiate between the original statements of program and the

statement resulting from macro expansion ,each expanded statement

is marked with a ‘+’ preceding its label filed.

� Two key notions concerning macro expansion are:

� 1.Expansion time control flow

� 2.Lexical substitution

� 1.Expansion time control flow:- This determine the order in which model

statement are visited during macro expansion

� The flow of control during macro expansion is implemented using a macro

expansion counter(MEC).

� Algorithm: (outline of macro expansion)

1. MEC: statement number of first statement following the prototype statement

2. While statement pointed by MEC is not a MEND statement 2. While statement pointed by MEC is not a MEND statement

(a) If a model statement then

(i) Expand the statement

(ii) MEC:= MEC+1;

(b) Else (i.e a preprocessors statement )

(i) MEC:= new value specified in the statement ;

3. Exit from macro expansion .

� 2.Lexical substitution :- Lexical substitution is used to generate an

assembly statement from a model statement.

� A model statement consists of 3 types of string

(1) An ordinary string, which stands for itself

(2) The name of Formal parameter which is preceded (2) The name of Formal parameter which is preceded

by the character ‘&’.

(3) The name of a preprocessor variable , which is

also preceded by the character ‘&’

�

� A macro may be defined with:

(1) Positional parameters.

(2) Keyword parameters.

Positional Parameters: A positional parameter is written as & parameter_name.

For example , INCR &VARIABLE.

Keyword Parameters: During call to a micro , a keyword parameter is specified Keyword Parameters: During call to a micro , a keyword parameter is specified

by its name.

It takes following form:

< parameter name > = < parameter value >

DEFINATION:-

A model statement in a macro may constitute a call on

another macro i.e. macro call within a macro. Such calls are known

as “Nested Macro Calls”.

� Macro containing the nested call is known as “Outer Macro". A

called macro is known as “Inner macro”.

� Expansion of nested macro calls follows “last-in- first-out(LIFO)

rules". Thus inner structure of nested macro calls ,the expansion of

latest macro call is completed first.latest macro call is completed first.

� `

� Example:-Consider program segment :-

� MACRO

� COMPUTE &ARG

� MOVER AREG ,&ARG

� ADD AREG ,=‘1’

� MOVEN AREG,&ARG

� MEND

� MACRO

� COMPUTE &ARG1,ARG2,ARG3� COMPUTE &ARG1,ARG2,ARG3

� COMPUTE &ARG1

� COMPUTE &ARG2

� COMPUTE &ARG3

� MEND

� The definition of macros “COMPUTE1” contains three separate calls to a define macro

“ COMPUTE”.

� Such macros are expanded on multiple levels.

� Expansion of “COMPUTE1 x,y,z” as follows:-

Source line expanded source expanded source (level 2)

(level 1)

� COMPUTE1 X,Y,Z COMPUTE X MOVER AREG,X

� COMPUTE Y ADD AREG,=‘1’

� COMPUTE Z MOVEM AREG ,X

�

� MOVER AREG,Y� MOVER AREG,Y

� ADD AREG,=‘1’

� MOVEM AREG ,Y

�

� MOVER AREG,Z

� ADD AREG,=‘1’

� MOVEM AREG ,Z

� Advanced macro facilities are aimed at supporting semantic expansion. These facilities can

be grouped into :

1) Facilities for alteration of flow of control during expansion.

2) Expansion time variables.

3) Attributes of parameters.

Two features are provided to facilitate alteration of flow of control

during expansion:-

1)Expansion time sequencing symbols.

2)Expansion time statements AIF,AGO and ANOP.

� AIF is similar to IF statement,the label used for branching is known � AIF is similar to IF statement,the label used for branching is known

as a sequencing symbol.

� AGO is similar to GOTO statement.

Sequencing symbol has the syntax:

<ordinary string>

� An AIF statement has the syntax:

� AIF(<expression>)<sequencing symbol>

� where <expression> is relational expression involving ordinary

string formal parameters etc.

� AIF is conditional branch Pseudo-opcode.AIF is conditional branch Pseudo-opcode.

� An AGO statement has the syntax:

� AGO<sequencing symbol>

� AGO is unconditional branch Pseudo-opcode

� where transfer of expansion time control to the statement containing

<sequencing symbol> in its label field takes place unconditionally.

�

� An ANOP statement has the syntax

<sequencing symbol> ANOP

It simply has the effect of defining the sequencing symbol.

� Expansion of time variables(EV) are variables which can only be use

during the expansion of macro calls.

� A local EV is created for use only during the a particular macro call .

� Its syntax is:-

LCL<&variable name>[,< variable name>(.]

� A global EV exist across all macro calls situated in a program & can be � A global EV exist across all macro calls situated in a program & can be

used in any macro which has a declaration for it.

� Its syntax is:

GBL<&variable name>[,< variable name>(.]

� Expansion of time variables(EV) can be manipulated through the statement

SET .

Syntax:-

< Expansion time variables>SET< expression>

� Example:-

� MACRO

� CONSTANTS

� LCL &A

� &A SET 1

� DB &A

� &A SET &A+1&A SET &A+1

� DB &A

� MEND

� A call on macro CONSTANTS is explained as follows:

� The local EV A is created.The first SET statement assigns the value ‘1’ to it .The first DB

statement thus declares a byte constant ‘1’. The second SET statement assigns the value

� ‘2’ to A and the second DB statement declares a constant ‘2’.

� An attribute is written using the syntax:-

� <attribute name>’<formal parameter spec>

and represents information about the value of the formal parameter i.e.

about the corresponding actual parameter. The type,length and size

attributes have the names T,L and S.attributes have the names T,L and S.

� Example:-

� MACRO

� DCL_CONST &A

� AIF (L’ &A EQ 1) .NEXT

� _ _

� .NEXT _ _� .NEXT _ _

� _ _

� MEND

� Here expansion time control is transferred to the statement having.NEXT

in its label field only if the actual parameter corresponding to the formal

parameter corresponding to the formal parameters A has the length of ‘1’.

� The macro preprocessor accepts an assembly program containing

definitions and calls and translate it into an assembly program which

does not contain any macro definitions or calls.

� The program from output by macro preprocessor can now be handed

over to an assembler to obtain the target language from of the over to an assembler to obtain the target language from of the

program.

� Thus the macro preprocessor segregates macro expansion from the

process of program assembly.

� It is economical because

� We begin the design by listing all tasks involved in macro

expansion.

1. Identify macro calls in the program.

2. Determine the values of formal parameters.

3. Maintain the values of expansion time variables declared in macros.3. Maintain the values of expansion time variables declared in macros.

4. Organize expansion time control flow.

5. Determine the values of sequencing symbols.

6. Perform expansion of model statement.

1. Identify the information necessary to perform a task.

2. Design a suitable data structure to record the information.

3. Determine the processing necessary to obtain the information.

4. Determine the processing necessary to perform the task. 4. Determine the processing necessary to perform the task.

� A compiler is a program takes a program written in a source language and

translates it into an equivalent program in a target language.

Source program COMPILER target program

error messages

33

( Normally a program written in

a high-level programming language)

( Normally the equivalent program in

machine code – relocatable object file)

� In addition to the development of a compiler, the techniques used in compiler design can be

applicable to many problems in computer science.

◦ Techniques used in a lexical analyzer can be used in text editors, information retrieval

system, and pattern recognition programs.

◦ Techniques used in a parser can be used in a query processing system such as SQL.

◦ Many software having a complex front-end may need techniques used in compiler design.Many software having a complex front-end may need techniques used in compiler design.

� A symbolic equation solver which takes an equation as input. That program should

parse the given input equation.

◦ Most of the techniques used in compiler design can be used in Natural Language

Processing (NLP) systems.

34

� There are two major parts of a compiler: Analysis and Synthesis

� In analysis phase, an intermediate representation is created from the

given source program.

◦ Lexical Analyzer, Syntax Analyzer and Semantic Analyzer are the parts of this phase.◦ Lexical Analyzer, Syntax Analyzer and Semantic Analyzer are the parts of this phase.

� In synthesis phase, the equivalent target program is created from this

intermediate representation.

◦ Intermediate Code Generator, Code Generator, and Code Optimizer are the parts of this phase.

35

Lexical Analyzer

Semantic Analyzer

Syntax Analyzer

IntermediateCode Generator

Code Optimizer

CodeGenerator

TargetProgram

SourceProgram

36

• Each phase transforms the source program from one representation

into another representation.

• They communicate with error handlers.

• They communicate with the symbol table.

Lexical analyzer

Syntax analyzer

Semantic analyzer

E

R

R

O

R

S

Y

M

B

O

L

Semantic analyzer

Intermediate code generator

Code optimizer

Code generator

R

H

N

D

L

E

R

T

B

L

M

N

G

R

� Lexical Analyzer reads the source program character by character and

returns the tokens of the source program.

� A token describes a pattern of characters having same meaning in the

source program. (such as identifiers, operators, keywords, numbers,

delimeters and so on)

Ex: newval := oldval + 12 => tokens: newval identifierEx: newval := oldval + 12 => tokens:

:= assignment operator

oldval identifier

+ add operator

12 a number

� Puts information about identifiers into the symbol table.

� Regular expressions are used to describe tokens (lexical constructs).

� A (Deterministic) Finite State Automaton can be used in the

implementation of a lexical analyzer.

38

� A Syntax Analyzer creates the syntactic structure (generally a parse tree)

of the given program.

� A syntax analyzer is also called as a parser.

� A parse tree describes a syntactic structure.

assgstmt

identifier := expression

newval expression + expression

identifier number

oldval 12

39

• In a parse tree, all terminals are at leaves.

• All inner nodes are non-terminals in a context free grammar.

� The syntax of a language is specified by a context free grammar

(CFG).

� The rules in a CFG are mostly recursive.

� A syntax analyzer checks whether a given program satisfies the rules

implied by a CFG or not.implied by a CFG or not.

◦ If it satisfies, the syntax analyzer creates a parse tree for the given program.

� Ex: We use BNF (Backus Naur Form) to specify a CFGassgstmt -> identifier := expression

expression -> identifier

expression -> number

expression -> expression + expression

40

� Which constructs of a program should be recognized by the lexical analyzer,

and which ones by the syntax analyzer?

◦ Both of them do similar things; But the lexical analyzer deals with simple

non-recursive constructs of the language.

◦ The syntax analyzer deals with recursive constructs of the language.

◦

The syntax analyzer deals with recursive constructs of the language.

◦ The lexical analyzer simplifies the job of the syntax analyzer.

◦ The lexical analyzer recognizes the smallest meaningful units (tokens) in a

source program.

◦ The syntax analyzer works on the smallest meaningful units (tokens) in a

source program to recognize meaningful structures in our programming

language.

41

� Depending on how the parse tree is created, there are different parsing techniques.

� These parsing techniques are categorized into two groups:

◦ Top-Down Parsing,

◦ Bottom-Up Parsing

� Top-Down Parsing:

◦ Construction of the parse tree starts at the root, and proceeds towards the leaves.

◦ Efficient top-down parsers can be easily constructed by hand.

◦ Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).◦ Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).

� Bottom-Up Parsing:

◦ Construction of the parse tree starts at the leaves, and proceeds towards the root.

◦ Normally efficient bottom-up parsers are created with the help of some software tools.

◦ Bottom-up parsing is also known as shift-reduce parsing.

◦ Operator-Precedence Parsing – simple, restrictive, easy to implement

◦ LR Parsing – much general form of shift-reduce parsing, LR, SLR, LALR

42

� A semantic analyzer checks the source program for semantic errors and

collects the type information for the code generation.

� Type-checking is an important part of semantic analyzer.

� Normally semantic information cannot be represented by a context-free

language used in syntax analyzers.

� Context-free grammars used in the syntax analysis are integrated with � Context-free grammars used in the syntax analysis are integrated with

attributes (semantic rules)

◦ the result is a syntax-directed translation,

◦ Attribute grammars

� Ex:

newval := oldval + 12

� The type of the identifier newval must match with type of the expression (oldval+12)

43

� A compiler may produce an explicit intermediate codes representing the

source program.

� These intermediate codes are generally machine (architecture

independent). But the level of intermediate codes is close to the level of

machine codes.

� Ex:� Ex:newval := oldval * fact + 1

id1 := id2 * id3 + 1

MULT id2,id3,temp1 Intermediates Codes (Quadraples)ADD temp1,#1,temp2

MOV temp2,,id1

44

� The code optimizer optimizes the code produced by the intermediate

code generator in the terms of time and space.

� Ex:

MULT id2,id3,temp1

ADD temp1,#1,id1

CS416 Compiler Design 45

� Produces the target language in a specific architecture.

� The target program is normally is a reloadable object file containing the

machine codes.

� Ex:

( assume that we have an architecture with instructions whose at least one of

its operands is

a machine register)

MOVE id2,R1

MULT id3,R1

ADD #1,R1

MOVE R1,id1

CS416 Compiler Design 46

� A compiler is a computer program that

translates a program in a source language

into an equivalent program in a target

language.

� A source program/code is a program/code

written in the source language, which is

usually a high-level language.

compiler

Source program

Target programusually a high-level language.

� A target program/code is a program/code

written in the target language, which often

is a machine language or an intermediate

code. Error message

scanner

parser

Semantic analyzer

Stream of charactersStream of charactersStream of charactersStream of characters

Stream of tokensStream of tokensStream of tokensStream of tokens

Parse/syntax treeParse/syntax treeParse/syntax treeParse/syntax tree

Annotated treeAnnotated treeAnnotated treeAnnotated tree

Intermediate code generator

Code optimization

Code generator

Code optimization

Intermediate codeIntermediate codeIntermediate codeIntermediate code

Intermediate codeIntermediate codeIntermediate codeIntermediate code



Fig. Operation of a typical multi-language, multi-target compiler.

• Lexical analysis involves scanning of the source programme from

left to right and

• Separating them into tokens.

• A token is a sequence of characters having a collective meaning.

• Tokens are seperated by blanks,operators and special symbols.

• A lexical analysis on the statement. • A lexical analysis on the statement.

• X=Y+Z*30;

• Will generate the following tokens

X = ;30*Y + Z

X is an identifier.

= is a terminal symbol.

Y is an identifier.

+ is a terminal symbol.

Z is an identifier.

* Is a terminal symbol.

30 is literal.30 is literal.

;is a terminal symbol.

• Blank s seperating the token are eliminated.

• Lexical phase discards comments since they have no effect on the processing Of the program.

• Identifiers are stored in the symbol table.

• Literals are stored in the literal table.

• Syntax analysis deals with the syntax error detection and correction.

• In the program if the spelling is wrong,then in syntax analysis the spelling is

Corrected according to the syntax table.

• This syntax table is the table in which for a wrong keyword in code,the

different Corrections are available.

• Example:- for iffff & thennn in the code, thee syntax table will be- then

than

if

If then

� It gets the parse tree from the parser together with information

about some syntactic elements

� It determines if the semantics or meaning of the program is

correct.

� This part deals with static semantic.

◦

This part deals with static semantic.

◦ semantic of programs that can be checked by reading off from

the program only.

◦ syntax of the language which cannot be described in context-

free grammar.

� Mostly, a semantic analyzer does type checking.

� It modifies the parse tree in order to get that (static) semantically

correct code.

� An intermediate code generator

◦ takes a parse tree from the semantic analyzer

◦ generates a program in the intermediate language.

� In some compilers, a source program is translated into an

intermediate code first and then the intermediate code is translated intermediate code first and then the intermediate code is translated

into the target language.

� In other compilers, a source program is translated directly into the

target language.

54

� Using intermediate code is beneficial when compilers which

translates a single source language to many target languages are

required.

◦ The front-end of a compiler – scanner to intermediate code generator – can be ◦ The front-end of a compiler – scanner to intermediate code generator – can be

used for every compilers.

◦ Different back-ends – code optimizer and code generator– is required for each

target language.

� One of the popular intermediate code is three-address code. A

three-address code instruction is in the form of x = y op z.

� Replacing an inefficient sequence of instructions with a better

sequence of instructions.

� Sometimes called code improvement.

� Code optimization can be done:

◦ after semantic analyzing

� performed on a parse tree� performed on a parse tree

◦ After intermediate code generation

� performed on a intermediate code

◦ after code generation

� performed on a target code

� A code generator

◦ takes either an intermediate code or a parse tree

◦ produces a target program.

� Static memory allocation refers to the process of allocating memory

at compile-time before the associated program is executed.

� Dynamic memory allocation is automatic memory allocation

where memory is allocated as required at run-time.

• An application of this technique involves a program module (e.g. function or

subroutine) declaring static data locally, such that these data are inaccessible in

other modules unless references to it are passed as parameters or returned.

• A single copy of static data is retained and accessible through many calls to the

function in which it is declared.

• Static memory allocation therefore has the advantage of modularising data within a

program design in the situation where these data must be retained through the

runtime of the program.runtime of the program.

• The use of static variables within a class in object oriented programming enables a

single copy of such data to be shared between all the objects of that class.

• Object constants known at compile-time, like string literals, are usually allocated

statically. In object-oriented programming, the virtual method tables of classes are

usually allocated statically.

• A statically defined value can also be global in its scope ensuring the same

immutable value is used throughout a run for consistency.

Code A

Data A

Code B Code A

Code A

Code B

Data B

Code C

Code B

Code C

Data A

Code C

Data A

Data BData C

a)Staticallocation

b) Dynamic allocationThe main program A Is active.

c)Dynamic allocationThe main program A calls The function B

Advantages of static memory allocation:-

• It is simple to implement.

• Binding is performed during compilation.

• Execution is faster as there is no binding during the run time and addresses of

• Variables is directly encoded in machine instructions.

Disadvantages of static memory allocation:-

• It is almost impossible to handle recursion.

• Memory requirement is higher.Variables remain allocated even when the

Function is not active.

Advantages of Dynamic memory allocation:-

In dynamic memory allocation , memory is allocated and deallocated during execution of a program.

• Dynamic memory allocation has optimal utilization of memory.

• Recursion can be implemented using dynamic memory allocation.• Recursion can be implemented using dynamic memory allocation.

Compilation Of Control Structure:-

Some of the control structures are:

There are language features which govern the sequencing of control

through a Program.

Some of the control structures are:

1)If statement.

2)While loop statement.

If Statement:-We can translate an If statement into three address code. A program containing

If statement can be mapped into an equivalent program containing explicit goto’s.

A sample if statement is compiled into three-address code in following fig.

If (E) 100:if(E) then goto 102

{ 101 : goto 104

s1 ; 102 : s1 ;

} => 103 : goto 105

Else 104 : s2 ;

{ 105 : s3 ; s2;

}S3;

It should be clear that if E is true then the two statements s1 and s3 will be executed.

If E is false then the two statements s2 and s3 will be executed.

While statement:-

•We can translate while-statement into three address code. A program containing while •Statement can be mapped into an equivalent program containing explicit goto’s.

•A sample while statement is compiled into three address codes.

While (E) 100 : if (E) then goto 102

{ 101 : goto 105

s1 ; 102: s1

S2; => 103 : s2

} 104 : goto 100

S3; 105 : s3

• It should be clear that if E is true then control enters the body of the loop andexecutes S1 and s2.subsequently,it goes back to the beginning of the loop.executes S1 and s2.subsequently,it goes back to the beginning of the loop.

• If E is false then control is transferred to the statement s3.

� An interpreter accepts the source program and performs the actions

associated with the instructions

� It creates virtual execution environment

� An interpreter reads source code one instruction or a line at a time,

converts this into machine code or some intermediate form and converts this into machine code or some intermediate form and

executes it.

COMPILERCOMPILERCOMPILERCOMPILER INTERPRETERINTERPRETERINTERPRETERINTERPRETER

1. Compiler translates the entire program into machine code

2. If there is a problem in the code, compiled programs make the programmer wait for the execution of entire program.

1. Interpreter translates code one line at a time executing each line as it is translated.

2. An interpreter lets programmer know immediately when and where problems exist in program.

3. Compilers produce better optimized codes that generally runs fast.

4. It spends lot of time analyzing and generating machine code. It is not suited during program development.

where problems exist in the code.

3. Program execution is relatively slow due to interpreter overhead.

4. Relatively little time is spent in analyzing as it executes line by line.

unit 2

Education